[Watch] Taking Advantage of Conversational Context to Improve NLP Models

Posted by Daniel Shank on Dec 20, 2017 3:19:12 PM

At Talla, we build products that help people find and make use of an organization’s internal knowledge. While we already have models running successfully in production for enterprise customers, we continuously experiment with data and methods whose outcomes are less certain. In my talk at the Machine Learning Conference, which you can watch below, I shared what happened when we experimented with using chat data for issue detection and matching.

In theory, a good source of information for learning about the problems people run into is the questions that accumulate in their internal chat platforms. In practice, the questions people ask are highly context dependent, relying on what has been said earlier in the conversation. This makes it difficult to use questions from chat data for machine learning purposes. There are still several ways we can take advantage of chat data and the questions it contains, particularly by leveraging the text located near the questions people ask.

A simple use case for internal business NLP is search: matching the questions and queries people submit against questions that have already been asked and answered. This is designed to eliminate the redundancy of the support functions that HR and other internally focused departments serve.
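To make that concrete, here is a minimal sketch of this kind of matching using TF-IDF vectors and cosine similarity. The library choice and the toy questions are my own illustration, not a description of Talla's production system:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Previously asked (and answered) questions, e.g. pulled from a help desk archive.
answered_questions = [
    "How do I connect to the office printer?",
    "How do I reset my VPN password?",
    "Where can I find the vacation policy?",
]

vectorizer = TfidfVectorizer()
answered_vectors = vectorizer.fit_transform(answered_questions)

def match_question(new_question):
    """Return the previously asked question most similar to the new one."""
    new_vector = vectorizer.transform([new_question])
    scores = cosine_similarity(new_vector, answered_vectors)[0]
    best = scores.argmax()
    return answered_questions[best], scores[best]

print(match_question("printer won't connect"))
```

Pure term matching like this only gets you so far, though: it fails as soon as the new question uses different words than the archived one, which is exactly what the similarity models below are meant to handle.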

There are many ways to leverage text to produce a good model of text similarity. We want to be able to determine whether any two statements, words, or questions are related on a deeper level, rather than simply checking whether their terms match. For instance, the word “printer” should be recognizably more similar to the word “computer” than to the word “dog.” There are several methods for accomplishing this. The embedding methods Word2Vec and GloVe have been in vogue for a few years at this point, though they are in many ways comparable to techniques computational linguists have used for much longer, which focus on representing a ‘term-context’ matrix in different ways.

The key to all of these measures of similarity is finding out which words and phrases occur together, and then finding a way to represent those relationships in less space by compressing them. The reason is simply that representing a word as a list of every place it has ever occurred in your source documents is extremely space inefficient. Further, it turns out that compressing the matrix that represents word co-occurrence can help refine your measure of similarity by reducing spurious correlations.
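As a rough illustration of that compression step (not the exact pipeline from the talk), here is a sketch that builds a small term-context co-occurrence matrix and compresses it with a truncated SVD, the same basic idea behind classic LSA-style embeddings. The toy documents are invented for the example:

```python
import numpy as np
from itertools import combinations

# Toy corpus: each "document" is a short issue description (invented for illustration).
documents = [
    "printer driver paper jam",
    "printer computer driver error",
    "computer monitor driver error",
    "dog park walk leash",
]

# Build a term-context matrix: how often each pair of words shares a document.
vocab = sorted({word for doc in documents for word in doc.split()})
index = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(vocab)))
for doc in documents:
    for a, b in combinations(doc.split(), 2):
        counts[index[a], index[b]] += 1
        counts[index[b], index[a]] += 1

# Compress the matrix with a truncated SVD; each row of the result
# is a short, dense vector standing in for the full co-occurrence row.
u, s, _ = np.linalg.svd(counts)
k = 2  # keep only the top-k dimensions
embeddings = u[:, :k] * s[:k]

def similarity(w1, w2):
    """Cosine similarity between two word vectors."""
    v1, v2 = embeddings[index[w1]], embeddings[index[w2]]
    return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9))

# Words from the overlapping hardware-issue documents score higher
# with each other than with the unrelated "dog".
print(similarity("printer", "computer"))
print(similarity("printer", "dog"))
```

In real systems the raw counts are usually reweighted (for example with PMI) before compression, and methods like Word2Vec and GloVe effectively learn a similar compressed representation directly rather than factorizing an explicit matrix.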

 

[Video: Machine Learning Conference, November 2017]

Though these techniques are quite common and freely available, there is still value in tailoring them to your particular use case. For instance, you need to decide what it means for two words to ‘co-occur.’ At one level, you may want to look at which words occur in the same sentence; at another, the important relationships may only show up when you examine co-occurrence at the paragraph level. Taking the questions people ask in chat as our text of interest, we want to match words that appear in similar contexts. That way, when someone asks a question about printers, we will find the words that show up in the same kinds of issues that printers show up in, and hopefully surface a better solution.
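As a small sketch of how that choice surfaces as a modeling knob, here is what varying the context window looks like with gensim's Word2Vec. The library, parameter values, and toy sentences are my own assumptions for illustration (parameter names shown are from gensim 4.x and differ in older versions), not the setup described in the talk:

```python
from gensim.models import Word2Vec

# Each inner list is one "context unit" -- here, a tokenized sentence.
# For paragraph-level context, each inner list would instead hold the
# tokens of a whole paragraph.
sentences = [
    ["the", "printer", "driver", "crashed", "again"],
    ["paper", "jam", "in", "the", "office", "printer"],
    ["my", "computer", "cannot", "find", "the", "printer"],
]

# A narrow window only counts words within two positions as co-occurring;
# a wide window effectively treats the entire context unit as shared context.
narrow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50, seed=1)
wide = Word2Vec(sentences, vector_size=50, window=20, min_count=1, epochs=50, seed=1)

# On a corpus this tiny the vectors are mostly noise; the point is only to
# show where the co-occurrence definition enters the model.
print(narrow.wv.most_similar("printer", topn=3))
print(wide.wv.most_similar("printer", topn=3))
```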

This matters because in most large collections of text, the word “printer” is, on average, going to be most similar to words for other kinds of computing hardware. Ideally, “printer” would also be similar to words like “driver” and even “jam,” since our goal is to help people find relevant issues that others have experienced in the past.

We approach this by widening our context window: rather than limiting it to the words within each individual question, we analyze words within groups defined by the conversation immediately leading up to each question. People frequently update others on their situation before actually asking a question: “Here’s what happened. I tried this. Now what should I do?” We capture all of that preceding text and use it to build our model of word similarity.
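Here is a sketch of that grouping step under a hypothetical message format; the field names and the simple lookback rule are illustrative, not Talla's actual schema or logic:

```python
# Chat messages in time order; `is_question` flags the messages we want to match.
messages = [
    {"text": "The office printer stopped working", "is_question": False},
    {"text": "I reinstalled the driver and restarted it", "is_question": False},
    {"text": "What should I try next?", "is_question": True},
    {"text": "Lunch is here", "is_question": False},
]

def conversation_contexts(messages, lookback=5):
    """Bundle each question with the messages immediately preceding it.

    Each bundle becomes one 'document' for the word-similarity model, so words
    from the lead-up ("printer", "driver") share a context with the words in
    the question itself.
    """
    contexts = []
    for i, message in enumerate(messages):
        if message["is_question"]:
            window = messages[max(0, i - lookback): i + 1]
            tokens = [tok for m in window for tok in m["text"].lower().split()]
            contexts.append(tokens)
    return contexts

# These token lists can be fed to the same embedding pipeline described above.
print(conversation_contexts(messages))
```

In practice you would also want to cut the window at conversation boundaries or long time gaps, so that an unrelated discussion from an hour earlier doesn't bleed into a question's context.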

This is really only the tip of the iceberg, a simple example of how you can take advantage of conversational context to improve an NLP model. There is a lot of potential in all the chat data accumulating on hard drives around the world. With more progress in understanding how people interact (at least well enough to apply modern ML techniques), we can get more value out of the conversational data we’re collecting all the time, and get closer to a world where you never have to answer the same question twice.

We're hiring! If any of this sounds interesting to you, check out our open positions on the data science team.


Topics: machine learning, NLP, Data Science