Machine comprehension is one of the fundamental problem spaces in artificial intelligence research. Can we teach a machine to read, comprehend, and answer our questions? Recent advances in deep learning and natural language processing (NLP) have led to promising breakthroughs and novel AI models.
At Talla, we have been actively investigating the capabilities of machine comprehension. We have deployed several beta features in our product that leverage these models for content training, user question answering, and unsupervised knowledge ingestion. In this series of blog posts, we aim to share our experience and understanding of this space. In this post, we provide a more in-depth overview of machine comprehension research.

Overview of Machine Comprehension
A fundamental component in the development of AI is teaching the machine to understand and answer natural language questions. Question answering is a large research space that encompasses multiple types of question answering, such as general knowledge, diagram understanding, and passage comprehension.
Most of the machine comprehension research today involves asking a question and extracting the relevant sentences and fragments from a context paragraph in order to develop an answer. Underlying this approach are deep-learning models that utilize neural networks and various NLP techniques. It is worth noting that the deep learning approach to answering these types of questions is different than most approaches used to answer general knowledge style questions. For example, most common approaches for general question answering rely on an external knowledge source (e.g., a list of facts or a knowledge graph like DBpedia or ConceptNet). When trying to answer a general knowledge question, the machine uses an inference engine that reasons over the external knowledge to provide an answer.
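To make the contrast concrete, here is a minimal sketch of the knowledge-graph style of question answering described above. The triples and the inference rule are invented for illustration (this is not DBpedia or ConceptNet code); the point is that the answer comes from reasoning over an external knowledge source, not from a text passage.

```python
# Toy triple store with one inference rule, illustrating general
# knowledge QA: the machine reasons over stored facts, not over text.

TRIPLES = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
}

def query(subject, relation):
    """Return objects directly asserted for (subject, relation)."""
    return {o for (s, r, o) in TRIPLES if s == subject and r == relation}

def infer_located_in(subject):
    """Inference rule (assumed for this sketch): capital_of(X, Y)
    implies located_in(X, Y), and located_in is transitive."""
    found = set()
    frontier = query(subject, "capital_of") | query(subject, "located_in")
    while frontier:
        place = frontier.pop()
        if place not in found:
            found.add(place)
            frontier |= query(place, "located_in")
    return found

# "Where is Paris?" answered by chaining facts in the graph
print(infer_located_in("Paris"))  # -> {'France', 'Europe'}
```

Building and maintaining `TRIPLES` at enterprise scale is exactly the costly, domain-specific effort discussed next.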
There are both merits and shortcomings to this approach. For enterprise use, crafting domain-specific knowledge graphs requires extensive subject-matter expertise and resources. Building domain-specific knowledge representations at scale is expensive and time-consuming, and as information changes over time, keeping the knowledge graph up to date becomes expensive as well.
In contrast, the source of ground truth for a machine comprehension model is the context document itself. This means the model has no real semantic understanding of the content of the text; instead, it is trained to learn how to answer questions. The model is trained on a large corpus of question/answer pairs and learns where to seek the answer in the passage text. Most models rely on two key NLP innovations: word embeddings and the attention mechanism. Word embeddings give the model a way to “semantically understand” the text and significantly boost its overall performance. The attention mechanism, loosely inspired by human visual attention, helps the model efficiently zoom in on segments of the document that are contextually relevant. Finally, most machine comprehension models are built on deep learning architectures, usually stacks of recurrent layers such as LSTMs.
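The two innovations above can be sketched in a few lines. The tiny hand-made vectors below stand in for real trained embeddings (which are typically 100-300 dimensions), and the attention here is plain softmaxed dot-product scoring; real models learn these weights, but the mechanics are the same.

```python
# Minimal sketch: word embeddings plus dot-product attention over a
# passage. The 3-d vectors are invented for illustration only.
import math

EMBED = {
    "capital": [0.9, 0.1, 0.0],
    "paris":   [0.8, 0.2, 0.1],
    "is":      [0.0, 0.1, 0.9],
    "the":     [0.1, 0.0, 0.8],
    "of":      [0.1, 0.1, 0.7],
    "france":  [0.7, 0.3, 0.1],
}

def attention(query_vec, passage):
    """Softmax over dot-product scores: words contextually related to
    the query receive higher weight."""
    scores = [sum(q * e for q, e in zip(query_vec, EMBED[w])) for w in passage]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

passage = ["paris", "is", "the", "capital", "of", "france"]
weights = attention(EMBED["capital"], passage)

# Content words near the query get more weight than "is"/"the"
print(max(passage, key=lambda w: weights[passage.index(w)]))  # -> capital
```

Note how function words like “is” and “the” receive low weight: this is the “zooming in” behavior that lets the model focus on the relevant span.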
As we mentioned earlier, machine comprehension models are trained to learn how to answer questions. There are several common baseline datasets used by the research community, the most prolific being the Stanford Question Answering Dataset (SQuAD). SQuAD consists of about 100,000 hand-crafted question/answer pairs about various Wikipedia articles (covering topics ranging from Beyoncé to the University of Notre Dame). In January 2018, there was great media hype around Microsoft AI models outperforming humans in question answering on the SQuAD v1.1 dataset. However, SQuAD v1.1 is a problematic benchmark: it assumes that every question is answerable and that the answer exists in the context paragraph. Machine comprehension models trained on SQuAD v1.1 have a hard time knowing when not to answer a question. That is, if given a question and a completely unrelated context document, the model will still extract a nonsensical answer from it. As a result, the SQuAD v2.0 dataset was recently released (June 2018). SQuAD v2.0 introduces more difficult questions and unanswerable questions in the training data. Novel machine comprehension models are already emerging that promise even better question answering capabilities. However, most of these models exist only as concepts in research papers or research code; there are not yet any well-developed software tools available for enterprise consumption.
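The SQuAD files themselves are JSON, and their record shape shows both the extractive framing (answers are character spans in the context) and the v2.0 addition of unanswerable questions via an `is_impossible` flag. The article and questions below are invented placeholders; the field names match the real dataset format.

```python
# A SQuAD-v2.0-shaped record (contents invented for illustration)
# and a helper that flattens it into training triples.

squad_like = {
    "data": [{
        "title": "Example_Article",
        "paragraphs": [{
            "context": "Talla is based in Boston.",
            "qas": [
                {"id": "q1", "question": "Where is Talla based?",
                 "is_impossible": False,
                 "answers": [{"text": "Boston", "answer_start": 18}]},
                {"id": "q2", "question": "Who founded Talla?",
                 "is_impossible": True, "answers": []},  # v2.0 unanswerable
            ],
        }],
    }]
}

def iter_examples(dataset):
    """Yield (question, context, answer-or-None) training triples."""
    for article in dataset["data"]:
        for para in article["paragraphs"]:
            for qa in para["qas"]:
                answer = None if qa.get("is_impossible") else qa["answers"][0]["text"]
                yield qa["question"], para["context"], answer

for question, context, answer in iter_examples(squad_like):
    print(question, "->", answer)
```

In v1.1 every `qas` entry had at least one answer span; v2.0's unanswerable entries are what force a model to learn to abstain.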
What kind of questions can Machine Comprehension answer?
In the table below, we use a baseline BiDAF model trained on the SQuAD v1.1 dataset to illustrate the capabilities and limitations of machine comprehension models in general. There are always trade-offs, and machine comprehension models provide value in certain contexts. The main implication is that a machine comprehension model is adept at answering extractive, factoid-style questions - mainly who, what, where, and when questions. It will fail to answer inference-style questions, as it does not truly have a universal way to understand the domain and content of the text. So yes/no questions, or deeper questions where the answer is not explicitly stated in the context document, are not supported by this model.
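The extractive behavior, including its failure mode, can be seen even in a crude stand-in model. The sketch below scores each passage sentence by word overlap with the question; a real model like BiDAF learns this matching with embeddings and attention, but it shares the key property that it always returns some span - even from an irrelevant passage.

```python
# Crude extractive "reader" (illustration only, not BiDAF): pick the
# passage sentence sharing the most words with the question.
import re

def extract_answer(question, passage):
    q_words = set(re.findall(r"\w+", question.lower()))
    sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
    # max() always returns a sentence, even when every overlap is zero:
    # this is why such models struggle to abstain from answering.
    return max(sentences,
               key=lambda s: len(q_words & set(re.findall(r"\w+", s.lower()))))

passage = "Talla was founded in Boston. The team builds AI products."
print(extract_answer("Where was Talla founded?", passage))
# -> Talla was founded in Boston.

# Failure mode: an unrelated passage still yields a confident-looking,
# nonsensical answer.
print(extract_answer("Where was Talla founded?",
                     "Bananas are yellow. Apples are red."))
# -> Bananas are yellow.
```

The second call is the SQuAD v1.1 problem from the previous section in miniature: nothing in the scoring function can express "there is no answer here."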
Example questions where machine comprehension succeeds and fails:
Machine comprehension is an exciting area of research and offers great promise toward the development of general AI. In this post, we described in more detail what machine comprehension research entails. Specifically, it is focused on teaching machines to answer natural language questions in context. Most models leverage deep learning architectures (LSTMs and other RNNs) and novel NLP techniques like word embeddings and attention. The space is still nascent, but the research offers exciting and promising value in the real world. In the next blog post, we will explore the value of machine comprehension for the enterprise.