Natural language processing (NLP) is an exciting subfield of artificial intelligence. On Episode 40 of the podcast AI at Work, Steve Cohen, COO and co-founder of Basis Technology, joined host Rob May, CEO and co-founder of Talla, to explore recent trends and future possibilities for NLP.
What is NLP exactly? Steve defined it as one of the capabilities contained under the broad umbrella of AI that focuses on bridging the gap between human language and computer processing.
Over the course of the last 20 years, NLP has changed dramatically. Before the advent of AI tools, machine learning, and statistical modeling, the approach was classic, rules-based NLP. This involved hiring linguists and having them construct dictionaries of data and rules describing the relationships between words, building up complicated rules-based architectures.
That approach is almost completely gone today, said Steve, replaced by a different kind of rules-based NLP. Now, machines figure out what the rules should be. Essentially, machine learning has replaced the linguists who previously manually built the rules-based architectures underlying NLP.
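To make that shift concrete, here is a minimal sketch in Python contrasting a rule written by hand with a rule derived from labelled examples. It is a toy illustration, not the approach Steve's team uses: the function names, the title list, and the labelled data are all hypothetical.

```python
from collections import Counter

# Classic approach: a linguist writes the rule by hand (hypothetical list).
HAND_WRITTEN_TITLES = {"mr.", "ms.", "dr."}


def rule_based_is_person(prev_word):
    """Hand-written rule: a token that follows a known title is a person name."""
    return prev_word.lower() in HAND_WRITTEN_TITLES


def learn_trigger_words(examples, min_count=2):
    """Derive the rule from labelled data instead of writing it by hand.

    examples: iterable of (previous_word, is_person) pairs.
    Returns the previous words seen before person names at least
    min_count times, and more often than before non-names.
    """
    before_names, before_other = Counter(), Counter()
    for prev_word, is_person in examples:
        target = before_names if is_person else before_other
        target[prev_word.lower()] += 1
    return {
        word
        for word, count in before_names.items()
        if count >= min_count and count > before_other[word]
    }


# Made-up training data: the word before a token, and whether that token
# was labelled as a person name.
LABELLED = [
    ("Dr.", True), ("Dr.", True),
    ("Professor", True), ("Professor", True),
    ("the", False), ("the", False), ("the", True),
    ("in", False),
]

learned = learn_trigger_words(LABELLED)
```

The hand-written rule only knows the titles someone thought to list, while the learned set also picks up "professor" because the data contains it. Real systems use statistical models over far richer features, but the division of labour is the same: people supply examples, and the machine works out the rules.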
Helpful NLP Applications
The first application that comes to mind for most is probably machine translation - typing in a German sentence and getting a French sentence out. Steve shared several other useful applications of NLP, including being able to identify the language of a text (highly useful for downstream processing), and extraction - specifically named entity extraction.
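As a rough illustration of language identification, the sketch below guesses a language by counting how many common function words from each language appear in a text. This is a deliberately simple stand-in and not the technique discussed on the podcast; the word lists and the `identify_language` function are hypothetical, and production systems typically rely on character-level statistical models.

```python
# Tiny, hand-picked lists of common function words (illustrative only).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ich"},
    "french": {"le", "la", "les", "et", "est", "je", "ne", "pas"},
}


def identify_language(text):
    """Return the language whose function words match the most tokens.

    Returns None when no token matches any list.
    """
    tokens = text.lower().split()
    scores = {
        lang: sum(token in words for token in tokens)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


german_guess = identify_language("Das ist nicht der Hund")
french_guess = identify_language("Je ne parle pas le russe")
unknown_guess = identify_language("xyz")
```

Knowing the language first is what makes downstream steps possible: a pipeline can then pick the right tokenizer, entity model, or translation engine for each document.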
Named entity extraction, he explained, is something companies use to find names in text - typically people, organizations, locations, and companies. In addition, he pointed out, extraction or identification can also pertain to sentiment around products, events, relationships, and so on. Sentiment analysis can be particularly useful for understanding customers’ interest, or lack of interest, in a particular product or brand.
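For a sense of what named entity extraction produces, here is a minimal sketch that looks names up in a fixed list (often called a gazetteer) and labels each match. It is only an assumption-laden toy: the `GAZETTEER` contents and the `extract_entities` function are hypothetical, and real extractors use trained models so they can recognize names they have never seen.

```python
import re

# Hypothetical lookup list mapping known names to entity types.
GAZETTEER = {
    "Steve Cohen": "PERSON",
    "Rob May": "PERSON",
    "Basis Technology": "ORGANIZATION",
    "Talla": "ORGANIZATION",
}


def extract_entities(text):
    """Return (name, label) pairs for gazetteer names, in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text):
            found.append((match.start(), name, label))
    # Sort by character offset so results follow the order of the text.
    return [(name, label) for _, name, label in sorted(found)]


sentence = "Steve Cohen of Basis Technology joined host Rob May of Talla."
entities = extract_entities(sentence)
```

Here `entities` pairs each name with its type, which is the kind of structured output that downstream search, analytics, or sentiment tools can then consume.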
NLP has been a critical part of national security efforts for years. In 2006, there was a terrorist plot to destroy seven planes over the Atlantic using liquid bombs smuggled aboard. By using natural language processing to analyze intercepted information, US and UK authorities were able to discover the shape of the plot, find out who was involved, and ultimately prevent the attack. It’s thanks to NLP that the people aboard those seven planes were saved, and the plot is also why we can’t carry more than about 3 ounces of liquid per container aboard a plane. It was Steve’s firm, in fact, that built the threat detection application used to uncover this plot.
The Future of NLP
Is the field of NLP converging towards one technique that can work for many different things? Not so much, said Steve. Different NLP applications require different approaches - ranging from MEMMs (maximum-entropy Markov models) to neural nets - and there’s no silver bullet that can solve all problems. It depends on the specific task at hand.
One upstream problem, Steve said, has persisted over almost the entire 20 years: dealing with scanned documents and PDFs. The structure of a PDF document, said Steve, “was designed to make it easy to print. It was not designed to make it easy to understand what’s going on inside and reverse engineer the structure, and scan document segmentation, and so on.”
The final segment of the podcast considered a philosophical dimension - the ethics of AI, bias in models, killer robots. Steve shared that, as with almost any new technology, all of these concerns are both valid and invalid at the same time. Killer robots might not be coming anytime soon (if ever), but the effects of continued automation need to be considered deeply.
There’s much to come, both for the field of NLP and AI more broadly.
Tune in to AI at Work on iTunes, Spotify, Google Play Music, Stitcher, or SoundCloud and share with your network! If you have feedback or questions, we'd love to hear from you at firstname.lastname@example.org or tweet at us @tallainc.