Chatbots are all the rage. Or, at least, they were a couple of months ago. You don’t need to go back far in time to see the full hype and disillusionment cycle. In April, VentureBeat declared a chatbot gold rush; a short time later, in October, it declared that chatbots suck (for now). How did chatbot sentiment change so quickly?
Many companies, including ours, started with a chat-first strategy but have evolved their understanding of the benefits and limitations of conversational chat-only interfaces. In this post, I’ll detail some of what we learned about building conversational interfaces, including the most important insight we arrived at when building out the product.
Building a Conversational Interface
Before we get to some of the problems with conversational interfaces, let’s take a quick look at how our conversationally driven system was built. Users interact with Talla conversationally in a workplace chat environment, primarily Slack, to perform a wide variety of discrete “tasks” such as scheduling meetings, managing job candidates, and tracking task lists.
When a message comes in from a user, it is processed through an NLP pipeline to transform the text from a statement in natural language into a machine-readable description of the task to be performed with all required parameters resolved. Consider a user trying to schedule a meeting with a colleague: “Schedule a meeting with Byron for 4:30 tomorrow.”
First, we classify the user intent of the message. We use a multinomial logistic regression model (known commonly in the NLP context as a “maximum entropy classifier”) to classify a user’s text from a list of pre-determined Talla intents. In short, the classifier uses features from the statement such as the presence of specific words or sets of words (n-grams), along with user context information (like previous interactions) to determine the probability that the user is trying to accomplish a certain task.
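To make the classification step concrete, here is a minimal sketch of how a maximum entropy classifier scores a message once it has been trained. It is not Talla’s actual model: the intent names, the feature weights, and the helper functions are all hypothetical, and the weights are set by hand rather than learned from labelled messages.

```python
import math
import re

# Hypothetical intents with hand-set feature weights. A real maximum entropy
# model would learn these weights from labelled training messages.
WEIGHTS = {
    "schedule_meeting": {"schedule": 2.0, "meeting": 1.5, "schedule a": 1.0},
    "add_task": {"task": 2.0, "add": 1.0, "due": 1.0},
    "show_weather": {"weather": 2.5, "show": 0.5},
}


def extract_features(text, context=()):
    """Return unigram and bigram indicator features, plus any context features."""
    tokens = re.findall(r"[a-z0-9:']+", text.lower())
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return set(tokens) | set(bigrams) | set(context)


def classify(text, context=()):
    """Return a dict mapping each intent to its softmax probability."""
    features = extract_features(text, context)
    # Linear score per intent: the sum of the weights of the active features.
    scores = {
        intent: sum(w for feature, w in weights.items() if feature in features)
        for intent, weights in WEIGHTS.items()
    }
    # Softmax over the scores (subtracting the max for numerical stability).
    top = max(scores.values())
    exps = {intent: math.exp(score - top) for intent, score in scores.items()}
    total = sum(exps.values())
    return {intent: value / total for intent, value in exps.items()}


probs = classify("Schedule a meeting with Byron for 4:30 tomorrow.")
best_intent = max(probs, key=probs.get)
```

User context would enter the same way as the n-grams: as extra indicator features (for example, a hypothetical "prev_intent=add_task" string passed in through context) with their own learned weights.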
After a message is classified, in this case as a “schedule meeting” intent, we attempt to extract the parameters required to perform the task. In the example above, we’d like to recognize that the time for the meeting should be “4:30 tomorrow,” and that we should invite “Byron.” Further, we should be able to resolve who “Byron” is within the organization and provide that person’s user ID to our scheduling backend. In this example, we’re extracting a name and a time but, more generally, we can extract all sorts of entities including places, people, times, numbers, email addresses, etc. These extractions are done using a variety of techniques, ranging from sophisticated machine learning models to duct tape and regular expressions.
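On the “duct tape and regular expressions” end of that spectrum, an extractor for the example message might look something like the sketch below. Everything here is an assumption made for illustration: the patterns, the directory of user IDs, the rule that a bare “4:30” means the afternoon, and the arbitrary reference time used in the example.

```python
import re
from datetime import datetime, timedelta

# Hypothetical company directory mapping first names to user IDs.
DIRECTORY = {"byron": "U123", "alice": "U456"}

# Matches times such as "4:30", "4:30pm", or "4:30 tomorrow".
TIME_PATTERN = re.compile(
    r"\b(\d{1,2}):(\d{2})(?:\s*(am|pm)\b)?(?:\s+(today|tomorrow)\b)?",
    re.IGNORECASE,
)
# Matches a capitalized name after "with", e.g. "with Byron".
PERSON_PATTERN = re.compile(r"\bwith\s+([A-Z][a-z]+)")


def extract_time(text, now):
    """Return the datetime mentioned in text, relative to now, or None."""
    match = TIME_PATTERN.search(text)
    if match is None:
        return None
    hour, minute = int(match.group(1)), int(match.group(2))
    if hour > 23 or minute > 59:
        return None
    meridiem = (match.group(3) or "").lower()
    if meridiem == "pm" and hour < 12:
        hour += 12
    elif meridiem == "am" and hour == 12:
        hour = 0
    elif not meridiem and 1 <= hour <= 7:
        # Assumption: a bare "4:30" in a workplace means the afternoon.
        hour += 12
    offset = 1 if (match.group(4) or "").lower() == "tomorrow" else 0
    day = now.date() + timedelta(days=offset)
    return datetime(day.year, day.month, day.day, hour, minute)


def extract_invitee_id(text):
    """Return the user ID of the person named after 'with', or None."""
    match = PERSON_PATTERN.search(text)
    if match is None:
        return None
    return DIRECTORY.get(match.group(1).lower())


now = datetime(2016, 11, 1, 9, 0)  # arbitrary reference time for the example
message = "Schedule a meeting with Byron for 4:30 tomorrow."
params = {
    "invitee_id": extract_invitee_id(message),
    "start": extract_time(message, now),
}
```

A name that is missing from the directory, or a message with no recognizable time, yields None for that parameter, which is the case where the bot has to ask a follow-up question.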
Once the user intent is fully resolved and the parameters are extracted, we construct a JSON-formatted payload describing the task to be performed (along with a set of metadata about the request, the user, etc.), and we pass it along to a separate backend system for task processing.
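As a rough illustration of that handoff, the payload could be assembled along these lines. The field names and the helper function are hypothetical, not Talla’s actual schema.

```python
import json
import uuid
from datetime import datetime, timezone


def build_task_payload(intent, params, user_id, source):
    """Bundle a resolved intent and its parameters with request metadata."""
    return {
        "request_id": str(uuid.uuid4()),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "intent": intent,
        "params": params,
        "meta": {"user_id": user_id, "source": source},
    }


payload = build_task_payload(
    intent="schedule_meeting",
    params={"invitee_ids": ["U123"], "start": "2016-11-02T16:30:00"},
    user_id="U999",  # hypothetical ID of the requesting user
    source="slack",
)
# Serialized form that would be handed to the task-processing backend.
message_body = json.dumps(payload, sort_keys=True)
```

Keeping the payload as plain JSON means the backend needs to know nothing about the NLP pipeline that produced it.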
As an aside, we’ve also evaluated the use of “deeper” and “trendier” machine learning methods like convnets and LSTMs for building conversational bots, but we found that they are not a good fit for our needs. More on that in another post.
“People don’t read what Talla is saying”
The “classify & extract” approach described above works quite well when users provide all the details up front (and when the extractors perform well enough to find those details). But what happens when the user doesn’t provide some piece of data that is required to complete the action? As a general strategy, we try to minimize these scenarios and make intelligent assumptions about what the user is trying to do based on context and user history, but sometimes it’s just not possible to infer what a user is trying to do from an initial statement.
So how do you resolve a missing piece of data in a conversational interface? You simply ask, right? We naively assumed that if Talla asked a direct question, people would provide an answer in their next interaction. Here are some of the sorts of things we observed.
User: Schedule a meeting for tomorrow
Talla: Who should I invite to the meeting?
User: Call the meeting “discuss 3rd quarter development goals”
Talla: I didn’t understand that.
User: Add a task due next Friday
Talla: What is the name of the task?
User: Oh actually, it should be due Saturday
Talla: I’ve added the task “Oh actually, it should be due Saturday”
User: Put a meeting on my calendar
Talla: Who should I invite to the meeting?
User: Show me the weather in Boston
Talla: I didn’t understand that.
As we watched these kinds of interactions, we made changes to the copy and formatting to try to make it clearer what information Talla needed to complete the task. But still, we continued to find poor user interactions, which eventually led us to the realization: people don’t read what Talla says.
These problems are the conversational equivalent of watching a user click random buttons in a traditional UI or type a password into a username field. In both conversational and traditional UIs, it can be tempting to blame the user for such transgressions. Can’t they just read what the bot is asking and answer the question? But we quickly realized these were shortcomings of our conversational system. After all, what is the benefit of a conversational, natural language interface if it cannot deal with these very common natural language interactions? We had failed to appreciate the extent to which everyday communication with other people is filled with these language patterns, and how transparently we deal with them.
Evolving (Chat)Bot UX
So our assumptions about how people interact conversationally with bots led to some bad user experiences. How do we resolve them? Well, to address this specific problem, we’ve adapted by taking a more probabilistic interpretation of conversational state. If we’re expecting user input for one value, but the user says something that is strongly classified as a separate task, we can switch to that task. Or if we’re expecting a date, and the user provides something that can only be extracted as a person, we can infer that they were actually interested in updating the meeting’s invitees rather than the time.
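The sketch below shows one way such a probabilistic interpretation could work. The threshold, the slot names, and the mapping from entity types to slots are hypothetical and chosen only to illustrate the idea; the intent probabilities and extracted entities are assumed to come from the classifier and extractors described earlier.

```python
from dataclasses import dataclass, field

# Hypothetical confidence above which a new intent overrides the pending task.
SWITCH_THRESHOLD = 0.8

# Hypothetical mapping: which slot each entity type can fill, per intent.
SLOT_FOR_ENTITY = {
    "schedule_meeting": {"person": "invitees", "time": "start"},
}


@dataclass
class PendingTask:
    intent: str
    expected_slot: str
    slots: dict = field(default_factory=dict)


def interpret_reply(pending, intent_probs, entities):
    """Decide how to treat a reply to a question about pending.expected_slot.

    intent_probs maps intent -> probability for the new message (non-empty);
    entities maps entity type -> extracted value. Returns (action, detail).
    """
    best_intent = max(intent_probs, key=intent_probs.get)
    # 1. A strongly classified, different intent wins: switch tasks.
    if best_intent != pending.intent and intent_probs[best_intent] >= SWITCH_THRESHOLD:
        return ("switch_task", best_intent)
    slot_map = SLOT_FOR_ENTITY.get(pending.intent, {})
    # 2. Prefer an entity that answers the question we actually asked.
    for entity_type, value in entities.items():
        if slot_map.get(entity_type) == pending.expected_slot:
            pending.slots[pending.expected_slot] = value
            return ("fill_slot", pending.expected_slot)
    # 3. Otherwise, let the entity update whichever slot it does fit.
    for entity_type, value in entities.items():
        slot = slot_map.get(entity_type)
        if slot is not None:
            pending.slots[slot] = value
            return ("fill_slot", slot)
    # 4. Nothing usable: ask again.
    return ("reprompt", pending.expected_slot)


# "Who should I invite?" answered with "Show me the weather in Boston".
task = PendingTask("schedule_meeting", expected_slot="invitees")
weather_result = interpret_reply(
    task, {"show_weather": 0.95, "schedule_meeting": 0.05}, {"place": "Boston"}
)

# A time was expected, but the reply only contains a person.
task2 = PendingTask("schedule_meeting", expected_slot="start")
person_result = interpret_reply(
    task2, {"schedule_meeting": 0.6, "add_task": 0.4}, {"person": "U123"}
)
```

In the first case the bot switches to the weather task instead of replying “I didn’t understand that”; in the second it updates the invitees and can then ask about the time again.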
But we’ve also evolved our thinking around bot UX more generally, and in fact, our thinking around the role that bots should play in an organization.
On the topic of chat UX, we’ve seen a shift towards more hybrid interfaces in chat across many platforms, including Google Allo, Apple’s iMessage, and in Slack through the use of buttons and attachments. We’ve started integrating these elements more into our own product to build a better user experience. Furthermore, we realize that some interactions, such as those involving a large set of inputs and settings, are better left to traditional UIs with forms and buttons. So when the benefits of traditional UX outweigh those of conversational UX, we guide users to good ol’ fashioned webpages.
Beyond the conversational UX, however, we’ve also learned a lot about how users want to use and interact with bots. And what we found is that more than using bots for simple tasks like scheduling individual meetings, people want to use bots to distribute information and to orchestrate and automate more complicated business workflows. Moreover, we learned that these workflows vary greatly from organization to organization, and most of them are not “conversational interactions” at all.
So in building out Talla’s powerful knowledge delivery functionality, we’ve deemphasized natural language and conversational UX where possible, and are building powerful content authoring tools for the web. While we’re excited about the future of conversational interfaces and delivering knowledge to team members in chat, where they spend much of their day, we’ve learned a lot about when to use conversational UX, and when not to.
Footnote, on duct tape and regular expressions: Anybody claiming their NLP system doesn’t use some hidden regular expressions somewhere is suspect.