We've now had thousands of companies try Talla, and we have learned a lot from our metrics about how people evaluate bots and conversational interfaces. The most interesting thing so far is that when the Talla trial ends, most enterprises don't decide to buy or pass. Instead, they email us and ask for a trial extension because they don't feel they have evaluated it enough. As we dug into this issue, our analysis of the data and our discussions with customers in the trial process led us to a few hypotheses about why this happens. The number one issue seems to be that, given the open-ended nature of a conversational interface, it is difficult to know when you have said enough things to it to know it works well. With that in mind, here is our recommendation for how to evaluate bots and conversational interfaces in general. We will write a follow-up post on evaluating Talla specifically.
3 Key Steps To Bot Evaluation
1. Test The Help Functionality - The first step to any evaluation of a bot interface is actually to figure out what happens in cases of failure or uncertainty. Conversational interfaces are new, and people often don't understand how to use them. And as the use of bots rises in the organization, users may sometimes get confused about which bot does which thing. When we look at our anonymized natural language data from our customer set, the most common question we get from new users on Talla is "what can you do?"
Start your bot evaluation by making sure that new employees exposed to the bot for the first time have a path to learn. What happens if you type "help" or "what can you do?" What does the bot say when it is first introduced to a new employee? How does that introduction work? If you haven't used the bot in 3 months, how easy is it to pick back up and be reminded of the proper syntax?
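If your bot can be reached from a script, the questions above can be turned into a quick repeatable check. The following is a minimal sketch, not a description of how Talla or any particular bot works. It assumes you can wrap the bot in a function that takes a message and returns the reply as text. The names `check_help_paths`, `HELP_PROMPTS`, and `demo_bot` are hypothetical, and `demo_bot` is only a stand-in so the example runs on its own.

```python
from typing import Callable, Dict, List

# Hypothetical prompts a brand-new user might type first; adjust to your users.
HELP_PROMPTS: List[str] = ["help", "what can you do?", "hi"]


def check_help_paths(
    ask: Callable[[str], str], capability_keywords: List[str]
) -> Dict[str, bool]:
    """For each help-style prompt, report whether the reply mentions
    at least one of the capabilities you expect the bot to describe."""
    results: Dict[str, bool] = {}
    for prompt in HELP_PROMPTS:
        reply = ask(prompt).lower()
        results[prompt] = any(k.lower() in reply for k in capability_keywords)
    return results


# Stand-in bot for illustration only; replace with a call to the bot you are evaluating.
def demo_bot(text: str) -> str:
    if text.strip().lower() in {"help", "what can you do?"}:
        return "I can answer HR questions and schedule meetings."
    return "Sorry, I didn't understand that."


if __name__ == "__main__":
    for prompt, ok in check_help_paths(demo_bot, ["schedule", "HR"]).items():
        print(f"{prompt!r}: {'explains capabilities' if ok else 'no guidance'}")
```

Keyword matching is a crude proxy for "the reply was helpful", so treat a failing prompt as a cue to go read the bot's actual reply rather than as a verdict.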
2. Test Paraphrasing and Input Language Structure - A good conversational interface has to do one of two things: it either needs to handle paraphrases very well, or it needs to provide feedback on the sentence structure you use to communicate with it. If the bot just does a few things, check to see if it provides feedback on how to say something. For example, if a bot can tell you the weather, and you type "san francisco weather", and the bot doesn't recognize it, does it throw an error, or does it tell you something like "It looks like you are talking about the weather, try saying 'what's the weather in <zipcode>'"? Figure out if the bot can coach users on the best way to communicate.
Alternatively, some bots have invested a lot in NLP methods to detect paraphrases of commands and questions. Test this out by asking the bot a specific question in two or three different ways. In Talla, for example, if you teach Talla how to respond to "Who is our health insurance provider," make sure you try "Who provides our health insurance?" and "What company do we use for health insurance?" to see how the bot responds.
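One way to make the paraphrase test repeatable is to group the different phrasings of each question and measure how often the bot gives the same answer as it does for the first phrasing. This is a rough sketch under stated assumptions: the bot is wrapped in a function that takes text and returns text, and two replies count as "the same" only if they match exactly after trimming and lowercasing, which is stricter than a human reviewer would be. The function names, the stand-in `demo_bot`, and the placeholder provider name "Example Health" are all made up for illustration.

```python
from typing import Callable, Dict, List


def paraphrase_consistency(
    ask: Callable[[str], str], paraphrase_sets: Dict[str, List[str]]
) -> Dict[str, float]:
    """For each question, return the fraction of phrasings whose reply
    matches the reply to the first (canonical) phrasing."""
    scores: Dict[str, float] = {}
    for name, phrasings in paraphrase_sets.items():
        canonical = ask(phrasings[0]).strip().lower()
        matches = sum(1 for p in phrasings if ask(p).strip().lower() == canonical)
        scores[name] = matches / len(phrasings)
    return scores


# Stand-in bot for illustration only: it recognizes two wordings and misses the third.
def demo_bot(text: str) -> str:
    t = text.lower()
    if "insurance provider" in t or "provides our health insurance" in t:
        return "Our provider is Example Health."
    return "Sorry, I don't know."


PARAPHRASES = {
    "health_insurance": [
        "Who is our health insurance provider?",
        "Who provides our health insurance?",
        "What company do we use for health insurance?",
    ]
}

if __name__ == "__main__":
    for name, score in paraphrase_consistency(demo_bot, PARAPHRASES).items():
        print(f"{name}: {score:.0%} of phrasings got the canonical answer")
```

A low score on a question you care about tells you which phrasings to try by hand and, if the bot supports it, which ones to teach it.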
3. Test Your Top Use Cases - There is a strange tendency among users to try to trick bots and break them. If you build a scheduling bot, users don't say "Schedule a meeting for Monday at 3pm with Fred." Instead, they say "schedule a meeting one week from yesterday in the afternoon, with our CFO, at a coffee shop we've never been to" or something similarly crazy. While this might be fun, it doesn't really help you evaluate the bot in day-to-day interactions. After you've tested paraphrasing and help situations, focus on your core use cases. For the things the bot will be doing 90% of the time, is it highly accurate? When it makes a mistake, what is the path to learning?
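To put a number on "highly accurate", you can write down your everyday requests along with a phrase you expect to see in a correct reply, then count how many the bot gets right. The sketch below assumes, as a simplification, that a reply is correct if it contains the expected phrase. The bot wrapper, the canned replies, and the test cases are hypothetical examples and do not come from any real bot.

```python
from typing import Callable, List, Tuple


def core_use_case_accuracy(
    ask: Callable[[str], str], cases: List[Tuple[str, str]]
) -> Tuple[float, List[Tuple[str, str]]]:
    """Return overall accuracy plus the (prompt, reply) pairs that failed,
    so each mistake can be reviewed and, if possible, corrected."""
    failures: List[Tuple[str, str]] = []
    for prompt, expected in cases:
        reply = ask(prompt)
        if expected.lower() not in reply.lower():
            failures.append((prompt, reply))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy, failures


# Stand-in scheduling bot for illustration only, backed by canned replies.
CANNED = {
    "schedule a meeting for monday at 3pm with fred.": "Booked: Monday 3pm with Fred.",
    "cancel my 3pm meeting on monday.": "Cancelled: Monday 3pm.",
    "what meetings do i have on monday?": "You have 1 meeting on Monday.",
}


def demo_bot(text: str) -> str:
    return CANNED.get(text.strip().lower(), "Sorry, I didn't understand that.")


# Everyday requests paired with a phrase a correct reply should contain.
CASES = [
    ("Schedule a meeting for Monday at 3pm with Fred.", "Monday 3pm with Fred"),
    ("Cancel my 3pm meeting on Monday.", "Cancelled"),
    ("What meetings do I have on Monday?", "1 meeting"),
    ("Move my Monday meeting to Tuesday.", "Tuesday"),
]

if __name__ == "__main__":
    accuracy, failures = core_use_case_accuracy(demo_bot, CASES)
    print(f"accuracy on core use cases: {accuracy:.0%}")
    for prompt, reply in failures:
        print(f"  failed: {prompt!r} -> {reply!r}")
```

The list of failures matters as much as the score: it is the starting point for checking what the path to learning looks like when the bot gets something wrong.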
The companies that have had the most success with their Talla evaluation tend to follow these three steps. To that, I'll add one bonus tip that we have seen from our best customers: phase your rollout in 4 stages. Here they are:
- On the day you install the bot, try steps 1-3 above yourself.
- Then invite 3 other people at your org to test the bot for a few days, using the same process.
- If that goes well, roll it out to a full team to use for 3-4 weeks. This gives you a controlled process where you can understand any questions or issues that may arise from a broader rollout.
- Once you feel comfortable with the use of a full team, roll the bot out organization-wide.
Evaluating brand new product spaces can be challenging and confusing. But we know enterprise bots are the future of work, and the sooner you can make them part of your team and incorporate them into your workflows, the sooner you will gain an advantage over the laggards who wait until it's too late. And of course, if you want to use a bot to accelerate and amplify the work of your internal service teams like I.T. and H.R., we hope you will give Talla a try.