Data is AI Fuel, Which Means You Must Refine It Before You Can Use It

Posted by Alyssa Verzino on Jan 2, 2019 2:30:00 PM


Ever since artificial intelligence went mainstream, analysts and pundits have declared that "data is the new oil," meaning a resource that will drive the AI economy and enrich whoever can extract it. Unfortunately, most of them neglect the second half of the analogy: like oil, most data is crude, and it is only useful once it has been refined.

To refine crude data for use with AI, you have to annotate it. And, unlike oil refining, the data refinement process isn't mature, so the value of unrefined data has been largely overhyped.

Take Google, for example. No one outside the company knows exactly how much data it has, but reasonable estimates put it in excess of 10 exabytes (over 10 billion gigabytes). A sizeable portion of that data consists of images, which Google can use to train computer vision algorithms. Google is to image data what Saudi Arabia is to crude oil.

Unfortunately, like crude oil, Google's image data isn't entirely useful in its "raw" state, because Google often doesn't know what is pictured in the images that its web crawlers have indexed. Many pictures on the internet have obscure filenames, and the web pages that host them give little context about what those images show. So, before Google can use its trillions of images to train an AI vision system, someone has to label all those pictures.

Put simply, Google has to refine its image data into a useful state for AI. And that refinement requires annotation.


The image-labelling process is often labor-intensive and, to be of highest value, must go beyond simply identifying a picture as "a dog."

A useful label should describe that picture as an adult German Shepherd standing in short grass, centered in the image. Without that context, an AI can't learn where to look in the image to find a dog, let alone that this is a specific type of dog that may look very different from other breeds.
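To make the difference between a bare label and a rich one concrete, here is a minimal sketch of what such an annotation record might look like. The class names, field names, filename, and pixel values are all hypothetical, chosen only for illustration; they do not describe the format Google or any particular labelling tool actually uses.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Pixel coordinates of the labelled region (assumed convention:
    # origin at the top-left corner, x to the right, y downward).
    x: int
    y: int
    width: int
    height: int


@dataclass
class ImageAnnotation:
    filename: str
    label: str       # coarse label, e.g. "dog"
    breed: str       # finer-grained label
    age_group: str   # e.g. "adult"
    setting: str     # e.g. "short grass"
    box: BoundingBox  # where in the image the subject appears


def is_centered(ann, image_width, image_height, tolerance=0.1):
    """Return True if the box center lies within `tolerance`
    (a fraction of the image size) of the image center."""
    cx = ann.box.x + ann.box.width / 2
    cy = ann.box.y + ann.box.height / 2
    return (abs(cx - image_width / 2) <= tolerance * image_width
            and abs(cy - image_height / 2) <= tolerance * image_height)


# Hypothetical annotation for the German Shepherd example above.
example = ImageAnnotation(
    filename="IMG_0042.jpg",
    label="dog",
    breed="German Shepherd",
    age_group="adult",
    setting="short grass",
    box=BoundingBox(x=200, y=150, width=400, height=300),
)
```

A record that carried only `label="dog"` would tell a vision model that a dog exists somewhere in the picture; the extra fields are what tell it which pixels to look at and what kind of dog it is looking at.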

Google has been forced to find clever ways to label images at scale, which is why it asks random web surfers to annotate images as part of its reCAPTCHA security system. Have you ever been asked to point out the sections of a picture that contain a street sign? You're probably training a computer vision AI to help Google build self-driving cars.

Your business likely has gigabytes of information that could be used to train an AI solution -- customer correspondence, CRM records, sales win/loss analysis, product research, operational metrics -- but it's unlikely much of that data is useful in its crude state. It will require annotation to, for example, identify the customer and the salesperson in an email conversation, as well as a means to connect that conversation to a marketing campaign, a closed or lost deal, and a revenue impact analysis. Without that information, the conversation is just noise.
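As a rough illustration, the annotations described above could be modelled as optional fields attached to the raw conversation. This is only a sketch: the `AnnotatedConversation` class, its field names, and the `missing_annotations` helper are assumptions made for this example, not part of any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedConversation:
    raw_text: str                            # the crude data
    customer: Optional[str] = None
    salesperson: Optional[str] = None
    campaign: Optional[str] = None           # linked marketing campaign
    deal_outcome: Optional[str] = None       # e.g. "won" or "lost"
    revenue_impact: Optional[float] = None   # in your reporting currency


# The annotations this sketch treats as required for "refined" data.
REQUIRED_FIELDS = ["customer", "salesperson", "campaign",
                   "deal_outcome", "revenue_impact"]


def missing_annotations(conv):
    """List the annotation fields that have not been filled in yet."""
    return [name for name in REQUIRED_FIELDS if getattr(conv, name) is None]


# A crude record: just text, with nothing to connect it to outcomes.
crude = AnnotatedConversation(raw_text="Thanks for the demo yesterday...")

# The same record after annotation (all values are made up).
refined = AnnotatedConversation(
    raw_text="Thanks for the demo yesterday...",
    customer="Example Corp",
    salesperson="J. Smith",
    campaign="Q4 webinar",
    deal_outcome="won",
    revenue_impact=12000.0,
)
```

Calling `missing_annotations(crude)` returns all five field names, while `missing_annotations(refined)` returns an empty list, which is one simple way to measure how much of a data set is still in its crude state.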

However -- like Google -- you can adopt an AI solution with annotation tools that help you automatically refine all the data you create from this point forward.

Talla offers an AI customer support solution that includes exactly this kind of annotation tool: our smart knowledge base.

Think of Talla's smart knowledge base as a "data refinery" that processes your raw, crude information into fuel for an AI solution. You can load historical data into it and process that data into a refined, AI-ready state, and you can use it to create new data that is immediately ready for AI consumption. It's a great place to start building your own internal annotation process and extracting value from the reserve of historical data your business is sitting on.

If you'd like to begin the process of refining your business data into the fuel of the AI economy, contact Talla today.
