Search is a very complex problem hiding behind a very simple interface.
From the earliest days of the internet, people have been trying to find web pages that meet their particular intent. To do so, they go to a search box and type in a keyword or phrase such as “cat videos.” Out pops a list of pages showing all kinds of cute cat videos.
This process is so intuitive that the search paradigm is never really questioned. You just type (or speak) a search term to a query engine. Then machine learning algorithms provide the (hopefully) correct search result back in the same format.
However, there are problems with this paradigm. Suppose that what you’re looking for doesn’t fit into a neat category (that is, it’s unstructured rather than structured data). Or it involves very specific, contextual information. Google calls these complex questions.
For example, ask Google which toys you should release for Christmas to make the best profit, and you’re likely to retrieve ambiguous information at best. Ask which departments in your company had the largest revenue per customer, and you’re likely to get even less. That is partly because Google doesn’t have your sales data (and preferably shouldn’t), but more likely because Google simply didn’t understand your question.
This is where natural language search comes in.
What Is Natural Language Search — With an Example
Natural language search, which uses a machine learning technique called natural language processing (NLP), lets users search using everyday human language instead of keywords. Users can type or speak their ‘search query,’ which is then translated into something the computer can understand. Due to its conversational nature, it’s often referred to as conversational search.
Here’s an example of a traditional search query:
Catskill Mountains height
And then reformatted as a natural language query:
How high are the Catskill Mountains?
Evolution of Natural Language Search
Traditionally, search engines conducted a search by matching the queried keywords against their index. This is also known as keyword-based search.
Narrowing these results put the onus of structuring the search onto the searcher. This resulted in users entering long or complex queries. Sometimes they’d have to use Boolean operators to communicate better with the query engine.
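To make keyword-based search concrete, here is a minimal Python sketch of an inverted index with Boolean AND and OR lookups. The documents, IDs, and function names are made up for illustration, and real search engines add ranking, stemming, and much more on top of this.

```python
from collections import defaultdict

# Hypothetical mini-corpus; the IDs and texts are invented for illustration.
DOCS = {
    1: "cute cat videos compilation",
    2: "catskill mountains height and hiking trails",
    3: "funny dog videos",
}


def build_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Boolean AND: IDs of documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


def search_any(index, query):
    """Boolean OR: IDs of documents that contain at least one query term."""
    result = set()
    for term in query.lower().split():
        result |= index.get(term, set())
    return result


INDEX = build_index(DOCS)
```

In this sketch, `search_all(INDEX, "catskill mountains height")` finds document 2, while the conversational phrasing "How high are the Catskill Mountains?" matches nothing under AND, because words such as "how" and "high" never appear in the index. That gap is what pushed the work of structuring a query onto the searcher.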
Because queries had to be formulated in a certain way, they often lost the intent that underlies everyday language. Marketers would then have to undertake a sort of manual sentiment analysis, painstakingly mining search query logs to better understand what their searchers were looking for.
Search has come a long way since. Whether typed into a search box or spoken aloud to a voice search conduit like Siri, Cortana, or Alexa, users can pose a question to a search engine much in the same way they’d ask a friend.
A Brief History of Natural Language Search
The first application of NLP technology for search was the START Natural Language Question Answering System, created in 1993 by the MIT Artificial Intelligence Lab. It’s not a web search engine, but it does let users query an online encyclopedia of information using everyday language.
Ask Jeeves was launched in 1996 and rebranded to simply Ask.com in 2005. It was the first web search engine that allowed users to search the internet using natural language.
Google came along two years later, and the rest is history. In 2019, Google BERT allowed searches to understand the full context of a word by looking at the words that come before and after it, instead of processing words one by one in order.
Application of Natural Language Search
Natural language search is a shift in thinking about information retrieval that goes beyond keyword matching. While keywords (or key concepts) are still important, natural language understanding aims to analyze the questions (queries) being asked and use that analysis to understand their context and intent.
For instance, consider the question of what toys will be the best sellers this holiday season. This seemingly simple question is quite complicated from a search standpoint because it involves a number of implicit assumptions:
- The query engine can “parse” or break down the question into component sections, and build queries based upon those sections.
- You have, within your data system, entries for toys that you specifically sell, that contain data about how they are selling now, and information about how they sold in the past.
- The query engine can distinguish between your products and your competitors’ products.
- The query engine understands the idea of Christmas as a sales period.
- The query engine can analyze information not only about what’s trendy within your current inventory, but information about what is currently “hot” in the marketplace.
- Upon information retrieval, the system will need to know what you mean by a request like “Show me” (Display the search result in a summary? Tables? Charts? Graphs? Read aloud? Put into a spreadsheet?)
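As a rough illustration of the first assumption, breaking a question into component sections, here is a small rule-based Python sketch. The vocabularies, field names, and the `parse_question` function are all hypothetical. A production system would rely on trained language models rather than hand-written lookups like these.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical vocabularies; a real system would learn or configure these.
SALES_PERIODS = {"christmas": "christmas", "holiday season": "christmas"}
PRODUCT_CATEGORIES = {"toys", "games", "books"}
OUTPUT_FORMATS = {"chart", "table", "summary", "spreadsheet"}


@dataclass
class ParsedQuery:
    category: Optional[str]   # what is being asked about
    period: Optional[str]     # which sales period applies
    metric: Optional[str]     # what "best" is measured by
    output_format: str        # how to present the answer


def parse_question(question: str) -> ParsedQuery:
    """Break a question into the component sections a query engine could use."""
    text = question.lower()
    words = set(re.findall(r"[a-z]+", text))

    category = next((c for c in sorted(PRODUCT_CATEGORIES) if c in words), None)
    period = next((v for k, v in SALES_PERIODS.items() if k in text), None)

    if "best seller" in text or "best-selling" in text:
        metric = "units_sold"
    elif "profit" in text:
        metric = "profit"
    else:
        metric = None

    # Fall back to a plain summary when no presentation format is requested.
    output_format = next((f for f in sorted(OUTPUT_FORMATS) if f in words), "summary")
    return ParsedQuery(category, period, metric, output_format)
```

Each field of `ParsedQuery` could then drive a separate structured query, for example against sales history for the category and period, which is roughly what “building queries based upon those sections” means.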
Some of these operations are fairly easy to ascertain and can happen almost instantaneously. Others tend to be ongoing, need a lot of analysis, and can only be queried when put into some kind of indexed structure.
Moreover, the speed of such queries can vary. The first time your best-selling Christmas toys question is asked, it may take some time to assemble this information. Once collated, subsequent retrievals can occur much faster.
The trick is in identifying, given two questions, whether they are similar enough that some or all of the answer can be retrieved quickly from the index without having to do the expensive computations.
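One simple way to picture this is a cache keyed on question similarity. The Python sketch below compares questions by the Jaccard overlap of their meaningful words. The stop-word list, the 0.6 threshold, and the class and function names are assumptions chosen for illustration. Real systems more commonly compare learned embeddings than raw word sets.

```python
import re

# Hypothetical stop-word list; it only needs to cover this small example.
STOP_WORDS = {"what", "which", "will", "be", "the", "this", "are", "is", "a", "of", "for"}


def tokens(question):
    """Reduce a question to its set of lowercase, non-stop-word terms."""
    found = re.findall(r"[a-z]+", question.lower())
    return frozenset(w for w in found if w not in STOP_WORDS)


def jaccard(a, b):
    """Share of terms two questions have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class AnswerCache:
    def __init__(self, threshold=0.6):
        self.threshold = threshold   # assumed cutoff for "similar enough"
        self._entries = []           # list of (token set, answer) pairs

    def lookup(self, question):
        """Return the answer of the most similar cached question, or None."""
        query = tokens(question)
        best_score, best_answer = 0.0, None
        for cached_tokens, answer in self._entries:
            score = jaccard(query, cached_tokens)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, question, answer):
        self._entries.append((tokens(question), answer))


def answer_question(question, cache, compute):
    """Serve from the cache when possible; otherwise run the expensive path."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    result = compute(question)
    cache.store(question, result)
    return result
```

With this sketch, “What toys will be the best sellers this Christmas?” and “Which toys are the best sellers for Christmas?” reduce to the same set of terms, so the second question is answered from the cache without calling `compute` again.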
Such a search also needs to be sensitive to the person asking the question. Some information may be available to the CEO that might not be available to a visitor to the company website. Thus, the context for such queries includes determining who should be told what, what is currently embargoed content, and which information cannot be passed on due to privacy regulations.
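A minimal Python sketch of that idea is to filter results by the asker’s role before anything is shown. The role names, the `Result` fields, and the rule that embargoed or personal data is hidden from everyone are assumptions made for this example, not a description of any particular product’s permission model.

```python
from dataclasses import dataclass

# Hypothetical role hierarchy: a higher number means broader access.
ROLE_LEVEL = {"visitor": 0, "employee": 1, "ceo": 2}


@dataclass(frozen=True)
class Result:
    title: str
    min_role: str                         # lowest role allowed to see this result
    embargoed: bool = False               # not yet cleared for release
    contains_personal_data: bool = False  # restricted by privacy regulations


def visible_results(results, role):
    """Keep only the results that the given role is allowed to see."""
    level = ROLE_LEVEL[role]
    return [
        r for r in results
        if ROLE_LEVEL[r.min_role] <= level
        and not r.embargoed               # assumption: embargoed items are hidden from all
        and not r.contains_personal_data  # assumption: personal data is never returned
    ]


RESULTS = [
    Result("Public product catalog", "visitor"),
    Result("Revenue per customer by department", "ceo"),
    Result("Unreleased toy lineup", "employee", embargoed=True),
    Result("Customer contact export", "employee", contains_personal_data=True),
]
```

Here a visitor would see only the public catalog, while the CEO would also see the revenue report. Applying the filter before results are ranked or summarized keeps restricted content out of the answer entirely.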
Natural language search also overlaps with speech to text, the process of converting spoken language into written text (and back). This happens via a stochastic (probabilistic) process that analyzes phonemes (distinctive units of sound) and matches them with specific words or phrases.
Probability comes into play here because specific phoneme combinations usually correspond to distinct phrases that occur in a certain order. By working with a large enough data sample through machine learning, the number of potential words or phrases can often be reduced from thousands to dozens or fewer, a process that Coveo takes advantage of.
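The narrowing step can be sketched in Python as follows. The pronunciation lexicon, the bigram counts, and the function names are invented for illustration, and the decoding is a simple greedy pass with add-one smoothing rather than the more sophisticated models production speech systems use. It is not a description of how Coveo implements this.

```python
# Hypothetical pronunciation lexicon: a phoneme sequence maps to candidate words.
LEXICON = {
    ("T", "OY", "Z"): ["toys"],
    ("F", "AO", "R"): ["for", "four", "fore"],
    ("S", "EY", "L"): ["sale", "sail"],
}

# Hypothetical counts of (previous word, word) pairs seen in a text sample.
BIGRAM_COUNTS = {
    ("toys", "for"): 40,
    ("toys", "four"): 2,
    ("for", "sale"): 30,
    ("for", "sail"): 1,
}


def rank_candidates(previous_word, phonemes):
    """Rank the words that match a phoneme sequence by how likely each is
    to follow the previous word (add-one smoothed bigram estimate)."""
    candidates = LEXICON.get(tuple(phonemes), [])
    counts = {w: BIGRAM_COUNTS.get((previous_word, w), 0) + 1 for w in candidates}
    total = sum(counts.values())
    ranked = [(word, count / total) for word, count in counts.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


def transcribe(phoneme_groups):
    """Greedy decoding: pick the most probable word at each step."""
    words = []
    previous = "<s>"  # start-of-sentence marker
    for phonemes in phoneme_groups:
        ranked = rank_candidates(previous, phonemes)
        if not ranked:
            raise ValueError(f"No word found for phonemes {phonemes}")
        previous = ranked[0][0]
        words.append(previous)
    return words
```

Given the phoneme groups for “toys,” “for,” and “sale,” the sketch picks “for” over “four” and “sale” over “sail,” because those pairings are far more frequent in the sample counts. That is the sense in which word order and probability shrink the candidate list.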
Benefits of Coveo’s Natural Language Search
When it comes to applying NLP technology to search, Coveo relies on modern tech at the bleeding edge of natural language processing research. Our in-house NLP team is focused on identifying and productizing the best approach for a given use case.
We don’t reinvent the wheel: our models are built on the ubiquitous Word2Vec and BERT neural architectures. These extensively trained models are then augmented with task-specific heads to apply a specific solution for a given use case.
Some of our in-platform applications include:
Coveo is an AI-powered platform that unifies and delivers relevant, personalized information for individual searchers, and then scales that up to deliver the same quality to audiences of massive size. Over time, all those interactions improve the quality of its service, evolving just as your business does.
It’s like a savvy librarian: not only does she know which books you asked for, she also knows about everything related that goes far beyond your initial query.
Dig Deeper
Looking to dive deeper into how Coveo can augment and enhance your customer experience? We’ve got you covered.