The earliest search engines were primarily keyword driven, gleaning their results by matching a specific query with a webpage or document that included those keywords. This was an inexact science, at best, and could be wildly inaccurate and frustrating for early internet users.
That’s because keyword search is not a straightforward process, at least not for humans. In a recent webinar, Vincent Bernard, Director of R&D at Coveo, spoke with three natural language processing (NLP) experts about NLP-driven search to better understand the hype around this latest search trend.
“Often, we don’t even know the exact word we should search for because we’re not aware of the content of the document that we are going to search in,” explained Hanieh Deilamsalehy, Machine Learning Researcher at Adobe.
“For human beings, the concept of keyword matching is not intuitive. When we search for something in a document, we expect the search engine to understand what we mean, to look for the concept rather than the keyword.”
With natural language processing (NLP), modern search promises a much more intuitive process for humans. NLP-enabled search engines are designed to understand a searcher’s natural language query and the context around it. This enables the search engine to provide more relevant results — culminating in natural language search.
“We’re no longer looking at keyword matching, but at the semantic meaning of the search query and then trying to retrieve the relevant results,” said Deilamsalehy.
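To make the contrast concrete, here is a minimal sketch comparing verbatim keyword overlap with a similarity score computed over vectors. Everything in it is an assumption made for illustration: the two documents, the hand-written three-axis `TOY_VECTORS`, and the function names are invented, and a real engine would obtain its vectors from a trained language model rather than a lookup table.

```python
import math

# Hypothetical two-document corpus, invented for illustration.
DOCS = {
    "doc_a": "how to groom a long-haired kitten",
    "doc_b": "fluffy pancakes recipe",
}

# Hand-written toy "embeddings" on three made-up axes (animal, softness, food).
# A real engine would obtain vectors from a trained language model.
TOY_VECTORS = {
    "fluffy": (0.2, 0.9, 0.1),
    "cats": (0.9, 0.4, 0.0),
    "kitten": (0.9, 0.5, 0.0),
    "long-haired": (0.5, 0.9, 0.0),
    "groom": (0.6, 0.3, 0.0),
    "pancakes": (0.0, 0.4, 0.9),
    "recipe": (0.0, 0.0, 1.0),
}


def keyword_score(query, text):
    """Count how many distinct query words appear verbatim in the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def embed(text):
    """Average the toy vectors of the known words in the text."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return (0.0, 0.0, 0.0)
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def semantic_score(query, text):
    """Score a document by vector similarity instead of shared words."""
    return cosine(embed(query), embed(text))


def rank(query, scorer):
    """Return document ids ordered from best to worst under the given scorer."""
    return sorted(DOCS, key=lambda doc_id: scorer(query, DOCS[doc_id]), reverse=True)


keyword_ranking = rank("fluffy cats", keyword_score)
semantic_ranking = rank("fluffy cats", semantic_score)
```

With these toy values, the keyword scorer prefers the pancake recipe because it shares the literal word "fluffy," while the vector-based scorer prefers the kitten grooming page even though it shares no words with the query.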
NLP and the Semantic Meaning of a Search Query
A subset of artificial intelligence (AI), NLP goes (way) beyond keyword matching by using natural language understanding to unveil the semantic meaning of a search query. It attempts to retrieve the most relevant results based on context and searcher intent.
NLP algorithms can parse the nuances and subtleties of human communication in a way that traditional keyword-based search engines cannot.
“They use a mix of analytical signals,” said Eric Immermann, Practice Director of Search and Content at Perficient. “Who searches for what, who clicks on what, how do people interact? These are content-understanding signals that use natural language processing, knowledge graph, and other technologies to help the search platform or search engine better understand the content that’s being searched for.”
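One way to picture how a "who clicks on what" signal might feed into ranking is to blend a text-relevance score with each document's share of past clicks for the same query. The sketch below is only an illustration under stated assumptions: the click log, the document ids, the scores, and the `click_weight` blending formula are all hypothetical, not a description of how any particular engine works.

```python
from collections import Counter

# Hypothetical click log of (query, clicked_document_id) pairs.
CLICK_LOG = [
    ("fluffy cats", "doc_persian"),
    ("fluffy cats", "doc_persian"),
    ("fluffy cats", "doc_maine_coon"),
    ("cat food", "doc_kibble"),
]


def click_counts(log, query):
    """Count past clicks per document for one exact query string."""
    return Counter(doc_id for logged_query, doc_id in log if logged_query == query)


def rerank(text_scores, log, query, click_weight=0.5):
    """Order documents by a weighted blend of text score and click share.

    click_weight=0 ignores clicks entirely; click_weight=1 uses only clicks.
    """
    counts = click_counts(log, query)
    total = sum(counts.values())

    def blended(doc_id):
        share = counts[doc_id] / total if total else 0.0
        return (1 - click_weight) * text_scores[doc_id] + click_weight * share

    return sorted(text_scores, key=blended, reverse=True)


# Hypothetical text-relevance scores from an upstream matcher.
TEXT_SCORES = {"doc_maine_coon": 0.8, "doc_persian": 0.7, "doc_sphynx": 0.6}

text_only = rerank(TEXT_SCORES, CLICK_LOG, "fluffy cats", click_weight=0.0)
with_clicks = rerank(TEXT_SCORES, CLICK_LOG, "fluffy cats", click_weight=0.5)
```

In this toy data, the page that users clicked most often for "fluffy cats" moves ahead of a page with a slightly higher text score once clicks are taken into account.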
Some search engines go even further to bridge the gap between search intent and search results by using NLP to search video and images. You can type something like “fluffy cats” into Google, and the results will include relevant websites, videos, and images featuring fluffy cats. It’s using language and NLP to search for a relevant result across different media, which is a key difference between today’s modern search and yesterday’s keyword-driven search.
The What and Why of NLP
Unlike traditional search technology, NLP moves away from a straight index approach towards something more conceptual. Using the term “fluffy cat” as an example, here are the three main steps inherent in NLP-driven search, as defined by Kurt Cagle, Managing Editor of Data Science Central.
- It uses query permutations: NLP first tries to figure out what the user means by “cat.” It does this by looking at different permutations of the term. Then, it looks at the word “fluffy” and this term’s associated permutations. The NLP system then identifies the commonalities that occur across this word combination. It’s these commonalities that produce the conceptual search results. “In essence, you’re moving through a breadth of information in order to be able to find what the most likely meaning is from that query,” explained Cagle.
- It’s an iterative process: Another focus of NLP — and a key difference versus traditional search — is that we’re moving into search as an iterative process rather than simply a single access point into a system. Said Cagle, “When we talk about search, it’s not just the initial search, it’s that I’ve selected something that tells the system that this is kind of what I’m looking for. That changes the context of what I’m looking for in the future, as part of the overall process.”
- It’s contextual and conversational: An iterative process that considers context and applies what it’s learned from a user’s past searches is more conversational. That’s the essential nature of an NLP search engine and why it’s better suited to the way we, as human beings, search for information.
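The three steps above can be sketched in a few lines. This is a simplified illustration, not Cagle's method or any vendor's implementation: the `VARIANTS` table, the `SearchSession` class, the tags, and the `boost` value are all hypothetical. It shows a query being expanded into permutations, and a session whose ranking shifts after the user selects a result.

```python
from collections import Counter
from itertools import product

# Hypothetical variant table; a real system would derive variants from a
# language model or thesaurus rather than a hand-written dictionary.
VARIANTS = {
    "fluffy": ["fluffy", "furry", "long-haired"],
    "cat": ["cat", "cats", "kitten"],
}


def query_permutations(query):
    """Expand each query word into its variants and combine them."""
    options = [VARIANTS.get(word, [word]) for word in query.lower().split()]
    return [" ".join(combo) for combo in product(*options)]


class SearchSession:
    """Keeps per-user context so later rankings reflect earlier selections."""

    def __init__(self, boost=0.2):
        self.boost = boost
        self.context = Counter()  # tag -> number of selections seen so far

    def record_selection(self, tags):
        """Remember the tags of a result the user chose."""
        self.context.update(tags)

    def rank(self, candidates):
        """candidates maps doc id -> (base_score, set_of_tags)."""
        def score(doc_id):
            base, tags = candidates[doc_id]
            return base + self.boost * sum(self.context[t] for t in tags)
        return sorted(candidates, key=score, reverse=True)


# Hypothetical candidates with base scores and topic tags.
CANDIDATES = {
    "plush_toy_catalog": (0.6, {"toys"}),
    "cat_grooming_guide": (0.5, {"pets", "grooming"}),
}

expanded = query_permutations("fluffy cat")

session = SearchSession()
first_pass = session.rank(CANDIDATES)   # base scores only
session.record_selection({"pets"})      # the user picks a pet-related result
second_pass = session.rank(CANDIDATES)  # context now favors pet content
```

Here the same candidates are ranked twice. The only thing that changes between the two passes is the session context, which is what makes the search iterative rather than a single access point.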
One key difference between the traditional machine learning (ML) used by legacy search engines and modern NLP techniques is that the latter are made possible by deep learning.
With traditional ML, algorithm performance improves as we introduce more data, but eventually it reaches a plateau. That is, fundamentally, the model stops learning. With deep learning, model performance tends to keep improving as we introduce new data to the model.
“Much of our progress in NLP is due to the progress in deep learning and the introduction of transformers and large language models,” explained Deilamsalehy. “Advances in large language models have revolutionized the world of NLP.”
Why Deep Learning Matters for NLP
Deep learning, a subfield of machine learning, is made possible by artificial neural networks, which are algorithms inspired by the brain. Deep learning allows computers to process huge amounts of data in ways that replicate the way human brains process data.
This is, of course, an oversimplification, but it’s important because this is the main way today’s search technology differs from early search. That is:
- Deep learning requires huge amounts of data: A model like GPT-3, for example, is trained on an enormous corpus. Its training data was drawn from roughly 45 terabytes of raw web text (before quality filtering), and the model itself has 175 billion parameters. That’s a sizable share of the public internet. We didn’t have access to this volume of data until recently, but digitization has made it accessible.
- Deep learning requires significant computational power: To train a model like GPT-3 with its billions of parameters, you need a tremendous amount of computational power. It was impossible to do this in the past. We simply didn’t have the technology available to us. Today, GPUs are becoming more available and more affordable, so deep learning is now possible.
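A quick back-of-the-envelope calculation gives a feel for the scale involved. The sketch below only estimates the storage needed to hold 175 billion parameters. The byte widths (4 bytes for 32-bit floats, 2 bytes for 16-bit floats) are assumptions about the storage format, and the estimate ignores everything else training requires, such as optimizer state and activations.

```python
PARAMETERS = 175_000_000_000  # GPT-3 parameter count cited above


def weight_storage_gb(parameters, bytes_per_parameter):
    """Storage for the raw weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return parameters * bytes_per_parameter / 1e9


fp32_gb = weight_storage_gb(PARAMETERS, 4)  # assuming 32-bit floats
fp16_gb = weight_storage_gb(PARAMETERS, 2)  # assuming 16-bit floats

print(f"32-bit weights: {fp32_gb:.0f} GB")  # prints "32-bit weights: 700 GB"
print(f"16-bit weights: {fp16_gb:.0f} GB")  # prints "16-bit weights: 350 GB"
```

Even at 16-bit precision, the weights alone are far larger than the memory of a single typical GPU, which is one reason models of this size are trained across many devices.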
Building vs. Buying NLP Search Capabilities
Immermann notes that there are two major approaches that companies typically use when implementing NLP search — the homegrown approach and the outsourced approach.
Building Your Own NLP Search Engine
Said Immermann, “With the homegrown approach, you take a search engine like Solr or Elasticsearch, basically a wrapper around Apache Lucene, and you build capabilities like natural language processing and natural query understanding models on top of that.”
With the build-it-yourself approach, you’re essentially assembling the LEGO blocks of your search capability, but you need developers who understand how to do this.
“This can be a great way to go if you want to own your search capabilities from the ground up and tweak it for every intricacy of your business,” said Immermann. “However, there’s going to be a significant investment in implementation, development, testing, tuning, and ongoing product level management to keep your search product running well.”
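As a rough idea of what "building on top" can look like, the sketch below shows a tiny query-understanding layer that expands business-specific acronyms and emits a request body in the style of the Elasticsearch query DSL (a `bool` query with `should` clauses). The `DOMAIN_SYNONYMS` table, the `body` field name, and the boost value are hypothetical, and the code only builds the JSON. It does not connect to or assume any particular search cluster.

```python
import json

# Hypothetical business-specific expansions maintained by the search team.
DOMAIN_SYNONYMS = {
    "pto": ["paid time off", "vacation"],
    "hr": ["human resources"],
}


def build_query_body(user_query, field="body"):
    """Build an Elasticsearch-style bool query from a raw user query.

    The original wording is matched with a boost, and each known acronym
    adds optional phrase matches for its expansions.
    """
    should = [{"match": {field: {"query": user_query, "boost": 2.0}}}]
    for token in user_query.lower().split():
        for expansion in DOMAIN_SYNONYMS.get(token, []):
            should.append({"match_phrase": {field: expansion}})
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


request_body = build_query_body("pto policy")
print(json.dumps(request_body, indent=2))
```

Even this small layer hints at the ongoing work Immermann describes: someone has to curate the synonym table, tune the boosts, and test that the expansions actually improve results.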
Investing in an NLP Search Platform
The other approach to implementing NLP search is to work with a company like Coveo that has built a holistic, out-of-the-box platform and offers it as a SaaS solution.
“The Coveos of the world have taken all of these technologies and integrated them into a holistic platform,” said Immermann. “They’ve done the tuning. They’ve done the testing, but they’ve also built a nice business-facing UI on top so that it’s not only accessible by developers. I could have my business users go in and look at the analytics, make tweaks to relevancy, do A/B testing, and get feedback.”
With a holistic NLP platform, you get all the capabilities that you’d theoretically build yourself. But the platform has been refined and iterated on for years by a dedicated engineering team, improving the relevancy and accuracy of those models.
3 Natural Language Processing Misconceptions
The panel of experts we spoke with for this piece each touched on some key misconceptions (or pitfalls) that are common when companies think about and implement NLP search.
Here are some things to keep in mind as you move forward with NLP for your company:
- NLP is not a silver bullet: Just because you have NLP technology doesn’t mean your relevancy will be 100% accurate all the time. “The biggest misconception that I run into every day is that NLP is magic,” said Immermann. “Everyone’s business domain, knowledge domain, and customer domain is slightly different. So, although there are broader industry segments that we can use to start with, different company acronyms and approaches drive complexity.”
- Consciousness is not intelligence: In some cases, NLP models are so fluent that we can mistake their command of language for consciousness. “One thing to keep in mind is that these models have seen the entire corpus of the internet,” said Deilamsalehy. “They’ve become really good at learning human language and they can speak it very well. They can understand it. They can retrieve the data and give you the best answer based on a purely raw language model. But this is completely different than a human being telling you that they’re happy today.”
- NLP must always be fed: When you’re dealing with NLP, you’ll always need additional information coming in that provides context changes. Said Cagle, “As part of the NLP process, you need to include feeds that drive external information awareness into the system which can change the nature of the NLP, so it more accurately reflects changes like COVID. You have to be aware of the overall ambient information flow before you can get reasonably comfortable in how this information is going to be seen and processed.”
A language model, Deilamsalehy clarified, is the product of a large training corpus and lots of computational power, with many engineers and scientists working to make it function. That is, NLP search works because of the combined intelligence and hard work of many people, not because the language model is conscious.
And keep in mind that no matter what kind of AI or ML you’re working with, data quality is key to having a good overall output. At the end of the day, NLP is changing the way we interact with technology and search engines in a very real way and that’s because of the hard work of many smart people.
Want to get all the info straight from the horse’s mouth? The full discussion is available on demand.