If you’ve been following the development of search solutions in the last couple of years, you must have heard a lot about semantic search. Semantic search refers to a family of information retrieval techniques that are rooted in natural language processing and semantic analysis. Unlike the more familiar keyword-based search, which matches individual keywords, semantic search focuses on query context and the relationships between words, taking retrieval beyond mere keyword matching.
Google is an example of a semantic search engine that does a fantastic job of interpreting natural language and returning results that are based on contextual meaning and semantic similarity. For instance, if your query is “digital commerce”, Google search will know that you might also mean “ecommerce” or “e-tailing”, or whatever the latest trend is.
Semantic search is tremendously powerful when it comes to dealing with topics or categories, suggesting that it might be the perfect search solution for ecommerce, where queries tend to be idiosyncratic and highly context dependent. But if you take a closer look at its capabilities you’ll see that semantic search — while being several steps ahead of keyword-based search — still falls short of delivering the most relevant search results to your customers.
To really understand the value of semantic search and whether or not it can solve the search challenge for ecommerce, we need to consider:
- What ecommerce search aims to achieve and why keyword matching is not designed to help it get there
- How semantic search differs from keyword-based search and why it looks so promising for ecommerce, and
- The challenges and shortcomings of semantic search when it comes to ecommerce.
After looking at each of these in turn, we will talk about a next generation search solution that builds on semantic search but goes one step further to deliver relevant results.
What Challenges Does Keyword Matching Face in Ecommerce?
To better understand the value of semantic search and the challenges that it seeks to address, let’s first look at how keyword matching works and where it falls short.
A Short Primer on Keyword Matching
Many popular ecommerce search engines rely on general-purpose keyword search algorithms.
The well-known Apache Lucene library, which powers both Solr and Elasticsearch systems, is a nice example of a leading search engine based on keyword matching. When applied to ecommerce, these search systems treat all product information (such as title, attributes, and description) as parts of one large text document.
When a shopper enters a search query, it is split into single words (terms). Each term is looked up in the documents containing the product descriptions using boolean logic. All matches are then ranked according to word statistics: how often each query term appears in a product’s data, and how common that term is across all the product-related documents in the catalog.
Ranking formulas can become pretty sophisticated, and can take into account a variety of statistical factors and additional data, such as the newness of the product. Still, at the core they’re about counting words.
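To make the word counting concrete, here is a minimal sketch of the split, match, and rank steps described above. The catalog, the SKU names, and the helper functions are all hypothetical, and the scoring is a simplified TF-IDF, not the actual formula used by Lucene, Solr, or Elasticsearch.

```python
import math
from collections import Counter

# Hypothetical mini-catalog: each product's title, attributes, and description
# are flattened into one text document.
catalog = {
    "sku-1": "black leather wallet for men",
    "sku-2": "brown leather belt for men",
    "sku-3": "red cotton summer dress",
}

def tokenize(text):
    """Lowercase the text and split it into single words."""
    return text.lower().split()

# Term counts per product document.
docs = {sku: Counter(tokenize(text)) for sku, text in catalog.items()}

def idf(term):
    """Smoothed inverse document frequency: rarer terms weigh more."""
    containing = sum(1 for counts in docs.values() if term in counts)
    return math.log((1 + len(docs)) / (1 + containing)) + 1

def search(query):
    """Boolean OR match, then rank by summed (term frequency * idf)."""
    terms = tokenize(query)
    scores = {}
    for sku, counts in docs.items():
        score = sum(counts[t] * idf(t) for t in terms if t in counts)
        if score > 0:
            scores[sku] = score
    return sorted(scores, key=scores.get, reverse=True)

print(search("leather wallet"))  # the wallet ranks above the belt
print(search("gown"))            # no product contains the word, so no results
```

Note that the second query returns nothing even though the catalog contains a dress: the engine only sees strings, not meanings.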
The Many Relevance Problems of Keyword Matching
There are a few tricks that keyword matching-based search engines use to provide better result recall. For instance, they can normalize words, reducing them to their stems, so that “clean” will match “cleaner.”
But due to the mechanics of information retrieval approaches, they still may not produce relevant search results.
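The stemming trick can be sketched in a few lines. This is a deliberately naive suffix stripper written for illustration only; the suffix list and the function name are assumptions, and production engines typically rely on more careful algorithms such as the Porter or Snowball stemmers.

```python
# Suffixes are checked longest-first so "cleaners" loses "ers", not just "s".
SUFFIXES = ("ers", "er", "ing", "es", "s")

def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Query and catalog words are compared after stemming.
print(naive_stem("clean") == naive_stem("cleaner"))
```

Stemming widens recall, but it still operates on the surface form of words and knows nothing about what they mean.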
The main goal of search in ecommerce is to figure out the user’s intent as quickly as possible. Keyword-based search has serious limitations that prevent it from figuring out the intent and providing relevant results.
Relevance Problem #1: Vocabulary Gaps
Shoppers signal purchase intent in several ways, sometimes using a different vocabulary than that of the product catalog. For example, people looking for “cantaloupe,” “rockmelon,” or “sweet melon” have the same search intent. However, a keyword-matching search engine will fail to identify that. As shown below, Safeway’s search returns no results for the shoppers using anything other than cantaloupe!
While search engines based on keyword-matching can use thesauri with numerous synonyms so that “dress” will match “gown,” this is still not enough to handle ecommerce search.
Beyond the challenges of interpreting natural language, many ecommerce catalogs rely on technical and often idiosyncratic vocabulary that generic, off-the-shelf thesauri cannot handle. Think: “solid-wood pergola with mounting kit” in home improvement or “skater dresses” in fashion. Because of this, companies relying on keyword matching-based search engines end up managing synonyms manually, which is not only tedious but also non-scalable, expensive, and error-prone.
For instance, setting synonyms rigidly may conceal the context-dependent nature of words. Should “black” and “dark” be treated as synonyms? Sure, in some contexts: “black night” can mean “dark night.” But does “black dress” always mean “dark dress”? Not if you want to avoid a fashion faux pas.
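A hand-maintained synonym table of this kind might look like the sketch below. The table entries and the `expand` helper are hypothetical; the point is only to show how a rigid rule fires regardless of context.

```python
# Hypothetical, manually curated synonym table.
SYNONYMS = {
    "rockmelon": {"cantaloupe"},
    "gown": {"dress"},
    "black": {"dark"},
}

def expand(query):
    """Return the query words plus every synonym configured for them."""
    words = query.lower().split()
    expanded = set(words)
    for word in words:
        expanded |= SYNONYMS.get(word, set())
    return expanded

print(expand("rockmelon"))    # now also contains "cantaloupe"
print(expand("black dress"))  # also adds "dark", whether or not the shopper meant it
print(expand("sweet melon"))  # unchanged: nobody has added this phrase yet
```

Every gap has to be closed by hand, one entry at a time, and each entry applies in every context, which is exactly the maintenance and precision problem described above.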
Relevance Problem #2: Related Products
If your online store doesn’t carry a specific brand, it’s not enough to simply omit that name. Shoppers who are shown zero results will bounce to find what they’re looking for. Instead, your site search should strive to interpret the content of the query so as to decipher your visitors’ search intent and offer intelligent product recommendations.
Say you have a customer looking for Mizuno-brand sneakers, but you don’t carry them. Showing every other shoe you have in the store will likely cause the customer to bounce. Instead, your ecommerce site search should recognize that your customer is looking for athletic shoes and offer similar products, such as Nike or Asics sneakers.
Matching a shopper’s intent to what is available in your inventory is not exactly feasible when your site relies on lexical search that analyzes strings of words as opposed to concepts, contextual meaning and the relationship between lexical entities.
As it turns out, ecommerce websites often struggle to handle such scenarios. For instance, in the example below, I am visiting Ulta Beauty’s website looking for some shampoo by Sachajuan following some great reviews on social media and feedback from friends. Unfortunately, not only am I provided with zero results (suggesting that Ulta does not carry this brand among the hundreds of shampoos available) but I’m also shown a bunch of recommendations that are clearly irrelevant — as they have nothing to do with my search intent.
Relevance Problem #3: Ambiguous and Broad Queries
Did you know that there’s more than one kind of ambiguity?
One is structural: the way the words in a search query are ordered can vastly change the intent. For example, consider “dress shirt” and “shirt dress”: one is a shirt for business or fancy occasions, and the other is a dress that is fashioned in the style of a shirt. Same words, completely different meanings, just by changing the order!
People easily navigate this complexity, relying on the context of an interaction and their own vast background knowledge about the world. But anyone expecting a keyword matching-based search engine to be this smart will be left disappointed. Keyword matching is poorly equipped to handle the content of such queries. In fact, even a popular website such as ASOS (Alexa 235) returns the same search results for the two queries.
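The reason is easy to see once a query is reduced to a bag of words, which is how a basic keyword matcher treats it. The helper below is a hypothetical illustration, not any particular engine’s implementation.

```python
from collections import Counter

def bag_of_words(query):
    """Reduce a query to word counts, discarding word order."""
    return Counter(query.lower().split())

# Both queries collapse to the same representation.
print(bag_of_words("dress shirt") == bag_of_words("shirt dress"))  # True
```

Once order is thrown away, nothing downstream can tell the two queries apart.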
Another type of ambiguity is semantic, which relates directly to the meaning of the query. If I search for “denim,” I might mean a type of fabric or I could be looking for a new pair of jeans. Keyword-based search systems rely on lexical search (i.e., matching strings of text rather than semantic matching), which makes it hard to handle not only structural but also semantic ambiguity.
Relevance Problem #4: Precision of Search Results
It’s not enough to just show your customers something. While showing zero results can lead to a customer bouncing, showing unrelated content can also send shoppers elsewhere. Search based on matching keywords struggles to handle the precision of results.
For example, say you search for “men’s black leather wallet.” The product catalog doesn’t have any SKUs that match this query exactly. Unsophisticated search systems will resort to a partial match and may return “men’s brown leather wallets” (which is relevant!) along with “men’s black leather belts” (irrelevant!).
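A small sketch shows why partial matching cannot separate the two. The `shared_terms` function is a hypothetical stand-in for a partial-match score based purely on overlapping words.

```python
def shared_terms(query, title):
    """Count how many distinct query words also appear in the product title."""
    return len(set(query.lower().split()) & set(title.lower().split()))

query = "men's black leather wallet"
brown_wallet = "men's brown leather wallet"
black_belt = "men's black leather belt"

print(shared_terms(query, brown_wallet))  # 3
print(shared_terms(query, black_belt))    # 3
```

Both products share three of the four query words, so a pure word-overlap score treats them as equally good. It has no notion that “wallet” names the product type while “black” is only an attribute.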
While you can boost the most relevant items to the top, keep in mind that most ecommerce sites allow customers to re-sort products by price, newness, sales, or rating. Once a shopper re-sorts, the relevance boost no longer applies, and the most relevant product can be lost in a sea of irrelevant content.
The main problem with keyword-based search can be summarized as follows. Shoppers looking for that perfect product don’t just want a search engine to match their keyword; they want the search solution to understand their shopping intent, even if they only enter three words in the search box.
In light of this, digital leaders have been exploring sophisticated ways to leverage a combination of historical and in-session variables to create a semantic web or knowledge graph of words. That is the basis of semantic search.
Is Semantic Search the Answer?
Clearly, ecommerce merchants need an alternative – and superior – search technology to tackle the above-mentioned challenges of keyword-based search. Semantic search engines offer an alternative approach.
While the term semantic search was coined in 2003, it didn’t really gain traction until the deployment of Google’s Hummingbird in 2013. The year prior, Google announced that users of its search engine would be able to search for “things, not strings” (of text), which captures the core idea of semantic search quite effectively.
Broadly speaking, semantic search aims to match a query to documents that correspond to its meaning and the user’s intent, not just to its words as full-text search does.
Semantic Search Doesn’t Speak with One Voice
There isn’t just one way to do semantic search. There are in fact several approaches available, ranging from knowledge graphs to semantic vector spaces.
Knowledge Graphs
A common approach to semantic search is the knowledge graph. While the term was popularized by Google (the Google Knowledge Graph was launched in 2012), the idea is considerably older. Simply put, knowledge graphs organize structured data from multiple sources, capture information about entities of interest in a given domain or task, and forge connections between them.
They represent knowledge by subject-predicate-object triples, where entities (i.e., subject and object) are connected to each other by predicates/relations.
The semantic nature of knowledge graphs comes from the fact that the meaning of the data is encoded in an ontology that describes the types of entities in the graph and their characteristics. The graph, then, is not only a place to organize and store structured data, but also (and crucially) to derive information and enable advanced processing of queries. When a customer searches for “purple jacket,” the knowledge graph understands the relationship between the different words and helps return relevant results.
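As a rough illustration, the triples and a tiny ontology lookup can be sketched as follows. The entities, predicates (`is_a`, `has_color`, `has_type`), and SKUs are all invented for this example; real knowledge graphs are far larger and are usually stored in dedicated graph databases.

```python
# Hypothetical subject-predicate-object triples.
TRIPLES = [
    ("purple", "is_a", "color"),
    ("jacket", "is_a", "outerwear"),
    ("sku-42", "has_color", "purple"),
    ("sku-42", "has_type", "jacket"),
    ("sku-77", "has_color", "purple"),
    ("sku-77", "has_type", "scarf"),
]

def objects(subject, predicate):
    """All objects linked to a subject by the given predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}

def subjects(predicate, obj):
    """All subjects linked to an object by the given predicate."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}

def search(query):
    """Interpret each word via the ontology, then intersect matching products."""
    results = None
    for word in query.lower().split():
        kinds = objects(word, "is_a")
        if "color" in kinds:
            found = subjects("has_color", word)
        elif kinds:
            found = subjects("has_type", word)
        else:
            continue  # unknown word: ignored in this sketch
        results = found if results is None else results & found
    return results or set()

print(search("purple jacket"))  # only the product that is both purple and a jacket
```

Because the graph knows that “purple” is a color and “jacket” is a product type, each word constrains a different attribute instead of being matched as a bare string.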
Semantic Vector Search
Another approach that has garnered plenty of attention over the past years is semantic vector search. It is by now so well established that there are open-source projects (such as Facebook’s FAISS) that are used by digital players and vendors.
The basic idea behind this semantic vector search is that we need to go deeper into the meaning of both data and queries in a way that is directly accessible and understandable to computers.
Computers can only deal with numbers, so we need a way to represent our data numerically. Obviously, a single number cannot represent all the complexity of a query or a product, so we need a whole list of them. An ordered list of numbers, such as [24, -5.14, 0, -14], is called a vector, and the length of this list is called the vector’s dimension.
Imagine representing all queries and products as two-dimensional vectors, which together form a vector space. In a semantic vector space, products and queries that are similar in meaning are represented by vectors that lie close to one another. For instance, we can have a clear cluster of queries and products representing the concept of dresses.
Distances between points represent levels of similarity between corresponding concepts. We can thus build a semantic vector space for our structured data. With a semantic vector space, the complex and vague problem of searching for relevant products by text queries can be transformed into a well-stated problem of searching for closest vectors in vector space, which is something computers are very good at.
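Here is a minimal sketch of that closest-vector search. The two-dimensional vectors are hand-picked for illustration (real systems learn vectors with hundreds of dimensions from data), and cosine similarity is just one common choice of closeness measure. Libraries such as FAISS exist to run this kind of lookup efficiently over millions of vectors; the brute-force loop below is only meant to show the idea.

```python
import math

# Hypothetical product embeddings in a two-dimensional semantic space.
product_vectors = {
    "summer dress": [0.9, 0.1],
    "evening gown": [0.8, 0.3],
    "running shoes": [0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vector, k=2):
    """Return the k products whose vectors are most similar to the query."""
    ranked = sorted(
        product_vectors,
        key=lambda name: cosine_similarity(query_vector, product_vectors[name]),
        reverse=True,
    )
    return ranked[:k]

query = [0.9, 0.15]  # assumed embedding of the query "dress"
print(nearest(query))  # the two dress-like products, not the shoes
```

Note that “evening gown” is retrieved even though it shares no word with the query: closeness in the vector space stands in for similarity in meaning.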
Semantic Search Benefits and Value
By deploying these more sophisticated approaches and moving beyond basic keyword matching, semantic search promises to address some of the critical problems that keyword-based search suffers from.
For example, by leveraging its semantic search capabilities, Google can handle vocabulary gaps. It understands that the semantic query “home renovation loans” means the same thing as “home improvement loans” and that the user intent behind both searches is pretty much the same.
Similarly, if, because I read stellar reviews, I search Netflix for the horror movie The Babadook, I’ll find it isn’t available. This may be bad news – but Netflix still manages to recommend content that is quite relevant, because it understood my intent.
Keyword matching struggles when queries are similar in terms of words and structure but actually relate to very different products. Semantic search handles this effectively. For example, Google produces different results for the queries “camera with lens” and “lens for camera” because it understands the meaning of “for” versus “with.”
So, is semantic search the best option for ecommerce search features? Well, not so fast.
A Better Alternative to Semantic Search
In web search, queries tend to be relatively long. But this is not the case in ecommerce. For example, research from the Nielsen Norman Group shows the average number of characters is 20.5 for web-wide searches. Meanwhile, about 30-40% of ecommerce customers start a shopping session with broad queries like “mens tops,” “nike,” or “handbags.”
Short head queries like these provide very limited information about what your customers are really looking to buy. Some of those queries, like “mens tops,” match a significant proportion of the catalog, and nothing in the search query itself can help determine the relevant product.
The linguistic content associated with online customers’ queries typically does not provide enough semantic context to determine a shopper’s preferences and needs (what we call shopper or user intent).
Does a shopper typing in “shoes” want running shoes, and how do you determine that just from a single word? Does a search for “jacket” mean a winter or a summer jacket, which are completely different yet both relevant sets of products?
Because an ecommerce search query is typically short, broad, ambiguous, and underspecified, semantic understanding alone is often not enough to return results that are fully relevant to a shopper’s intent.
And if purely semantic approaches don’t guarantee the capture of shopping intent for a huge and critical portion of customers’ sessions, then what does?
From Word Vectors to Product Vectors
Leading industry analysts have introduced new categories to mark the need to evolve semantic search into a more mature, complete, and intelligent approach to information retrieval. For example, Forrester Research has introduced the category of Cognitive Search to refer to a more complete, sophisticated approach that leverages multiple types of relevant data, artificial intelligence, and deep learning to deliver the most relevant search experiences.
In a recent report, Forrester analyst Scott Compton pointed out that “Cognitive Search is able to show results based on a combination of historical and in-session variables, making it increasingly relevant to the consumer, even during their first visit.”
It is precisely in this spirit that Coveo recently introduced an approach dubbed ‘Personalization As You Go.’
Similar to the concept in semantic search of creating vector maps of words, the idea is to create a vector map of the products in a customer’s catalog, placing products that are more akin to one another closer together based on attributes such as brand, size, price point, and color. Think of this like a map of a store with similar items being displayed near one another.
The idea is then to combine this vector map with in-session variables and customer onsite behaviors to capture search intent, applying machine learning in real time to tailor query suggestions, autocomplete, dynamic facets, or recommendations for the shopper.
For example, there’s nothing in the content of the query “gloves” that specifies that the user is interested in golf gloves rather than, say, winter gloves. But onsite customer behavior and the fact that she’s been browsing through golf pants definitely help capture the shopper’s intended meaning.
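One simple way to picture this is to average the vectors of the products browsed in the session and re-rank the query’s matches by closeness to that average. The sketch below is an illustrative assumption, not a description of Coveo’s actual implementation; the product vectors, names, and the `rerank` function are all hypothetical.

```python
import math

# Hypothetical product vectors: the first axis loosely means "golf",
# the second "winter wear".
product_vectors = {
    "golf gloves": [0.9, 0.1],
    "golf pants": [0.8, 0.2],
    "winter gloves": [0.1, 0.9],
    "winter coat": [0.2, 0.8],
}

def mean_vector(vectors):
    """Component-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def rerank(candidates, browsed):
    """Order query matches by similarity to the session's average vector."""
    if not browsed:
        return list(candidates)  # no session signal yet: keep the original order
    session = mean_vector([product_vectors[p] for p in browsed])
    return sorted(
        candidates,
        key=lambda p: cosine_similarity(product_vectors[p], session),
        reverse=True,
    )

candidates = ["winter gloves", "golf gloves"]  # both match the query "gloves"
print(rerank(candidates, browsed=["golf pants"]))  # golf gloves move to the top
```

The query text is identical in both cases; only the session context changes the ordering, which is what lets this approach work even for a shopper on their first visit.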
This is a new and exciting way to tackle ecommerce challenges such as cold-start shoppers and shopper intent detection, going beyond the limitations of what semantic search can offer.
Dig Deeper
To learn more about the ways in which Coveo leverages in-session behavior to deliver the most relevant results, read Powerful Personalization in Ecommerce.
If you want a technical deep dive then continue reading with Real-Time Search Personalization in Less Than 100 Lines of Code!
To learn more about why Coveo has been named a leader in Cognitive Search for the fourth consecutive year, read The Forrester Wave: Cognitive Search Q3 2021.
To learn more about Coveo’s AI innovation research, visit research.coveo.com.