The artificial intelligence (AI) bandwagon for search and recommendations has become crowded, with virtually every vendor touting its AI-powered capabilities. Not surprisingly, the question that arises is: does the AI actually work?
When moving beyond rules-based automation to model-based approaches, whether classic machine learning (ML) or deep learning, interpretability becomes key. After all, does the model do what it’s supposed to do, and how do we know that it does? And how will the model affect product discovery? You don’t want to be in a position where “what works” is quietly draining profits, or where the model recommends products that conflict with contracted sponsorships.
And while it may be challenging to fully articulate the precise behavior of ML technology, it is possible to strike a balance between strong performance and AI interpretability. In other words, we can attribute that performance to the model itself and understand where it comes from.
In what follows, we unravel the interplay between interpretability and ML, so that you can understand the goal of a model, control (configure) it, and measure its performance. That combination is what enhances the product discovery journey.
What Is Interpretable AI?
Interpretable AI is AI whose behavior can be understood, steered, and verified by the people who rely on it. In the context of product discovery, that boils down to three questions: what is the model trying to achieve, how can it be controlled or configured, and how is its impact measured? When any of those answers is missing, we are left with a black box, and black boxes have a habit of learning the wrong lessons from their training data in the form of spurious correlations.
Avoiding Clever Hans Effects
To see what spurious correlations might look like, we need to dust off the history book and look at the Clever Hans effect.
In the late 19th century, there was a horse in Germany named Hans whose owner thought he could perform math, including complex operations like interest calculation. The horse would tap his hooves on the ground to indicate an answer, and he would get it right.
Unfortunately, the horse wasn’t actually performing math. Instead, he was taking cues from his owner’s subconscious facial expressions: it turns out the owner’s face would change as the horse neared the correct answer.
Much like Hans, machine learning algorithms can memorize patterns and learn spurious correlations.
Here’s an example of spurious correlations in information retrieval. A machine learning model is trained to predict the relevance of documents to specific queries. The training data consists of queries, documents, and relevance scores, with the latter indicating how well a given document satisfies a given query.
Now, suppose that during the data collection process, documents that were generated later in time were more likely to be marked as relevant. This might be due to evolving standards for determining relevance or simply because the later documents contain more up-to-date information.
If the model has access to the timestamps of these documents, it might learn that a straightforward way to predict relevance is to look at when the document was created.
However, relying primarily on the timestamp doesn’t genuinely reflect understanding or accurately identifying relevance based on the content of the document and query.
When this black box model is applied in real-world scenarios, model performance might be questionable. The machine learning model is not grasping the concept of relevance as intended; it’s just capitalizing on a pattern (newer documents being marked as relevant more often) present in the training data.
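To make the Clever Hans risk concrete, here is a minimal sketch of one common diagnostic, permutation importance, applied to synthetic data in which a timestamp feature leaks the label. The feature names, the data, and the choice of model are illustrative assumptions, not anything taken from a real relevance dataset or from the article discussed below.

```python
# Minimal sketch: detecting a "Clever Hans" feature (a leaky timestamp)
# with permutation importance on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Two "content" features that genuinely relate to relevance, plus a timestamp
# that correlates with the label only because newer documents happened to be
# labeled relevant more often during data collection.
text_match = rng.normal(size=n)
click_rate = rng.normal(size=n)
relevant = (0.8 * text_match + 0.4 * click_rate + rng.normal(scale=0.5, size=n)) > 0
timestamp = relevant.astype(float) * 0.7 + rng.normal(scale=0.5, size=n)  # leaky proxy

X = np.column_stack([text_match, click_rate, timestamp])
X_tr, X_te, y_tr, y_te = train_test_split(X, relevant, random_state=0)

model = GradientBoostingClassifier().fit(X_tr, y_tr)

# Permutation importance measures how much held-out score drops when each
# feature is shuffled; a noticeable drop for `timestamp` flags the shortcut.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
for name, score in zip(["text_match", "click_rate", "timestamp"], result.importances_mean):
    print(f"{name:12s} {score:.3f}")
```

If a supposedly content-based model loses a chunk of its accuracy when the timestamp is shuffled, that is the Clever Hans signal to investigate before trusting it in production.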
Needless to say, we don’t want our ML models to perform like Hans. We want them to get the right answers for the right reasons, and we want explanations that confirm it.
And yet, a recent article published in Nature Communications compellingly illustrates that Clever Hans situations are pervasive in deep learning models.
Why Does Explainable AI Matter?
So far, this might all sound pretty abstract. Yet the problems of unexplainable, uninterpretable AI can have serious consequences. Interpretability plays a vital role in rooting out bias or hidden assumptions in algorithms, for example.
Biases can have real-world ramifications. In the financial sector, for example, bias can impact who is approved for credit or loans. Without interpretability — or knowing what these assumptions are — it’s more difficult to trust models. We won’t know what our model will predict in extreme cases, and we’ll be poorly equipped to identify specific shortcomings or biases.
Instances of ML-driven prediction gone wrong are not hard to find. One widely reported example was the identification of gender and racial bias in Amazon’s Rekognition facial analysis system. Such biases are far from isolated.
Research has shown that image datasets frequently used for image recognition tasks can contain biases, and that models trained on such data can amplify them: one model predicted that 84% of the people shown cooking were women, even though the actual share was 67%.
A 2021 study, this time from the financial sector, found that borrowers from minority groups were charged higher interest rates and were more frequently rejected for loans than white applicants — even when accounting for factors like income.
On the other hand, here are some industries where explainable AI is already being put to work:
- Healthcare. Explainable AI systems that aid in patient diagnosis can help build trust between doctor and system, as the doctor can understand where and how the AI system reaches a diagnosis.
- Financial. Explainable AI is used to approve or deny financial claims such as loans or mortgage applications, as well as to detect financial fraud.
- Military. AI-based military systems support weather prediction, route planning, and autonomous vehicles; explainability helps operators understand and trust what these systems recommend.
- Autonomous vehicles. Explainable AI is used in autonomous vehicles to explain driving-based decisions, especially those that revolve around safety.
It’s easy to see that understanding why AI-based decisions are being made will go a long way in creating trust.
In ecommerce, the consequences may not be as dire. Still, no one wants to lose their job because they backed the wrong AI horse. So can model interpretation and understanding play a critical role in ecommerce ML, particularly in the context of product discovery?
Interpretable AI in Product Discovery
Yes, interpretability remains relevant. There are instances of ML-driven personalization that exhibit biases in ecommerce. And in the context of product discovery, adopting ML-driven approaches rooted in deep learning and vector search may pose certain challenges.
For instance, consider a marketplace selling apparel where shoppers query for ‘Adidas.’ A vector engine might return similar results from Nike, Puma, Mizuno, and others because they exist in the same conceptual space. This could potentially conflict with commercial agreements with some of these brands. Hence, being able to understand and control the ranking of products is important.
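To illustrate the point, here is a toy sketch, not any vendor’s actual engine, of how a pure vector ranking can surface competing brands and how an explicit, interpretable merchandising boost restores control. The products, embeddings, and boost values are all invented.

```python
# Toy example: nearest-neighbor ranking over product embeddings, followed by
# an explicit brand boost that a merchandiser can see and reason about.
import numpy as np

catalog = {
    "adidas_runner":  np.array([0.9, 0.1, 0.3]),
    "adidas_sneaker": np.array([0.8, 0.2, 0.4]),
    "nike_runner":    np.array([0.85, 0.15, 0.35]),
    "puma_trainer":   np.array([0.8, 0.1, 0.4]),
}
brand_of = {sku: sku.split("_")[0] for sku in catalog}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.3])  # embedding of the query "adidas"

# Pure vector ranking: competitors can score as high as, or higher than, the
# queried brand's own items because they sit in the same region of the space.
scores = {sku: cosine(query_vec, v) for sku, v in catalog.items()}

# Explicit, interpretable control: boost the queried brand before ranking.
boosted = {sku: s + (0.2 if brand_of[sku] == "adidas" else 0.0) for sku, s in scores.items()}

for sku, s in sorted(boosted.items(), key=lambda kv: -kv[1]):
    print(f"{sku:16s} {s:.3f}")
```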
But the example of vector search is especially interesting as it highlights the tension between interpretability and performance. So let’s explore the topic in more detail.
AI Accuracy at the Cost of Interpretability?
Being able to understand ML models sounds great. But a naturally arising question is whether there are any inherent trade-offs between the “interpretability” of an algorithm and its potential power (be it the scope of situations it can handle, the accuracy of its output or any other measure of performance).
This question is especially pertinent in light of the success of “black box” deep learning, one of the driving forces behind the adoption of ML. The perceived problem is a trade-off between interpretability and performance: models such as linear regression, logistic regression, or decision trees are intrinsically explainable, but typically less predictive than a more complex deep learning model.
Simply put, the most easily explainable approaches are not necessarily the best-performing ones. This is also the case in the context of information retrieval. For example, keyword search paradigms are highly explainable, as users understand the concept of keywords and how they relate to search results.
On the other hand, vector search is somewhat less explainable. This is because while you can explain that vector distances represent similarity, explaining the exact reasons for a specific result can be challenging.
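A small contrast sketch makes this tangible: the keyword explanation is simply the list of overlapping terms, while the vector score is a single similarity number with no per-term breakdown. The query, document, and vectors below are made up for illustration.

```python
# Keyword match vs. vector similarity: one is self-explanatory, one is a number.
import numpy as np

query = "waterproof running shoes"
doc = "lightweight waterproof trail running shoes for men"

# Keyword view: the explanation is simply the shared terms.
overlap = set(query.split()) & set(doc.split())
print("matched terms:", sorted(overlap))  # ['running', 'shoes', 'waterproof']

# Vector view: the score is a cosine similarity between dense embeddings;
# there is no per-term breakdown to show a merchandiser.
q_vec = np.array([0.7, 0.1, 0.6])
d_vec = np.array([0.6, 0.2, 0.7])
score = float(q_vec @ d_vec / (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))
print("vector score:", round(score, 3))
```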
Importantly, it is possible to have interpretable AI in product discovery and achieve the best of both worlds by adopting a plausible, useful conceptualization of AI interpretability.
This requires adhering to three crucial tenets that define interpretable product discovery: understanding, configuration (control), and measurement.
Tenet 1: Understanding the Purpose of the Model
A fundamental requirement is to comprehend the precise objectives of the ML model. Is it driven by the collective wisdom of the crowd, seeking optimization for profitability, or perhaps pursuing a different goal altogether? Understanding the AI’s intent is the cornerstone of effective interpretability.
For example, consider one of the models that Coveo makes available to customers as part of the Personalization-as-you-go suite. Intent-aware product ranking is based on proprietary deep learning technology and the use of product vectors (or “product embeddings”). These multi-dimensional vectors are used to measure the distance between products so that affinities can be determined.
Imagine a three-dimensional space, or store, where products reside. Products with similar attributes, such as category, color, size, or brand, sit closer together. Digital shopping sessions can be thought of as a virtual “walk” through that space, with the added benefit of being automatically tailored to individual customers rather than a generic setup for all.
Of course, product embeddings are not inherently fully explainable on their own. We go into much greater detail in this piece on product embeddings and their relationships to recommendations.
But when our customers activate the Intent-aware Product Ranking ML model based on product embeddings, they have a clear understanding of what the model should be doing: rerank products dynamically so that the most relevant items, given the intent detected during the session, are shown first.
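As a simplified illustration only, and not Coveo’s actual implementation, the sketch below treats session intent as the average of the embeddings of products the shopper has interacted with, then reranks candidates by similarity to that intent vector. All product IDs and embeddings are invented.

```python
# Simplified intent-aware reranking sketch on made-up product embeddings.
import numpy as np

product_vectors = {
    "running_shoe": np.array([0.9, 0.1, 0.2]),
    "trail_shoe":   np.array([0.8, 0.2, 0.3]),
    "dress_shoe":   np.array([0.1, 0.9, 0.2]),
    "hiking_sock":  np.array([0.7, 0.1, 0.5]),
}

session_views = ["running_shoe", "trail_shoe"]  # what the shopper looked at
intent = np.mean([product_vectors[p] for p in session_views], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = ["dress_shoe", "hiking_sock", "trail_shoe"]
reranked = sorted(candidates, key=lambda p: cosine(intent, product_vectors[p]), reverse=True)
print(reranked)  # products closest to the inferred intent come first
```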
Other vendors will refer to their models as optimizing for attractiveness. Attractiveness, as usual, is in the eye of the beholder; in practice it often just means popularity. These ill-defined notions leave business users and merchants confused and unable to really understand what the model is doing.
Tenet 2: Model Configuration
Further, a robust framework for model configuration and relevance settings is paramount. This entails the ability to curate the data and signals leveraged by the AI, granting users the power to shape the outcomes to align with specific preferences and objectives.
For instance, Coveo’s proprietary ML models are easily configurable. Consider the Automatic Relevance Tuning (ART) model, one of the most important Coveo ML models you will want to deploy. ART boosts the products that are most popular for the current search query.
In the example shown above, the first three product listings are the most popular (via clicks and other user interactions) when querying for pants. The user context is being considered here to display womenswear. By automatically reordering, ART ensures that the top-performing products are always presented first. Members with the required privileges can not only create, manage, and activate models in just a few clicks; they can also easily configure whether they want the ART model to automatically consider purchase and cart event data. Further, in the Learning interval section, they can change the default and recommended Data period and Building frequency.
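As an illustration of what that kind of configuration amounts to, here is a hypothetical sketch whose field names mirror the concepts mentioned above (data period, building frequency, purchase and cart signals). It is not Coveo’s actual API.

```python
# Hypothetical relevance-tuning configuration and a toy signal-weighting rule.
from dataclasses import dataclass

@dataclass
class RelevanceTuningConfig:
    data_period_days: int = 90           # how much usage history the model learns from
    building_frequency_hours: int = 24   # how often the model is rebuilt
    use_purchase_events: bool = True     # weight purchases, not only clicks
    use_cart_events: bool = True         # weight add-to-cart signals as well

def event_weight(event_type: str, cfg: RelevanceTuningConfig) -> float:
    """Toy weighting: stronger commercial signals get larger boosts when enabled."""
    weights = {"click": 1.0}
    if cfg.use_cart_events:
        weights["add_to_cart"] = 2.0
    if cfg.use_purchase_events:
        weights["purchase"] = 4.0
    return weights.get(event_type, 0.0)

cfg = RelevanceTuningConfig(data_period_days=30, building_frequency_hours=12)
print(event_weight("purchase", cfg))  # 4.0
```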
Tenet 3: Model Measurement (Impact)
Further, interpretable product discovery should empower you to look inside algorithms, giving you confidence to make strategic decisions that elevate your KPIs. Extracting useful information from analytics platforms can be cumbersome and time consuming, so vendors should offer actionable insights specific to product discovery.
At Coveo, for example, we’re committed to helping merchandisers access the best information to inform their decision making.
More precisely, understanding why a product is ranked in a specific position is critical to gaining confidence and trust in machine learning. Coveo’s ranking details, together with ranking controls, let business users understand the impact of AI on the experiences shoppers receive.
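As a purely hypothetical example of the kind of transparency this implies, the sketch below decomposes each product’s final score into a base relevance score, an ML boost, and a merchandising rule boost. The numbers and field names are invented.

```python
# Hypothetical ranking-details breakdown a business user could read at a glance.
ranking_details = [
    {"product": "adidas_runner", "base": 0.62, "ml_boost": 0.25, "rule_boost": 0.10},
    {"product": "nike_runner",   "base": 0.60, "ml_boost": 0.22, "rule_boost": 0.00},
]

for row in ranking_details:
    total = row["base"] + row["ml_boost"] + row["rule_boost"]
    parts = f'base {row["base"]:.2f} + ml {row["ml_boost"]:.2f} + rule {row["rule_boost"]:.2f}'
    print(f'{row["product"]:14s} = {total:.2f} ({parts})')
```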
Conclusion
It is possible to strike a balance between performance and explainable AI in product discovery. By harnessing the power of performant machine learning models that drive profitable growth and complementing them with robust tools for activation, configuration, understanding, and scrutiny of algorithms and models, you can indeed ‘have your cake and eat it too.’
By partnering with a leading vendor in AI-powered product discovery, you can not only deliver exceptional customer experiences but also do so with the transparency and control necessary to build trust and navigate the complex landscape of AI-powered product discovery effectively.