Welcome to our Building Ecommerce series. Adapted from a trilogy of videos, this series walks developers through building ecommerce search solutions in Coveo.

Today we’ll look at the requirements, architecture, and getting your products into the search index.

The generic ecommerce store demo is an example application that shows how to build an ecommerce search solution. The generic store is a standalone CMS built to showcase search, browsing, and recommendations. It uses enterprise index technology in combination with a Headless/React frontend. 

Disclaimer: Although the code is publicly available for testing and development purposes, it is not officially supported by Coveo.

Requirements

  • Ecommerce product set
  • Enterprise indexing technology

Introduction

Most ecommerce implementations are heavily customized. Because a store must be fully adapted to its business case, developers want total control.

Searching within ecommerce stores is not easy because most have complicated catalogs. 

An image shows product catalog types and components
In the image above, product catalogs range from very simple to very complex, depending on the details you want to make available to your shoppers.

Examples include products with groupings (a dress in different colors), products with variants (shoes in different sizes and widths), and products with variants and availability (shoes in different sizes and widths, located in different stores).

B2B implementations add even more complexity to the catalog because entitlements can affect which products are shown to which users and groups. Entitlements also affect pricing.

This three-part series will explain how to design, index, and build an ecommerce search solution.

Before we dive into the process, let’s look at some concepts.

Product Catalog Structures

Product catalogs come in different structures, each with its own set of requirements for search. This section explains the different catalog structures.

Flat Catalog

From a search perspective, the simplest product catalog is a “flat catalog.” It only contains products, and the metadata belonging to those products is part of the product itself, as shown in the following code snippet:

 {
   "DocumentId": "https://fashion.coveodemo.com/pdp/887056_35F",
   "cat_color": "Indigo",
   "cat_discount": 22.0,
   "cat_gender": "Women",
   "cat_mrp": 110.0,
   "cat_rating_count": 79,
   "cat_retailer": "Fendi",
   "ec_brand": "Banana Republic",
   "ec_brand_name": "Fendi"
 }

A search interface for a flat catalog could look like this:

An illustration of a flat catalog
This illustration of a flat catalog shows facets, sorting, and search results all executed on the main product list.

As shown above, facets, sorting, and search results are all executed on the main product list. Most search engines can handle these flat catalogs.

Catalog with Variants

More complicated product catalogs are those that also contain product variants, like blue shoes in size 10 and medium width only, but size 12 in medium and large widths. The complication lies in the fact that you want to show product information in your main search results page, while using the metadata for size and color derived from the variant information for search components like facets.

Your variant information would look like:

 {
   "DocumentId": "https://fashion.coveodemo.com/pdp/887056_35F?sku=887056_35F_24",
   "DocumentType": "Variant",
   "ObjectType": "Variant",
   "cat_size": "24",
   "cat_size_type": "Regular",
   "ec_product_id": "887056_35F",
   "ec_variant_sku": "887056_35F_24",
   "permanentid": "887056_35F_24",
   "title": "Skinny Zero Gravity Stay Blue Ankle Jean (24)"
 }

Some search vendors suggest flattening the data: they merge the variant information with the product information and create one huge index of all possible variants. This can quickly become unwieldy and bloat your index. For example, if you have 50,000 products, each with 10 variants on five parameters, you would have 2,500,000 entries in the index. If your catalog grows further, indexing all of your products becomes almost impossible, and estimating how long it would take to index that much data becomes harder still.

Instead, Coveo stores the products and the variants as separate entities. For the same catalog, you'd create 50,000 products and 500,000 variants in the index. The index can then hold more product information, and indexing all of it in a short amount of time remains feasible.
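
To make the arithmetic concrete, here is a small sketch that reproduces the numbers from the example. It reads "10 variants on five parameters" as 10 x 5 combinations per product, which is how the 2,500,000 figure works out. The function names are illustrative only and are not part of any Coveo API.

```python
def flattened_index_size(products: int, variants_per_product: int, parameters: int) -> int:
    """Entries needed when every variant/parameter combination becomes its own product row."""
    return products * variants_per_product * parameters


def separate_index_size(products: int, variants_per_product: int) -> dict:
    """Entries needed when products and variants are stored as separate entities."""
    return {"products": products, "variants": products * variants_per_product}


flat = flattened_index_size(50_000, 10, 5)
separate = separate_index_size(50_000, 10)

print(flat)                    # 2500000
print(separate)                # {'products': 50000, 'variants': 500000}
print(sum(separate.values()))  # 550000
```

Under these assumptions, the separate-entity layout holds 550,000 entries in total, which is less than a quarter of the flattened index.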

A search interface for a catalog with variants could look like this:

An illustration shows how different aspects of a product catalog are exposed on a search template (search results page)
In a catalog with many variants, navigating product options can quickly grow complex.

In the above design, you can see that we have a list of products. If variants were stored at the product level, you would see one result for every variant. For example, a search for "puma sneaker" would return 10 identical results (because 10 variants are defined). This doesn't create a great ecommerce experience.

What you do want is to show distinct products, while still being able to filter by the variant metadata. To do this, you need nested queries. A nested query takes the facet selection into account, finds the products associated with the matching variants, and shows only those products. It is comparable to a join operation in SQL.
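
The join idea can be sketched in a few lines of plain Python. This is not Coveo query syntax and not how the platform implements nested queries; it only illustrates the logic over in-memory lists. The field names come from the samples in this article, while the function name and the sample IDs are made up for the illustration.

```python
def products_matching_variant_filter(products, variants, field, value):
    """Return each product once if at least one of its variants has variants[field] == value."""
    # Inner query: collect the product IDs of the variants that match the facet selection.
    matching_ids = {v["ec_product_id"] for v in variants if v.get(field) == value}
    # Outer query: keep only those products, one result per product, in the original order.
    return [p for p in products if p["ec_product_id"] in matching_ids]


products = [
    {"ec_product_id": "P1", "title": "Skinny Jean"},
    {"ec_product_id": "P2", "title": "Straight Jean"},
]
variants = [
    {"ec_product_id": "P1", "ec_variant_sku": "P1_24", "cat_size": "24", "cat_size_type": "Regular"},
    {"ec_product_id": "P1", "ec_variant_sku": "P1_26", "cat_size": "26", "cat_size_type": "Regular"},
    {"ec_product_id": "P2", "ec_variant_sku": "P2_26", "cat_size": "26", "cat_size_type": "Regular"},
]

size_24 = products_matching_variant_filter(products, variants, "cat_size", "24")
regular = products_matching_variant_filter(products, variants, "cat_size_type", "Regular")

print([p["title"] for p in size_24])  # ['Skinny Jean']
print([p["title"] for p in regular])  # ['Skinny Jean', 'Straight Jean']
```

Note that "Skinny Jean" has two matching "Regular" variants but still appears only once, which is the behavior you want on a product listing page.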

Catalog With Variants and Grouping

Some catalogs are more complex still. One example is a catalog that contains not only variants, but also product groups, such as shirts in different colors that each have their own SKU and are available in different sizes.

In the search interface, you would want to show the first product with the other colors as properties in the search results, as shown below:

An illustration shows how groupings and variants are used in different areas of a search template (search results page)
Product grouping can add another layer of complexity to product catalogs that impacts a user’s search experience.

Your ecommerce search engine needs to be able to support this kind of grouping. If it can't, you must group the results yourself at render time, which will impact load time and the overall user experience.
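
For illustration, this is roughly what the render-time fallback looks like when the engine cannot group for you. It is a minimal sketch, not Coveo's result folding: `ec_product_group_id` and `cat_color` are field names used in this article, while `fold_by_group`, `other_colors`, and the sample data are hypothetical.

```python
def fold_by_group(results):
    """Keep the first product of each group and list the colors of the other group members."""
    folded = {}
    for product in results:
        # Products without a group ID form a group of their own.
        group_id = product.get("ec_product_group_id") or product["ec_product_id"]
        if group_id not in folded:
            folded[group_id] = {**product, "other_colors": []}
        else:
            folded[group_id]["other_colors"].append(product.get("cat_color"))
    # Dicts preserve insertion order, so the result order follows the first hit per group.
    return list(folded.values())


results = [
    {"ec_product_id": "S1", "ec_product_group_id": "G1", "title": "Oxford Shirt", "cat_color": "Blue"},
    {"ec_product_id": "S2", "ec_product_group_id": "G1", "title": "Oxford Shirt", "cat_color": "Red"},
    {"ec_product_id": "J1", "title": "Ankle Jean", "cat_color": "Indigo"},
    {"ec_product_id": "S3", "ec_product_group_id": "G1", "title": "Oxford Shirt", "cat_color": "Green"},
]

folded = fold_by_group(results)
for item in folded:
    print(item["ec_product_id"], item["cat_color"], item["other_colors"])
# S1 Blue ['Red', 'Green']
# J1 Indigo []
```

Doing this in the browser means fetching more results than you display and fixing up paging yourself, which is why grouping is better handled by the index.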

Product Catalog Search Architecture

Architecture for an ecommerce storefront can be quite generic. For our example, we'll use the Stream API to push product information from the store catalog to the search index.

Our Headless frontend framework retrieves results from the Search API and sends analytics requests to the Analytics API. Analytics are important, as most machine learning engines learn from the user behavior on your ecommerce site.

An illustration of an ecommerce setup in the Coveo cloud platform

As you can see in the above architecture, you can create the user interface using several different technologies. 

To get started faster, you can use the Headless framework or Atomic components. Both send Analytics requests for you. If you want to use the Search API directly, you must send the analytics events yourself through the Analytics API. In addition to standard analytics tracking, ecommerce solutions require some advanced tracking for all non-search related events. At Coveo, we use the collect commerce events endpoint to track customer journeys.

Make sure that your index technology can support all of the above catalogs. For example, if you want to display data from your variants as facets, but you only want to show products in the search results, is that possible? Or, can you group related products that belong to the same group?

At Coveo, we use nested queries and result folding to support these scenarios. Configuring these methods by hand can quickly become complicated. In our platform, you set them up by defining your catalog, and the platform then builds the queries you need.

Up next: the required steps to make it all work.

Required Steps

  • Designing your solution’s architecture
  • Indexing your product catalog
  • Configuring the Coveo platform (Query Pipelines, ML Models), discussed in part 2
  • Building the UI, discussed in part 3

Designing Your Solution’s Architecture

The first step in any search implementation is to think about your architecture: which use cases must be supported, which cross-sell features are needed, which data to show, which data to filter on, and so on.

For this step, we’ll focus on search components.

The main use case for our generic ecommerce store is to sell more products by enabling end users to quickly find items they are interested in and by cross-selling related products to them.

Based on this use case, we need the following:

  • Helping end users find products with query suggestions 
  • Presenting popular results first in the search results 
  • Displaying recommended products that shoppers also viewed when viewing a specific product
  • Displaying recommended products that shoppers also bought when viewing a specific product
  • Providing a landing page/homepage that directly shows overall popular products (both viewed and bought)

This is a very high-level idea of what we need to create to meet the needs of our use case. 

Since we are focused on building an ecommerce search experience, we need to think about what metadata to index. 

Identify Your Needed Metadata

For our scenario’s use case, we identified the following metadata.

Data to show at a result level:

  • Brand
  • Product Title
  • Product Price
  • Product Discount Price
  • Available Colors
  • Category

Data to filter on:

  • Category
  • Color
  • Size (Variant)
  • Size Type (Variant)
  • Brand
  • Gender
  • Fit

Data to sort on:

  • Price

Once we know the metadata, we can index it. As a best practice, only add new metadata to your search index when you are:

  • Creating a facet 
  • Displaying it on a result template
  • Sorting by it
  • Boosting on it

Add all free-text searchable metadata to the data property of the document you are pushing to the Stream API. Content in this property is automatically free-text searchable and properly stemmed.
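
As a rough sketch of that advice, the helper below copies the values you want to be free-text searchable into a `data` property before the document is pushed. It assumes `data` is a single text string and that the choice of searchable fields is yours to make; `build_document` and `searchable_fields` are hypothetical names, not part of the Stream API.

```python
def build_document(product: dict, searchable_fields: list) -> dict:
    """Return a copy of the product with a 'data' property built from the chosen fields."""
    parts = [
        str(product[field])
        for field in searchable_fields
        if product.get(field) not in (None, "")  # skip missing or empty values
    ]
    # Assumption: 'data' holds one plain-text string made of the searchable values.
    return {**product, "data": " ".join(parts)}


product = {
    "DocumentId": "https://fashion.coveodemo.com/pdp/887056_35F",
    "title": "Skinny Zero Gravity Stay Blue Ankle Jean",
    "ec_brand": "Banana Republic",
    "cat_color": "Indigo",
    "cat_mrp": 110.0,
}

doc = build_document(product, ["title", "ec_brand", "cat_color", "ec_description"])
print(doc["data"])
# Skinny Zero Gravity Stay Blue Ankle Jean Banana Republic Indigo
```

Fields that are only used for facets or sorting, such as `cat_mrp` here, stay as regular metadata and are left out of `data`.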

The image below shows how the fields from our catalog relate to the fields needed in the UI.

Mapping metadata to the search interface

If your metadata contains multiple values (for example, store prices in a B2B scenario), use dictionary fields.

Dictionary Fields

When you need to support multiple prices or other metadata for different stores or configurations, you can use dictionary fields. For example, here is a price field for a product whose price differs between stores:

 "my_price": {
   "": 19.36,
   "Store1": 18.22,
   "Store2": 17.34
 }

When you query the data, you need to specify which dictionary key to use so that the Search API returns the correct value.

A dictionary field behaves like a normal field, except that you can select the exact key you need. In our current setup, no dictionary fields are defined.
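
The key selection can be pictured with the `my_price` sample from this section. This sketch runs locally on a plain dictionary; in a real setup the selection happens in the Search API. Treating the empty-string key as the default, and falling back to it for unknown stores, are assumptions based on the sample rather than documented behavior, and `resolve_dictionary_field` is a hypothetical helper.

```python
def resolve_dictionary_field(field_value: dict, key: str = ""):
    """Return the value stored under key, or the default entry (the "" key) if key is absent."""
    if key in field_value:
        return field_value[key]
    # Assumption: the entry under the empty-string key acts as the default value.
    return field_value.get("")


my_price = {"": 19.36, "Store1": 18.22, "Store2": 17.34}

print(resolve_dictionary_field(my_price, "Store1"))  # 18.22
print(resolve_dictionary_field(my_price, "Store9"))  # 19.36
print(resolve_dictionary_field(my_price))            # 19.36
```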

Indexing Your Product Catalog

The Coveo Platform supports several connectors for indexing product information. In most ecommerce scenarios, the Stream API is used. 

Format the Catalog Data

We recommend the following catalog structure to get the most out of the ecommerce search features:

  • Product
    • ec_product_id
    • ec_product_group_id (if you want to group products)
    • objecttype=Product
  • Variant
    • ec_product_id
    • ec_variant_sku
    • objecttype=Variant

Once your data is properly formatted, you can start pushing it into the Stream API.
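
As an illustration of the recommended structure, the sketch below turns one raw store record into a Product document plus one Variant document per size. The shape of the raw record (`id`, `title`, `sizes`, `group_id`) and the function name are hypothetical; the output field names and the `DocumentId` and title patterns follow the samples earlier in this article.

```python
import json


def to_catalog_documents(raw: dict, base_url: str) -> list:
    """Build one Product document and one Variant document per size from a raw record."""
    product_id = raw["id"]
    product_doc = {
        "DocumentId": f"{base_url}/pdp/{product_id}",
        "objecttype": "Product",
        "ec_product_id": product_id,
        "title": raw["title"],
    }
    # Only set the group ID when the store actually groups this product.
    if raw.get("group_id"):
        product_doc["ec_product_group_id"] = raw["group_id"]

    docs = [product_doc]
    for size in raw.get("sizes", []):
        sku = f"{product_id}_{size}"
        docs.append({
            "DocumentId": f"{base_url}/pdp/{product_id}?sku={sku}",
            "objecttype": "Variant",
            "ec_product_id": product_id,  # links the variant back to its product
            "ec_variant_sku": sku,
            "cat_size": size,
            "title": f"{raw['title']} ({size})",
        })
    return docs


raw = {"id": "887056_35F", "title": "Skinny Zero Gravity Stay Blue Ankle Jean", "sizes": ["24", "25"]}
docs = to_catalog_documents(raw, "https://fashion.coveodemo.com")
print(json.dumps(docs, indent=2))
```

The shared `ec_product_id` is what lets the index relate each variant to its product.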

Indexing Product Data

To index your product data, you need to use the Stream API. This API requires product catalog information to be formatted in JSON files. 

The initial build should contain the entire catalog, from beginning to end. This helps your machine learning engine understand the catalog's boundaries.

The easiest solution is naming your fields using standard Coveo field names, avoiding any need for manual mapping.

After navigating to the index directory, you can use the Node.js CLI to push the example catalog data to your index:

npm install -g coveo-pushapi-cli
pushapi data

The above will push all the JSON files in the data directory using the Stream API. Instructions for uploading the demo catalog are available in the GitHub repository.

SDKs

The following SDKs are available for pushing to the Stream API:

Updating the Index

When you want to update your index, just upload a JSON file of the new or changed product information. Note that it’s important that you use the ‘stream/update’ call. 

If a product in the update file has the same 'DocumentId' as an existing document, that document is updated. The update file can also list documents to delete.
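
One way to work out what belongs in an update file is to compare the previous catalog export with the current one, keyed on 'DocumentId'. The sketch below does only that comparison. It does not show the layout of the update file itself, which you should take from the Stream API documentation; `diff_catalog` and the sample data are hypothetical.

```python
def diff_catalog(previous: list, current: list):
    """Return (documents to add or update, DocumentIds to delete) between two exports."""
    prev_by_id = {doc["DocumentId"]: doc for doc in previous}
    curr_by_id = {doc["DocumentId"]: doc for doc in current}

    # New documents, plus existing ones whose content changed.
    to_upsert = [doc for doc_id, doc in curr_by_id.items() if prev_by_id.get(doc_id) != doc]
    # Documents that disappeared from the catalog.
    to_delete = [doc_id for doc_id in prev_by_id if doc_id not in curr_by_id]
    return to_upsert, to_delete


previous = [
    {"DocumentId": "pdp/A", "cat_mrp": 10.0},
    {"DocumentId": "pdp/B", "cat_mrp": 20.0},
    {"DocumentId": "pdp/C", "cat_mrp": 30.0},
]
current = [
    {"DocumentId": "pdp/A", "cat_mrp": 10.0},  # unchanged
    {"DocumentId": "pdp/B", "cat_mrp": 18.0},  # price changed
    {"DocumentId": "pdp/D", "cat_mrp": 40.0},  # new product
]

to_upsert, to_delete = diff_catalog(previous, current)
print([doc["DocumentId"] for doc in to_upsert])  # ['pdp/B', 'pdp/D']
print(to_delete)                                 # ['pdp/C']
```

Sending only the changed and removed documents keeps updates small compared with re-pushing the full catalog.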

And that’s it! Your ecommerce search setup is ready to start working with your catalog source.

The next part in this series, Part 2: Configuring the Search Platform, discusses configuring features such as query pipelines and machine learning.