I started blogging over 15 years ago, and the StarCIO blog, Social, Agile, and Transformation, has over 500 published posts covering digital transformation, agile ways of working, data-driven practices, and other topics. My regular readers have been asking me to provide a search capability that also includes articles I’ve written on other sites, including CIO.com and InfoWorld.
I have lots of experience implementing search engines going back to the 1990s. In my new book, Digital Trailblazer, I tell stories of developing a search SaaS for newspapers on the Alta Vista search engine, then building search for BusinessWeek.com and other B2B websites.
So, I hope that implementing a customer-facing search experience today is easier, thanks to the availability of cloud infrastructure, APIs to most SaaS platforms, and advances in machine learning.
I can configure Solr and have a search capability running in a few days, right?
Evaluate Speed to Value When Reviewing Search Experience Platforms
Well, I’m not a Solr expert, and before I seek help implementing any technology, I do my homework. Does it do what I need it to do? Is it easy, inexpensive, and fast to implement? Will it be straightforward to maintain, especially to support growth and new use cases?
From my vantage point, “easy” is relative. Maybe implementation would be easy if I had a team of developers, had a friend who is a Solr expert, or knew a leading managed service provider that I could rely on for fast and inexpensive expertise. But with Solr, I have to research solutions, make architecture decisions, consider implementation partners, and configure a platform before I can validate ease of use and, more importantly, search relevancy.
I suspect I would have to take similar steps if I considered other Apache Lucene implementations. For example, I could consider Elasticsearch but would then be on a platform that’s no longer under the Apache License, Version 2.0 (ALv2).
My objectives include developing a useful search experience for end-users, evaluating speed to value in deploying a minimally viable product, and incrementally improving search capabilities without high costs. To do that, I have to make several upfront decisions and configurations. Here are the research and implementation steps I have to consider.
1. Investigate and Validate Deployment Options
I generally like to see flexible deployment choices, but I may need an experienced Solr architect to propose the best options for my needs. Here are a few they are likely to consider and present to me:
- Do it yourself by following deployment options on the Apache Solr site, including options for Solr in Docker and Solr on HDFS. You’ll also need to follow the steps to configure the JVM, enable backups, and set up monitoring.
- Instead of doing all the work yourself, you can search for a Solr managed service provider and go through a due-diligence process to ensure they meet service level, security, data privacy, scalability, performance, pricing, integration, and other requirements.
- If you want to host Solr on AWS, you can follow the steps to deploy SolrCloud on EC2 or consider scaling Solr on Kubernetes.
- You can also review Solr options on the AWS Marketplace or the Azure Marketplace.
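Whichever deployment route I pick, setting up monitoring starts with a basic health check. Here is a minimal Python sketch of one, relying on Solr’s ping request handler at `/admin/ping`. The host `http://localhost:8983` and the core name `articles` are my own placeholders, not part of any setup described above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Hypothetical values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def ping_url(base: str, core: str) -> str:
    """Build the URL of the ping handler for one core."""
    return f"{base.rstrip('/')}/{quote(core)}/admin/ping?wt=json"


def is_healthy(payload: dict) -> bool:
    """Treat a ping response as healthy only when its status field is OK."""
    return payload.get("status") == "OK"


def check(base: str = SOLR_BASE, core: str = CORE, timeout: float = 5.0) -> bool:
    """Call the ping handler; any network or parse error counts as unhealthy."""
    try:
        with urlopen(ping_url(base, core), timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False
```

A scheduler or an uptime monitor could call `check()` every minute and alert on a `False` result. A production setup would likely add metrics, backups, and alert routing on top of this.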
These are all great options, but I’m looking for a simple, easy button to a search platform with cloud agility. I want high reliability and a fast search experience for my users, and I may also want multiple region deployments in the future.
2. Load and Index Content from CMS and SaaS
So after the infrastructure is selected, what’s next?
There are many options and tools for loading data, depending on where the data lives and what type it is. If all the data is in a single content management system (CMS), then you might be in luck if there’s a Solr plugin for indexing. For example, Acquia, a Drupal service provider, has an Apache Solr Search Module for content it already manages. Other CMS and Solr service providers may have tools to help connect to data sources and index content and unstructured data.
But if you’re indexing multiple sources, chances are you’ll be using Solr and other tools to help extract, transform, or load the content. Solr has tools like Post, a UNIX tool for indexing web pages, and Solr Cell with Apache Tika to index documents.
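To make the loading step concrete, here is a rough Python sketch that prepares a request for Solr’s JSON update handler (`/update`), which accepts an array of documents. The core name `articles` and the fields `id`, `title`, `url`, and `source` are assumptions of mine; the target schema would have to define them.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_update_request(docs, base=SOLR_BASE, core=CORE, commit=True):
    """Prepare a POST of JSON documents to the core's /update handler."""
    url = f"{base.rstrip('/')}/{core}/update"
    if commit:
        # Ask Solr to commit right away so the documents become searchable.
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the prepared request and return Solr's parsed JSON reply."""
    with urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Example documents; the field names are assumptions, not a real schema.
docs = [
    {"id": "post-1", "title": "Example post", "url": "https://example.com/1",
     "source": "blog"},
]
request = build_update_request(docs)
```

Calling `send(request)` would push the documents to a running Solr instance. Content pulled from several sites would first need to be mapped into one common set of fields like the ones above.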
In my case, I might be able to use commercial connectors to index some of my SaaS content; for example, there are connectors for Salesforce and Atlassian Confluence that I can test out. I’ll have to determine whether they load all the required fields and how to index new and updated content incrementally. But I can’t find a list of simple SaaS connection tools like the ones other search platforms offer.
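Incremental indexing usually comes down to remembering when the last run happened and sending only what changed since then. The sketch below shows that bookkeeping in plain Python. The `updated` timestamp on each record is an assumption about what a source system exposes, and the record shape is hypothetical.

```python
from datetime import datetime, timezone


def select_changed(records, last_indexed_at):
    """Return only the records modified after the previous indexing run."""
    return [r for r in records if r["updated"] > last_indexed_at]


def next_checkpoint(records, last_indexed_at):
    """Return the newest modification time seen, or the old checkpoint."""
    return max([r["updated"] for r in records] + [last_indexed_at])


# Hypothetical records pulled from a CMS or SaaS export.
last_run = datetime(2022, 1, 1, tzinfo=timezone.utc)
records = [
    {"id": "post-1", "updated": datetime(2021, 12, 31, tzinfo=timezone.utc)},
    {"id": "post-2", "updated": datetime(2022, 1, 2, tzinfo=timezone.utc)},
]

to_index = select_changed(records, last_run)    # only post-2 qualifies
last_run = next_checkpoint(records, last_run)   # advance the checkpoint
```

Because Solr by default replaces an existing document when a new one arrives with the same unique key, resending an updated record under the same `id` should refresh it rather than duplicate it. Deleted content needs separate handling that this sketch does not cover.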
3. Query the Core and Evaluate Result Relevancy
So far, I’ve described the high-level steps to configure Solr and what I can easily ascertain from consulting with an architect, reviewing docs, and watching tutorial videos. I’m now ready to run some queries and determine my options for creating a delightful search experience for my end-users.
Solr’s administrative tool has a query interface to run basic searches, including query events, filter queries, and facets. Older Solr versions come with Solr Velocity Search UI, but Solr 9.0, the latest version as of this writing, says it’s deprecated and refers me to Solritas, a third-party plugin.
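The same kind of search can be run outside the admin tool by calling the core’s `/select` handler. Here is a small Python sketch that builds such a URL with a main query, filter queries (`fq`), and field facets (`facet.field`). The core name and the `source` and `topic` fields are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical values for illustration only.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "articles"


def build_select_url(query, filters=(), facet_fields=(), rows=10,
                     base=SOLR_BASE, core=CORE):
    """Build a /select URL with optional filter queries and field facets."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    # Each filter query narrows the result set without affecting scoring.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", f) for f in facet_fields]
    return f"{base.rstrip('/')}/{core}/select?{urlencode(params)}"


url = build_select_url(
    "agile transformation",
    filters=["source:InfoWorld"],
    facet_fields=["source", "topic"],
)
```

Fetching `url` against a running core would return matching documents plus counts per `source` and `topic`. Judging relevancy still means reading those results against real reader questions.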
Solr offers a lot of flexibility in configuring queries. I can review faceting versus JSON facets, a MoreLikeThis related search function, a spatial search method, and other capabilities. But I’ll need a developer to work through these details and determine the best options for my use cases.
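To give a feel for two of these options, here is a hedged Python sketch that builds a JSON Facet API request body (a `terms` facet) and the parameters for a MoreLikeThis lookup (`mlt`, `mlt.fl`, `mlt.count`). The field names `source`, `title`, and `body` are assumptions, and the helper names are mine.

```python
import json


def json_facet_body(query, field, limit=5, rows=10):
    """Build a JSON request body with one terms facet on the given field."""
    return {
        "query": query,
        "limit": rows,
        "facet": {
            f"by_{field}": {"type": "terms", "field": field, "limit": limit},
        },
    }


def more_like_this_params(doc_id, fields, count=5):
    """Build /select parameters that request documents similar to doc_id."""
    # Escape backslashes and quotes so the id is safe inside a quoted term.
    safe_id = doc_id.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "q": f'id:"{safe_id}"',
        "mlt": "true",
        "mlt.fl": ",".join(fields),   # fields used to measure similarity
        "mlt.count": str(count),      # number of similar documents to return
    }


body = json_facet_body("digital transformation", "source")
payload = json.dumps(body)  # would be POSTed as application/json
mlt = more_like_this_params("post-1", ["title", "body"])
```

Which facet style or similarity fields work best depends on the content and schema, and that is exactly the tuning I would need a developer to work through.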
Again, I am looking for an easier and faster button for intelligent search.
After connecting to data sources, I want to rapidly test queries, develop minimally viable applications using low-code search, and use headless search when I need tailored experiences. I also want easy ways to investigate machine learning models, test natural language processing capabilities, and personalize search experiences. I expect some easy search analytics tools so that I can learn what users are searching for and what actions they take after reading one of my articles.
Now I can see why some devops teams want to stick with Apache Solr, but I seek an easier implementation option. Larger enterprises with a technical debt of information silos and digital transformation goals to improve customer-facing, customer support, and employee search experiences should revisit their enterprise tech strategy.