Drupal Search: How to Get Better, More Intuitive Search Results using Search API

May 28, 2018

We’ve recently run into some challenges getting Drupal to return intuitive search results, especially when using the paragraphs module to build pages. This article will discuss configuration options to get search to behave better.

You’ll be able to:

  • Get search to produce more intuitive results;
  • Give content managers control (or at least influence) over the order of search results for certain keywords;
  • Avoid common configuration pitfalls that lead to unexpected search results.

Before You Begin

Search API

This article assumes you are using the Search API module, which should be your go-to solution for search in Drupal 8. It offers more flexibility than the default Drupal search module.

Solr

Generally speaking, if you have any option at all to use Solr as your search back end, you should do so. Acquia and Pantheon both offer Solr with most of their hosting packages. The outline below will work with either the default Drupal database search or Solr, but Solr will almost always return better, more expected search results, and it supports searching as a phrase using quotes.

Gotcha: Searching phrases using quotes

The default configuration for Search API is to use the “Database search” server, which is a separate module that is packaged with the Search API module in Drupal 8. However, this search server has a fairly significant deficit: It ignores quoted searches. In other words, searching for “Information Technology” returns the same result set as searching Information and Technology as separate keywords. The results don’t prioritize matches when these words appear together in the order of “Information Technology.”

There are currently no plans to implement searching keyword phrases in Drupal database search; you will need Solr or another search server if keyword phrase searches are critical to your use-case.
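
To make the difference concrete, here is a minimal Python sketch contrasting keyword matching with phrase matching. It is not Search API or Solr code; the function names and the two sample pages are hypothetical and only illustrate why a quoted search should narrow the result set.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_keywords(document, query):
    """True if every query word appears somewhere in the document, in any order."""
    doc_tokens = set(tokenize(document))
    return all(word in doc_tokens for word in tokenize(query))

def matches_phrase(document, query):
    """True only if the query words appear next to each other, in order."""
    doc_tokens = tokenize(document)
    phrase = tokenize(query)
    n = len(phrase)
    return any(doc_tokens[i:i + n] == phrase
               for i in range(len(doc_tokens) - n + 1))

# Hypothetical sample pages.
docs = {
    "it_page": "The Information Technology office supports campus computing.",
    "library_page": "Technology in the library helps you find information.",
}

keyword_hits = [name for name, text in docs.items()
                if matches_keywords(text, "Information Technology")]
phrase_hits = [name for name, text in docs.items()
               if matches_phrase(text, "Information Technology")]

print(keyword_hits)  # both pages contain both words
print(phrase_hits)   # only the page with the exact phrase
```

A back end that ignores quotes behaves like matches_keywords for every search, which is why both pages come back for a quoted query.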

Prerequisites

This article assumes that you have basic knowledge of installing and configuring Search API, and that you’re looking to optimize your search results. If that’s not the case and you need the fundamentals on setting up Search API, refer to the Getting Started documentation.

  • Install the Search API module
  • Enable the Search API module and the Search API database module (skip the database module if you are using Solr)
  • Add a Search API server and index as described in the documentation.

Steps to Getting Better Results

1. The “Rendered HTML Output” field: Getting search indexer to “see” the entire node

Most of our Drupal 8, Drupal 9, and Drupal 10 sites leverage the paragraphs module to allow flexibility when building pages. However, the paragraphs module presents some additional challenges for search, since the main body copy is no longer isolated to a single WYSIWYG field: The content of the page is made up of many smaller pieces, and it can be tricky to get the search indexer to search all of the content within a node.

Search API offers us a wonderfully simple solution, however, in the form of the “Rendered HTML Output” general field. If you incorporate this field into your index, the indexer will be fed the content of the page as the user sees it, including all paragraph entities and references.

To add the “Rendered HTML Output” field to your index:

  • Browse to /admin/config/search/search-api
  • Next to the index you’d like to edit, click “Edit”
  • Click “Fields” in the tabs at the top
  • Click the “Add Fields” button
  • Under the “General” heading, find “Rendered HTML Output (rendered_item)”
  • Click “Add” next to “Rendered HTML Output (rendered_item)”
  • You will see a list of all of the content types available to your search index, with an option to choose a view mode. It is critical that you choose the view mode that most closely represents what the user sees when viewing a page. Usually, this is “default”, so choose “default” for the view mode for each content type, unless you’ve configured a different view mode. DO NOT choose “search results highlight input.” This is a common mistake that will not give you the correct output.
Drupal 8 Search API add rendered HTML output screenshot
  • Click “Save”
  • Click “Done”
  • Click “Save Changes” at the bottom of the screen
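
As a conceptual illustration of why this field helps, the following Python sketch compares indexing only a body field with indexing the whole rendered page. The node structure, field names, and helper functions are hypothetical simplifications, not Drupal's data model or the Search API implementation.

```python
# Hypothetical, simplified stand-in for a node built with paragraphs.
node = {
    "title": "Health Services",
    "body": "",  # the main body field is often empty on paragraph-built pages
    "paragraphs": [
        {"type": "text", "text": "Walk-in hours are 9 to 5."},
        {"type": "accordion", "items": [
            {"type": "text", "text": "Flu vaccine clinics run every October."},
        ]},
    ],
}

def collect_text(component):
    """Recursively gather text from a paragraph and any nested paragraphs."""
    parts = []
    if "text" in component:
        parts.append(component["text"])
    for child in component.get("items", []):
        parts.extend(collect_text(child))
    return parts

def rendered_text(node):
    """Approximate 'the page as the user sees it' as one block of text."""
    parts = [node["title"], node["body"]]
    for paragraph in node["paragraphs"]:
        parts.extend(collect_text(paragraph))
    return " ".join(part for part in parts if part)

body_only = node["body"]
full_page = rendered_text(node)

print("vaccine" in body_only.lower())  # the body field alone misses the content
print("vaccine" in full_page.lower())  # the rendered page includes it
```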

2. “Boosting” the title field to provide better relevancy

You almost always want keywords in the title field to be “worth” more than those in the other fields in your content. If the title of the page is “How to create search indexes in Drupal” and someone searches “Drupal search indexes,” you probably want that page to appear first.

“Boosting” a field means it is worth more in the relevancy ranking as compared to other fields. To boost the title field, follow the steps below:

  • Browse to /admin/config/search/search-api
  • Next to the index you’d like to edit, click “Edit”
  • Click “Fields” in the tabs at the top
  • Click the “Add Fields” button
  • Under the “Content” heading, find “Title”
  • Click “Add” next to “Title”
  • Click “Done”
  • Locate the newly added title field in your fields list
  • Change the “Type” from “String” to “Fulltext” (only Fulltext fields can be boosted)
  • A “Boost” dropdown should appear. I like to set the title to the second highest boost, so 13.0. The special keywords field we are going to add in the next step is what I put at 21.
  • Click “Save Changes” at the bottom of the screen
Drupal 8 search API title boost

3. Giving content managers the ability to influence search results

Sometimes the relevancy ranking just isn’t ordering results the way we want, or we want to account for search terms that don’t match the page’s wording exactly but that we know are relevant. The following steps address two use cases:

  • Let’s say we have a page for “Admissions,” which is what we want to show up first if someone searches “Admissions,” but there are other pages that use this keyword so frequently that those are the ones showing up first, leaving the actual “Admissions” page several results down. We want to force the “Admissions” page to be first.
  • We have a page called “Health Services,” and we want to make sure it shows up first when someone searches “Flu Vaccine,” even if “Flu Vaccine” doesn’t appear in the body text, or rarely does.

To accomplish these use cases, we’re going to create a custom field called “search_keywords” and add it to each content type we’re searching. Then we’re going to boost this field to the maximum, so that it has a lot of weight in the search results.

  • Create a new “search_keywords” field and add it to each content type/bundle you are searching (This article assumes you know how to add fields to content types; consult the Drupal docs if not.)
  • Browse to /admin/config/search/search-api
  • Next to the index you’d like to edit, click “Edit”
  • Click “Fields” in the tabs at the top
  • Click the “Add Fields” button
  • Under the “Content” heading, find the “Search Keywords” field you’ve created
  • Click “Add” next to “Search Keywords”
  • Locate the newly added “Search Keywords” field in your fields list
  • Change the “Type” from “String” to “Fulltext” (only Fulltext fields can be boosted)
  • A “Boost” dropdown should appear.
  • Set the boost to 21
  • Click “Save Changes” at the bottom of the screen.

Content managers can now insert words into this field to impact the search result ordering. In our “Health Services” example above, if the content manager adds “Flu Vaccine” to the search keywords field on the Health Services node, that node should appear first or at least very high in the results for “Flu Vaccine.” Adding the words to the search keywords field multiple times will further boost the result, if necessary.
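
The effect of boosts can be pictured with a toy scoring model in Python. This is a deliberately simplified sketch and not the formula used by the Database back end or Solr: it assumes a score equal to the boost multiplied by the number of matching words in each field, and the body weight of 1.0 and the sample pages are assumptions made for illustration.

```python
import re

# Boosts for the keywords and title fields come from the steps above;
# the 1.0 weight for the rendered page is an assumption.
BOOSTS = {"search_keywords": 21.0, "title": 13.0, "rendered_item": 1.0}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(document, query):
    """Toy relevancy: sum of (boost x matching word count) over all fields."""
    terms = set(tokenize(query))
    total = 0.0
    for field, boost in BOOSTS.items():
        hits = sum(1 for token in tokenize(document.get(field, ""))
                   if token in terms)
        total += boost * hits
    return total

# Hypothetical sample pages.
pages = {
    "Health Services": {
        "title": "Health Services",
        "search_keywords": "Flu Vaccine",
        "rendered_item": "Walk-in care for students and staff.",
    },
    "Campus News": {
        "title": "Campus News",
        "search_keywords": "",
        "rendered_item": "Flu season: vaccine clinics, flu tips, and vaccine schedules.",
    },
}

ranked = sorted(pages, key=lambda name: score(pages[name], "Flu Vaccine"),
                reverse=True)
print(ranked)
```

In this toy model, the two keyword matches on “Health Services” outweigh four body-text matches on “Campus News,” and repeating the words in the keywords field raises the score again. Real back ends weigh repetition differently, so treat the numbers as illustrative only.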

4. Setting Processors and choosing Processor order

The processors you choose, and the order in which they run, are important. You can alter these settings if you need to customize them further, but here is how I generally set my processors:

Preprocess Index:

  • Ignore case
  • Ignore characters
  • HTML filter
  • Tokenizer
  • Stopwords
  • Stemmer

Preprocess Query:

  • Content Access
  • Ignore case
  • Ignore characters
  • HTML filter
  • Tokenizer
  • Stopwords
  • Stemmer

Postprocess Query:

  • Highlight

I usually leave most of the default settings in place with the exception of “Minimum word length” for the tokenizer, which I set to 2 instead of 3 (this is also discussed below).
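
To show why the same processors, in the same order, are applied to both the index and the query, here is a rough Python sketch of such a pipeline. Every function is a crude, hypothetical stand-in for the Search API processor of the same name (the stopword list, the ignored characters, and the one-rule “stemmer” are all assumptions), so it demonstrates the idea rather than the module’s behavior.

```python
import re

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list

def ignore_case(text):
    return text.lower()

def ignore_characters(text):
    """Drop a small, assumed set of punctuation characters."""
    return re.sub(r"['.,!?;:]", "", text)

def html_filter(text):
    """Replace tags with spaces so markup is not indexed as words."""
    return re.sub(r"<[^>]+>", " ", text)

def tokenize(text):
    return text.split()

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def stem(tokens):
    """Very crude stand-in for a stemmer: trim a trailing 's' from longer words."""
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]

def preprocess(text):
    """Run the processors in the order listed above."""
    text = ignore_case(text)
    text = ignore_characters(text)
    text = html_filter(text)
    tokens = tokenize(text)
    tokens = remove_stopwords(tokens)
    return stem(tokens)

indexed = preprocess("<p>The <strong>Admissions</strong> office welcomes visitors.</p>")
query = preprocess("admission visitors")

print(indexed)
print(query)
print(all(term in indexed for term in query))
```

Because the indexed text and the query pass through identical steps, “Admissions” on the page and “admission” in the search box reduce to the same token and can match.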

5. Setting up your view

Search API integrates with views, which is great because we get all of the flexibility of Views in configuring how search results are displayed. If you need assistance with creating a search view, you can refer to the documentation.

The basic search view should get you what you need, but I like to set up a fields-based view and add the title and excerpt fields to offer a Google-style title and snippet; my configuration usually looks something like this:

Drupal 8 Search API Views fields for excerpt

In rare cases where some results aren’t returning a snippet, I’ll add “Rendered HTML Output” as a hidden field in my view, in the first position, truncated to a length of 256 characters. Then, in the Search: Excerpt field, I’ll put {{ rendered_item }} in the “No Results Text” field under “No Results Behavior,” so I always get a snippet of some kind.
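
The fallback described above amounts to the following logic, sketched in Python purely for illustration. The function name and arguments are hypothetical; in the actual view this is handled by the “No Results Text” setting rather than by code.

```python
def snippet(excerpt, rendered_item, max_length=256):
    """Prefer the search excerpt; otherwise fall back to truncated page text."""
    if excerpt and excerpt.strip():
        return excerpt
    return rendered_item[:max_length]

# An excerpt is used when the search back end returns one.
print(snippet("Flu <strong>vaccine</strong> clinics", "Full page text"))

# Otherwise the first 256 characters of the rendered page are shown.
print(len(snippet("", "x" * 500)))
```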

Common Pitfalls to Avoid

Minimum word length

We recently had a situation where searching for “UV Lamps” was not returning proper results. The cause was that the minimum word length was set to the default of 3, so “UV” was being ignored. To change the default, go to “Processors” in your index, locate the vertical tab for “Tokenizer” at the bottom of the screen, and change “Minimum word length to index” from 3 to 2.
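
A small Python sketch shows how a minimum word length drops short terms. The tokenize function here is a hypothetical simplification, not the Search API Tokenizer itself.

```python
import re

def tokenize(text, min_word_length):
    """Split text into lowercase words, dropping those shorter than the minimum."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [word for word in words if len(word) >= min_word_length]

with_default = tokenize("UV Lamps", min_word_length=3)
with_two = tokenize("UV Lamps", min_word_length=2)

print(with_default)  # "uv" is discarded, so only "lamps" is searched
print(with_two)      # both words survive
```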

“Numeric value out of range” error after setting boosts

When using the Database server and setting boosts, you may run into this error. It was partially fixed in the latest version of the Search API module, but can still occur if you’re changing the boosts after you set them initially. If you’re still getting this error even with the latest version of Search API installed, refer to my comment here.

"Retrieve result data from Solr" Causes Partial words to appear in excerpt or missing excerpts

We recently ran into a scenario where our search excerpts were either empty or contained only pieces of words. We appeared to be seeing the stemmed roots of words rather than the whole words; for example, we’d see “residen life” when searching for “residence life.” The cause appears to be “Retrieve result data from Solr,” a setting under Search API > Servers (on the server, not the index itself). The description for that setting does note that it might cause unexpected behavior, so I suspect that this option should be unchecked if you’re using the processor order identified above.