Web Search - Elasticsearch

An introduction to Product 360 - Web Search and how it works with Elasticsearch.

Product 360 - Web Search uses Elasticsearch 7.2.x. Note that most of the information about Elasticsearch on this page applies to version 7.x and higher.

Introduction

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.

It has the following features:

  • Uses the Lucene library for full-text search

  • Faceted navigation

  • Query language supports structured as well as textual search

  • Flexible relevance - boost through function queries

  • JSON output formats over HTTP

  • Kibana - administration interface

  • Multi-language support

  • Multi-value fields

  • Extensible through plugins

  • Caching - queries, filters, and documents

Tolerant Searches with Elasticsearch

Elasticsearch can perform tolerant searches, much as Google does: a query still finds matches when the search term is only a part of, or a different form of, the indexed word.

Examples:

Elasticsearch Text Analysis    Given word    Result
Ngram                          election      elec, lect, ecti, ctio, tion
Stemmer, Snowball              work          works, working, worked, ...
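The Ngram example above can be reproduced with a few lines of Python. This is only a sketch of the idea, not the Elasticsearch implementation; the function name char_ngrams and the fixed gram length of 4 are assumptions chosen to match the example.

```python
def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Return every contiguous substring of length n, from left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


# Splitting the indexed word into 4-grams lets a partial search term
# such as "lect" match the full word "election".
print(char_ngrams("election"))  # prints ['elec', 'lect', 'ecti', 'ctio', 'tion']
```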

Internal Structure

Elasticsearch and the underlying Lucene framework use an index to perform fast full-text searches. The design favors fast search responses at the cost of a slower index build.

An Elasticsearch index is also called an inverted index because it inverts a page-centric data structure (page -> words) into a keyword-centric data structure (word -> pages).
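The inversion from (page -> words) to (word -> pages) can be sketched in Python as follows. The page names and texts are made up for illustration, and a real Lucene index additionally stores positions, frequencies and other data that this sketch omits.

```python
from collections import defaultdict


def build_inverted_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Turn a page -> text mapping into a word -> pages mapping."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return dict(index)


# Hypothetical sample data.
pages = {
    "page1": "red cotton shirt",
    "page2": "blue cotton trousers",
}
index = build_inverted_index(pages)

# A search is now a single dictionary lookup instead of a scan over all pages.
print(sorted(index["cotton"]))  # prints ['page1', 'page2']
print(sorted(index["shirt"]))   # prints ['page1']
```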

In database terminology, an Elasticsearch index corresponds to a table, an Elasticsearch document corresponds to a table row, and an Elasticsearch field corresponds to a table column.

[Figure: Elasticsearch index and document structure (es_indx_doc_struc.png)]

The index data is stored as denormalized documents. Each document contains fields that hold all the necessary Product 360 data.

Web Search supported Elasticsearch Field Types

Each field defined in Web Search must use one of the following Elasticsearch field types:

  • text

  • keyword

  • long

  • integer

  • double

  • float
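As an illustration, the following Python sketch builds a standard Elasticsearch mapping body that uses each of the six types once and checks that no other type slipped in. The field names are hypothetical, and the body follows the plain Elasticsearch 7.x mapping format, which may differ from the layout of the Web Search configuration files.

```python
# The six field types listed above.
SUPPORTED_TYPES = {"text", "keyword", "long", "integer", "double", "float"}

# Hypothetical field names, chosen only for illustration.
mapping = {
    "mappings": {
        "properties": {
            "description": {"type": "text"},   # analyzed for full-text search
            "item_no": {"type": "keyword"},    # stored as one exact-match term
            "stock": {"type": "integer"},
            "sales_rank": {"type": "long"},
            "price": {"type": "double"},
            "weight": {"type": "float"},
        }
    }
}


def unsupported_fields(mapping: dict) -> list[str]:
    """Return the names of fields whose type is not in SUPPORTED_TYPES."""
    props = mapping["mappings"]["properties"]
    return [name for name, spec in props.items()
            if spec.get("type") not in SUPPORTED_TYPES]


print(unsupported_fields(mapping))  # prints []
```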

Elasticsearch Analyzers, Tokenizers and Token Filters

Elasticsearch analyzers, tokenizers and token filters are defined in each index configuration file.

They improve search results by controlling how unstructured text is split into terms when it is indexed. Each field can have its own set of analyzers, tokenizers and token filters.

More details on their usage can be found here: Text Analysis
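A minimal sketch of how an analyzer is assembled from a tokenizer and token filters, written as a Python dictionary in the standard Elasticsearch 7.x index settings format. The names four_gram, english_snowball, tolerant_ngram, stemmed_text and the field description are assumptions made for this example, and the Web Search index configuration files may arrange these definitions differently.

```python
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                # Emits 4-character grams, e.g. "election" -> elec, lect, ...
                "four_gram": {
                    "type": "ngram",
                    "min_gram": 4,
                    "max_gram": 4,
                    "token_chars": ["letter", "digit"],
                }
            },
            "filter": {
                # Reduces English words to their stem, e.g. "working" -> "work"
                "english_snowball": {"type": "snowball", "language": "English"}
            },
            "analyzer": {
                "tolerant_ngram": {
                    "type": "custom",
                    "tokenizer": "four_gram",
                    "filter": ["lowercase"],
                },
                "stemmed_text": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "english_snowball"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            # Each field can name its own analyzer.
            "description": {"type": "text", "analyzer": "stemmed_text"}
        }
    },
}

analysis = index_body["settings"]["analysis"]
print(sorted(analysis["analyzer"]))  # prints ['stemmed_text', 'tolerant_ngram']
```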

Boost Factor

Each field can be assigned a boost factor, which raises the score of matches in that field and so influences the order of the search results.
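One common way to apply boost factors in Elasticsearch is at query time, using the field^boost notation of the multi_match query. The sketch below builds such a query body in Python; the helper names, field names and boost values are hypothetical, and how Web Search itself stores and applies boost factors is not shown here.

```python
def boosted_fields(boosts: dict[str, float]) -> list[str]:
    """Render {"name": 3} as ["name^3"], the Elasticsearch field^boost notation."""
    return [f"{field}^{boost:g}" for field, boost in boosts.items()]


def build_query(text: str, boosts: dict[str, float]) -> dict:
    """Build a multi_match query body that searches all boosted fields."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": boosted_fields(boosts),
            }
        }
    }


# A match in "name" counts three times as much as a match in "description".
query = build_query("cotton shirt", {"name": 3, "description": 1})
print(query["query"]["multi_match"]["fields"])  # prints ['name^3', 'description^1']
```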

Parent-Child Relationship

Elasticsearch supports parent-child relationships using a special join field.

In Web Search, this join allows a single index to hold the hierarchy of Product, Variant and Item records.
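The following Python sketch shows how such a three-level hierarchy could be declared and used with the standard Elasticsearch join field. The field name record_join, the relation names, the document ids and the sample values are all assumptions for illustration and are not taken from the Web Search configuration.

```python
from typing import Optional

# Mapping: a product is the parent of variants, a variant the parent of items.
mapping = {
    "mappings": {
        "properties": {
            "record_join": {
                "type": "join",
                "relations": {"product": "variant", "variant": "item"},
            }
        }
    }
}


def join_value(record_type: str, parent_id: Optional[str] = None) -> dict:
    """Build the join field for a document; children also name their parent."""
    value = {"name": record_type}
    if parent_id is not None:
        value["parent"] = parent_id
    return {"record_join": value}


# Hypothetical documents with ids "p1" (product) and "v1" (variant).
product = {"name": "Shirt", **join_value("product")}
variant = {"name": "Shirt, blue", **join_value("variant", "p1")}
item = {"name": "Shirt, blue, size M", **join_value("item", "v1")}

# Find products that have at least one variant matching "blue".
query = {
    "query": {
        "has_child": {
            "type": "variant",
            "query": {"match": {"name": "blue"}},
        }
    }
}

print(variant["record_join"])  # prints {'name': 'variant', 'parent': 'p1'}
```

When indexing, Elasticsearch requires child documents to be routed to the same shard as their parent, so variant and item documents are indexed with a routing value that identifies the top-level product.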