What this article is about
How do you build faceted search for an online store? How are the values in faceted search filters generated? How does choosing a value in one filter affect the values in neighbouring filters? Looking for answers, I got as far as the fifth page of Google results and still found nothing exhaustive, so I had to figure it out myself. The article describes:
- how the UI reacts when the user uses filters;
- the algorithm for generating filter values;
- Elasticsearch query templates and index structures with explanations.
There are no ready-made solutions here, nothing to copy and paste. To solve your own problem, you will have to dig into the details.
Concepts, to be clear
Full-text search - searching for products by a word or phrase. For the user, it is a text input with a "Find" button, available on every page of the site.
Faceted search - searching for products by several characteristics: color, size, memory size, price, etc. For the user, it is a set of filters. Each filter corresponds to exactly one characteristic, and each characteristic to exactly one filter. The values of a filter are all possible values of its characteristic. The user sees filters on section and category pages and on the full-text search results page. Once the user selects a value, the filter is considered active.
Faceted filter behavior
In short, a filter does two things: it filters products, and it filters the options available in other filters.
Filters products
This part is easy. If the user selects:
- one value, they see the products that match that value;
- several values in one filter, they see the products that match at least one of them;
- values in several filters, they see the products that match a selected value from each filter.
In terms of Boolean algebra: there is a logical "AND" between filters and a logical "OR" between the values within a filter. Simple logic.
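The AND/OR rule can be sketched in a few lines of Python. This is only an in-memory illustration, not how the search engine does it; the product data and the names `matches` and `filter_products` are made up for the example.

```python
from typing import Dict, List, Set

Product = Dict[str, str]        # characteristic name -> value
Selected = Dict[str, Set[str]]  # filter name -> values chosen by the user


def matches(product: Product, selected: Selected) -> bool:
    """AND across filters, OR across the values inside one filter."""
    return all(
        product.get(name) in values  # OR: any chosen value is enough
        for name, values in selected.items()
        if values                    # a filter without chosen values is inactive
    )


def filter_products(products: List[Product], selected: Selected) -> List[Product]:
    return [p for p in products if matches(p, selected)]


# Hypothetical catalogue, for illustration only.
PHONES = [
    {"brand": "A", "memory": "64"},
    {"brand": "A", "memory": "128"},
    {"brand": "B", "memory": "64"},
    {"brand": "C", "memory": "256"},
]

# (brand is A OR B) AND (memory is 64)
result = filter_products(PHONES, {"brand": {"A", "B"}, "memory": {"64"}})
```

Here `result` holds the two 64 GB phones of brands A and B; an empty selection returns the whole catalogue.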
Filters choices in other filters
"Well... the options that exist are shown, the ones that don't are hidden" - this is how the business describes filter behavior. Sounds logical. In practice, it works like this:
- Go to the Phones section, see filters by characteristics: Brand, Diagonal, Memory. Each filter contains values.
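- Select a value in the Brand filter. The selection of products shrinks, and the Diagonal and Memory filters now list only the values found among that brand's phones.
- The Brand filter itself keeps its full list of values, because its values do not depend on its own selection.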
The number of values in a filter depends on the products: the more distinct characteristic values the products have, the more values the filter shows. By selecting a brand, the user reduced the selection of products for the remaining filters, and their lists of values were updated accordingly.
This gives a universal rule: the values of a filter are taken from the selection of products formed by the other active filters.
Each active filter has its own selection of products.
If we have N filters and:
- none are active, there is one general selection. It is the same for all filters and matches the search results;
- M are active and M < N, the number of selections is M + 1, where the extra 1 is the selection with all active filters applied. It is shared by all inactive filters and matches the search results;
- M are active and M = N, the number of selections is N. Each filter has its own selection.
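The three cases above fit into one small function. It is just the arithmetic from the list written out; the name `selection_count` is invented for this sketch.

```python
def selection_count(total_filters: int, active_filters: int) -> int:
    """Number of distinct product selections needed for N filters, M of them active."""
    if not 0 <= active_filters <= total_filters:
        raise ValueError("active_filters must be between 0 and total_filters")
    if total_filters > 0 and active_filters == total_filters:
        # Every filter is active: one selection per filter, no shared one.
        return total_filters
    # One selection per active filter plus the shared one for inactive filters.
    return active_filters + 1


counts = [selection_count(3, m) for m in range(4)]
```

For three filters, `counts` is `[1, 2, 3, 3]`: one general selection, then one more per active filter, until all filters are active and the shared selection is no longer needed.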
As a result, when the user selects a facet filter value, the following happens:
- the search selection of products is formed;
- the values for inactive filters are extracted from the search selection;
- for each active filter, a separate selection is formed and the values of that filter are extracted from it.
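Before moving to a search engine, the steps above can be tried on a plain list of products. The sketch below is an in-memory illustration under made-up data; `select` and `facet_values` are hypothetical helper names, not part of any library.

```python
from collections import Counter
from typing import Dict, List, Set

Product = Dict[str, str]
Selected = Dict[str, Set[str]]


def select(products: List[Product], selected: Selected) -> List[Product]:
    """Products that satisfy every given filter (OR inside a filter, AND between)."""
    return [
        p for p in products
        if all(p.get(name) in values for name, values in selected.items())
    ]


def facet_values(products: List[Product], filters: List[str],
                 selected: Selected) -> Dict[str, Counter]:
    active = {name: values for name, values in selected.items() if values}
    search_selection = select(products, active)  # all active filters applied
    result = {}
    for name in filters:
        if name in active:
            # Active filter: its own selection, built from the OTHER active filters.
            others = {n: v for n, v in active.items() if n != name}
            pool = select(products, others)
        else:
            # Inactive filter: values come from the search selection.
            pool = search_selection
        result[name] = Counter(p[name] for p in pool if name in p)
    return result


# Hypothetical catalogue, for illustration only.
PHONES = [
    {"brand": "A", "diagonal": "6.1", "memory": "64"},
    {"brand": "A", "diagonal": "6.7", "memory": "128"},
    {"brand": "B", "diagonal": "6.1", "memory": "64"},
    {"brand": "B", "diagonal": "6.5", "memory": "256"},
]

facets = facet_values(PHONES, ["brand", "diagonal", "memory"], {"brand": {"A"}})
```

With brand A selected, the Brand filter still offers both brands, while Diagonal and Memory offer only the values found among brand A phones.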
The question arises - how to implement this in practice?
Elasticsearch (ES) implementation
Product characteristics are not universal, so you won't find a ready-made index structure for storing products or ready-made queries here. Instead, the article points to the documentation topics that explain how to build the "correct" indexes and queries yourself. "Correct" here means based on my experience and knowledge.
"Correct" types for text fields
In ES, we are interested in two data types:
- text, for full-text search. Fields of this type are not suited to exact comparison and, by default, cannot be used for sorting or aggregation;
- keyword, for strings that take part in exact comparison, sorting, and aggregation.
ES analyzes the values of a text field and builds a dictionary for full-text search. The values of a keyword field are indexed as received. Of these two string types, only keyword supports aggregation and sorting out of the box.
The user works with characteristics in both ways: in full-text search and through filters. ES does not let you assign two types to a single field, but it offers other solutions:
fields
PUT my_index
{
  "mappings": {
    "properties": {
      "some_property": {
        "type": "text", // 1
        "fields": { // 2
          "raw": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
- (1) Declare the product characteristic as a field of type text.
- (2) Using the fields parameter, create a child virtual field of type keyword. It is virtual because it exists in the index but not in the product document. ES automatically stores the value in the child field exactly as received.
Do this for every characteristic.
Queries for exact comparison, sorting, and aggregation must use the child keyword field; in the example it is some_property.raw. Full-text search uses the parent field.
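To make the parent/child split concrete, here is a sketch of a request body built in Python for the mapping above. The function name `multi_field_request` and the sample values are invented for illustration; the body uses only the standard match, terms, sort, and terms aggregation constructs.

```python
import json


def multi_field_request(field: str, text: str, exact_values: list) -> dict:
    """Full-text search on the parent text field, everything exact on `<field>.raw`."""
    raw = f"{field}.raw"  # the keyword sub-field declared via "fields"
    return {
        "query": {
            "bool": {
                "must": {"match": {field: {"query": text}}},    # analyzed text
                "filter": [{"terms": {raw: exact_values}}],     # exact comparison
            }
        },
        "sort": [{raw: "asc"}],                                 # sorting
        "aggs": {"values": {"terms": {"field": raw}}},          # aggregation
    }


body = multi_field_request("some_property", "some text", ["value_1", "value_2"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against my_index.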
copy_to
PUT my_index
{
  "mappings": {
    "properties": {
      "all_properties": { // 1
        "type": "text"
      },
      "some_property_1": {
        "type": "keyword",
        "copy_to": "all_properties" // 2
      },
      "some_property_2": {
        "type": "keyword",
        "copy_to": "all_properties"
      }
    }
  }
}
- (1) Create a virtual field of type text in the index.
- (2) Declare each characteristic as a keyword field with the copy_to parameter, whose value is the name of the virtual field. ES copies the values of all characteristics into the virtual field when the document is saved.
For exact comparison, sorting, and aggregation, use the characteristic field itself; for text search, use the field that holds the values of all characteristics.
Both approaches create additional fields in the index that are absent from the original document structure. Therefore, to build a query, you need to know the index structure.
I prefer the copy_to option: to build a full-text search query, it is enough to know the one field that holds a copy of all characteristic values.
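Since every characteristic gets the same declaration, the copy_to mapping can be generated from a list of names. A minimal sketch, assuming all characteristics are plain strings; `build_mapping` is a hypothetical helper and `all_properties` is the field name from the example above.

```python
from typing import Dict, List


def build_mapping(characteristics: List[str], catch_all: str = "all_properties") -> Dict:
    """One keyword field per characteristic, each copied into one text field."""
    properties = {catch_all: {"type": "text"}}  # target of every copy_to
    for name in characteristics:
        properties[name] = {"type": "keyword", "copy_to": catch_all}
    return {"mappings": {"properties": properties}}


mapping = build_mapping(["some_property_1", "some_property_2"])
```

The resulting dictionary has the same shape as the request body shown for the copy_to variant.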
Queries
To search for products
Let's assume the index structure is the same as in the copy_to variant. For full-text search ES uses the match query, and for comparison with faceted filter values the terms query. The bool query combines them into one request. It will look something like this:
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "virtual_field_for_fulltext_searching": {
            "query": "some text"
          }
        }
      },
      "filter": [
        {"terms": {"property_1": ["value_1_1", …, "value_1_n"]}},
        …
        {"terms": {"property_n": ["value_n_1", …, "value_n_m"]}}
      ]
    }
  }
}
query.bool.must.match - the main full-text search query.
query.bool.filter - filters that refine the main query. Every clause in the filter array must match, which gives the logical "AND" between filters. The array of values inside each terms clause is a logical "OR".
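The search request can be assembled from the user's input with a small helper. This is a sketch under assumptions: `build_search_query` is an invented name, and the full-text field defaults to `all_properties` from the copy_to mapping example.

```python
from typing import Dict, Set


def build_search_query(text: str, selected: Dict[str, Set[str]],
                       fulltext_field: str = "all_properties") -> Dict:
    """bool query: match for the text, one terms clause per active filter."""
    filters = [
        {"terms": {name: sorted(values)}}  # OR between the values of one filter
        for name, values in selected.items()
        if values                          # skip inactive filters
    ]
    bool_query: Dict = {"filter": filters}  # clauses in "filter" are ANDed
    if text:
        bool_query["must"] = {"match": {fulltext_field: {"query": text}}}
    return {"query": {"bool": bool_query}}


query = build_search_query(
    "some text",
    {"property_1": {"value_1_2", "value_1_1"}, "property_2": set()},
)
```

Values are sorted only to make the generated request deterministic; the order inside a terms clause does not affect the result.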
For filter values
The terms aggregation groups products by the value of a characteristic and counts the products in each group. The difficulty is that for each active filter, the terms aggregation must run on the selection of products formed by the other active filters, and for inactive filters on the selection that matches the search results. The filter aggregation lets you build a separate selection for each aggregation and pack all the operations into one request.
The request structure will be like this:
{
  "size": 0,
  "query": {
    "bool": {
      "must": {
        "match": {
          "field_for_fulltext_searching": {
            "fuzziness": 2,
            "query": "some text"
          }
        }
      },
      "filter": {
        …
      }
    }
  },
  "aggs": {
    "inactive_filter_agg": {
      "filter": {
        …
      },
      "aggs": {
        "some_inactive_filter_subagg": {
          "terms": {
            "field": "some_property"
          }
        },
        …,
        "some_other_inactive_filter_subagg": {
          "terms": {
            "field": "some_other_property"
          }
        }
      }
    },
    "active_filter_1_agg": {
      "filter": {
        …
      },
      "aggs": {
        "active_filter_1_subagg": {
          "terms": {
            "field": "property_1"
          }
        }
      }
    },
    …,
    "active_filter_N_agg": {
      "filter": {
        …
      },
      "aggs": {
        "active_filter_N_subagg": {
          "terms": {
            "field": "property_N"
          }
        }
      }
    }
  }
}
query.bool - the main query; filtering is performed in its context. It consists of:
- match - the full-text search query;
- filter - filters by characteristics that are not tied to facet filters and must apply to every selection. Examples are in_stock or is_visible, if you always want to show only products in stock or only visible ones.
aggs.inactive_filter_agg - the aggregation for inactive facet filters. It consists of:
- filter - conditions built from all active facet filters. Together with the main query, they form the selection of products on which the child aggregations of this section run;
- aggs - an object with one named aggregation per inactive filter.
aggs.active_filter_1_agg - the aggregation that retrieves the values of the first active facet filter. Each such block is tied to one facet filter. It consists of:
- filter - conditions built from the active facet filters, except the current one. Together with the main query, they form the selection of products on which the child aggregation of this section runs;
- aggs - an object with a single aggregation on the characteristic of the current active facet filter.
It is important to specify "size": 0, otherwise the response will also contain the list of products matching the main query in addition to the aggregations, and this request does not need them.
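The request template can be generated from the list of filters and the user's selection. The sketch below is one possible way to do it, not a reference implementation: the function names and the aggregation naming scheme (`<filter>_agg`, `<filter>_subagg`, `inactive_filter_agg`) are assumptions, and a match_all query stands in when an aggregation has no conditions of its own.

```python
from typing import Dict, List, Optional, Set


def conditions(selected: Dict[str, Set[str]], skip: Optional[str] = None) -> Dict:
    """Query for a filter aggregation: all active filters except `skip`."""
    clauses = [
        {"terms": {name: sorted(values)}}
        for name, values in selected.items()
        if values and name != skip
    ]
    return {"bool": {"filter": clauses}} if clauses else {"match_all": {}}


def terms_agg(field: str) -> Dict:
    return {f"{field}_subagg": {"terms": {"field": field}}}


def build_facet_request(main_query: Dict, all_filters: List[str],
                        selected: Dict[str, Set[str]]) -> Dict:
    active = [name for name in all_filters if selected.get(name)]
    inactive = [name for name in all_filters if name not in active]
    aggs: Dict = {}
    if inactive:
        # One shared selection for all inactive filters: every active filter applies.
        sub_aggs: Dict = {}
        for name in inactive:
            sub_aggs.update(terms_agg(name))
        aggs["inactive_filter_agg"] = {"filter": conditions(selected), "aggs": sub_aggs}
    for name in active:
        # Own selection per active filter: the other active filters apply.
        aggs[f"{name}_agg"] = {
            "filter": conditions(selected, skip=name),
            "aggs": terms_agg(name),
        }
    return {"size": 0, "query": main_query, "aggs": aggs}


request = build_facet_request(
    {"match": {"all_properties": {"query": "some text"}}},
    ["brand", "diagonal", "memory"],
    {"brand": {"A"}, "diagonal": {"6.1"}},
)
```

With two active filters out of three, the request contains three top-level aggregations, which matches the M + 1 selections from the rule above. Keep in mind that a terms aggregation returns only the top 10 buckets by default; its own size parameter raises that limit.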
In the end
We get two requests:
- one for search results, which returns the products to display to the user;
- one for filter values, which performs the aggregations and returns the filter values with the number of products for each value.
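The response to the second request nests each terms result inside its filter aggregation, so a little unpacking is needed before the values reach the UI. A sketch, assuming sub-aggregations are named `<filter>_subagg`; the sample response is hand-written and trimmed for illustration, not real output.

```python
from typing import Dict

SUFFIX = "_subagg"  # naming convention assumed in this sketch


def parse_facets(response: Dict) -> Dict[str, Dict[str, int]]:
    """Flatten filter -> terms aggregations into {filter: {value: count}}."""
    facets: Dict[str, Dict[str, int]] = {}
    for wrapper in response.get("aggregations", {}).values():
        for key, sub in wrapper.items():
            # Skip "doc_count" and anything else that is not a terms result.
            if isinstance(sub, dict) and "buckets" in sub:
                name = key[:-len(SUFFIX)] if key.endswith(SUFFIX) else key
                facets[name] = {b["key"]: b["doc_count"] for b in sub["buckets"]}
    return facets


# Hand-written, trimmed sample in the shape of a filter + terms response.
SAMPLE_RESPONSE = {
    "aggregations": {
        "inactive_filter_agg": {
            "doc_count": 2,
            "memory_subagg": {
                "buckets": [{"key": "64", "doc_count": 1}, {"key": "128", "doc_count": 1}]
            },
        },
        "brand_agg": {
            "doc_count": 3,
            "brand_subagg": {
                "buckets": [{"key": "A", "doc_count": 2}, {"key": "B", "doc_count": 1}]
            },
        },
    }
}

facets = parse_facets(SAMPLE_RESPONSE)
```

Each entry of `facets` maps a filter to its values and product counts, ready to render next to the checkboxes.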
The requests do not depend on each other, so it is best to execute them asynchronously, in parallel.
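A sketch of running both requests concurrently with asyncio. The `send` coroutine is a hypothetical stand-in for whatever HTTP or Elasticsearch client you use; here it only simulates latency and echoes the body back.

```python
import asyncio
from typing import Dict, Tuple


async def send(body: Dict) -> Dict:
    """Hypothetical transport stub: replace with a real async search call."""
    await asyncio.sleep(0.01)  # simulated network latency
    return {"request": body}


async def load_page(search_body: Dict, facet_body: Dict) -> Tuple[Dict, Dict]:
    # Neither request needs the other's result, so they can run at the same time.
    products, facets = await asyncio.gather(send(search_body), send(facet_body))
    return products, facets


products, facets = asyncio.run(
    load_page({"query": {"match_all": {}}}, {"size": 0, "aggs": {}})
)
```

`asyncio.gather` returns the results in the order the coroutines were passed, so the products and the filter values cannot get mixed up.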
P.S. I admit that there may be more "correct" approaches and tools for solving the problem of faceted search. I would be grateful for additional information and examples in the comments.