How ElasticSuite’s Advanced Analysis Module Handles Compound Words
Anyone running an ecommerce site in German knows the issue: compound words break search relevance.
German product names frequently combine multiple meaningful words into a single long token:
- Silikonvollbesatzreithose
- Winterreithose
- Ultraschallgerät
For users, these are clearly composed of smaller concepts.
For a search engine, they are often just one indivisible string.
This is sometimes called the Donaudampfschiff issue, after an extreme example of such a compound: Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft.
Without proper linguistic handling, you typically end up with one of three approaches:
- Maintaining a large number of synonym rules
- Using aggressive n-grams (and introducing significant noise)
- Accepting incomplete results and missed products
ElasticSuite Premium introduces a more robust solution.
Advanced Analysis & Word Decompounding
The new Premium module, smile/module-elasticsuite-advanced-analysis, adds support for Hyphenation Word Decompounding, specifically designed to handle Germanic languages.
Instead of relying on character-level matching, the engine can now:
- Split compound words into meaningful subwords
- Match those subwords independently
- Improve recall without degrading precision
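To make the idea concrete, here is a minimal Python sketch of whitelist-based decompounding. It is an illustration only, not the module's implementation: it finds subwords by brute-force substring lookup rather than by hyphenation patterns, and the `decompound` and `matches` helpers and the small `WORD_LIST` are assumptions made up for this example.

```python
# Toy whitelist, assumed for illustration; a real dictionary would be far larger.
WORD_LIST = {"winter", "reithose", "hose", "silikon", "ultraschall", "schall", "gerät"}


def decompound(token, word_list, min_subword_size=4):
    """Return the lowercased token followed by every whitelisted subword inside it."""
    lowered = token.lower()
    found = [lowered]
    for start in range(len(lowered)):
        # Only consider substrings of at least min_subword_size characters.
        for end in range(start + min_subword_size, len(lowered) + 1):
            candidate = lowered[start:end]
            if candidate in word_list and candidate not in found:
                found.append(candidate)
    return found


def matches(query, product_name, word_list=WORD_LIST):
    """A query matches when it equals the full token or one of its subwords."""
    return query.lower() in decompound(product_name, word_list)


print(decompound("Winterreithose", WORD_LIST))
print(matches("reithose", "Silikonvollbesatzreithose"))
print(matches("schall", "Winterreithose"))
```

Because only whitelisted subwords are emitted, a query such as "reithose" can match a long compound, while unrelated character sequences inside the compound never become searchable tokens.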
The decompounder is injected into the text analyzers (standard, standard_edge_ngram, shingle) before or after the stemmer, depending on configuration.
It relies on two elements:
- An XML file containing hyphenation patterns for the language
- A whitelist dictionary defining valid generated words
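These two inputs mirror the parameters of Elasticsearch's `hyphenation_decompounder` token filter. Assuming the module builds on that filter (the article does not say so explicitly), the resulting index settings might look roughly like the sketch below. The filter and analyzer names, both file paths, and the `build_analysis` helper are hypothetical; the module generates its own configuration.

```python
def build_analysis(decompound_before_stemmer=True):
    """Build an Elasticsearch `analysis` settings dict with a decompounder filter.

    All names and paths here are illustrative assumptions, not the module's output.
    """
    filters = {
        "de_decompounder": {
            "type": "hyphenation_decompounder",
            # Hypothetical paths, relative to the Elasticsearch config directory.
            "hyphenation_patterns_path": "analysis/hyphenation_patterns_de.xml",
            "word_list_path": "analysis/decompound_words_de.txt",
            "min_subword_size": 4,
        },
        "de_stemmer": {"type": "stemmer", "language": "light_german"},
    }
    # The decompounder can sit before or after the stemmer in the filter chain.
    if decompound_before_stemmer:
        chain = ["lowercase", "de_decompounder", "de_stemmer"]
    else:
        chain = ["lowercase", "de_stemmer", "de_decompounder"]
    return {
        "analysis": {
            "filter": filters,
            "analyzer": {
                "product_name_de": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": chain,
                }
            },
        }
    }


settings = build_analysis(decompound_before_stemmer=True)
print(settings["analysis"]["analyzer"]["product_name_de"]["filter"])
```

Placing the decompounder before the stemmer means the whitelist is matched against unstemmed word forms; placing it after means the dictionary entries need to match stemmed forms instead.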
Configuration is available at store scope in:
Elasticsuite → Analyzers Settings → Hyphenation Words Decompounder
Real-World Results on a German Ecommerce Site
A customer recently deployed the module to staging and performed a full reindex.
Previously, they relied on partial search using n-grams. After enabling the decompounder, they switched the product name field to the standard analyzer.
Here are some comparative results, before and after enabling the decompounder:
| Search term | Before | After |
|---|---|---|
| schall | 60 results (n-gram noise) | 4 results (all Ultraschall products) |
| ultraschall | 3 results | 104 results |
| reithose | 1,014 results | 3,000 results |
| hosen | 906 results | 1,046 results |
| futter | 300 results | 645 results |
What This Shows
Precision improvement
Searching for “schall” previously returned 60 results due to character-level n-gram matches.
With the decompounder enabled, it now returns only 4 results — all genuinely related to Ultraschall products.
Noise is dramatically reduced.
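The kind of noise that character n-grams introduce can be illustrated with a small Python sketch. The customer's actual n-gram sizes and matching thresholds are not given here, so the trigram size, the overlap score, and the sample product names are assumptions chosen purely for illustration.

```python
def char_ngrams(text, size=3):
    """Return the set of character n-grams of the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + size] for i in range(len(lowered) - size + 1)}


def ngram_overlap(query, name, size=3):
    """Fraction of the query's n-grams that also occur in the product name."""
    query_grams = char_ngrams(query, size)
    return len(query_grams & char_ngrams(name, size)) / len(query_grams)


# Hypothetical product names: only the first is related to "schall" (sound).
for name in ["Ultraschallgerät", "Schalter", "Halle"]:
    print(name, ngram_overlap("schall", name))
```

With a permissive matching threshold, "Schalter" (a switch) shares three of the four trigrams of "schall" and is returned alongside the genuine Ultraschall products, which is the noise the decompounder removes.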
Recall improvement
Searching for “ultraschall” previously returned only 3 results.
After enabling the decompounder, it returns 104 results.
The engine now understands compound structure instead of treating it as a single opaque word.
Natural compound matching
Searching for “reithose” now correctly matches products such as:
- Silikonvollbesatzreithose
- Winterreithose
No synonyms were added.
No manual rule maintenance was required.
Why This Matters
German users naturally search with:
- Short meaningful words
- Full compound names
- Partial compound concepts
If the search engine cannot decompose compounds properly, relevance suffers. Merchants compensate with complex synonym lists or overly permissive analyzers, both of which increase maintenance and reduce precision.
By handling compounding at the analysis level, ElasticSuite improves both recall and precision while simplifying configuration.
Part of ElasticSuite Premium
The Advanced Analysis module is available as part of ElasticSuite Premium.
For merchants operating in German-speaking markets, this feature provides a structural improvement to search quality — not a workaround, but a linguistic solution built into the indexing layer.