
How ElasticSuite’s Advanced Analysis Module Handles Compound Words

Anyone running an ecommerce site in German knows the issue: compound words break search relevance.

German product names frequently combine multiple meaningful words into a single long token:

  • Silikonvollbesatzreithose
  • Winterreithose
  • Ultraschallgerät

For users, these are clearly composed of smaller concepts.
For a search engine, they are often just one indivisible string.

This is known as the Donaudampfschiff issue, after an extreme example of such a compound: Donaudampfschifffahrtselektrizitätenhauptbetriebswerkbauunterbeamtengesellschaft.

Without proper linguistic handling, you typically end up with one of three approaches:

  • Maintaining a large number of synonym rules
  • Using aggressive n-grams (and introducing significant noise)
  • Accepting incomplete results and missed products

ElasticSuite Premium introduces a more robust solution.

Advanced Analysis & Word Decompounding

The new Premium module:

smile/module-elasticsuite-advanced-analysis

adds support for Hyphenation Word Decompounding, specifically designed to handle Germanic languages.

Instead of relying on character-level matching, the engine can now:

  • Split compound words into meaningful subwords
  • Match those subwords independently
  • Improve recall without degrading precision
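
To make the idea of splitting and matching subwords concrete, here is a minimal sketch in Python. It is not the module's algorithm: it uses a plain dictionary lookup rather than hyphenation patterns, and the word list and function names are hypothetical, chosen only to cover the product names mentioned in this article.

```python
# Toy dictionary-based decompounder. This is a simplified stand-in, NOT the
# hyphenation-based approach the module uses. The word list is hypothetical.
KNOWN_WORDS = {
    "silikon", "vollbesatz", "reithose", "winter",
    "ultraschall", "schall", "gerät",
}


def decompound(token, dictionary=KNOWN_WORDS, min_subword_size=4):
    """Return the lowercased token followed by every dictionary word found inside it."""
    lowered = token.lower()
    subwords = []
    for start in range(len(lowered)):
        for end in range(start + min_subword_size, len(lowered) + 1):
            candidate = lowered[start:end]
            is_new = candidate != lowered and candidate not in subwords
            if candidate in dictionary and is_new:
                subwords.append(candidate)
    return [lowered] + subwords


def matches(query, product_name):
    """A query matches when it equals the full token or one of its subwords."""
    return query.lower() in decompound(product_name)


if __name__ == "__main__":
    for name in ("Silikonvollbesatzreithose", "Winterreithose", "Ultraschallgerät"):
        print(name, "->", decompound(name))
    print(matches("reithose", "Winterreithose"))
```

Because only whole dictionary words are emitted, a query such as "reithose" matches both riding-trouser products, while an arbitrary fragment such as "terrei" matches nothing.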

The decompounder is injected into the text analyzers (standard, standard_edge_ngram, shingle) before or after the stemmer, depending on configuration.

It relies on two elements:

  1. An XML file containing hyphenation patterns for the language
  2. A whitelist dictionary defining valid generated words
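
Elasticsearch ships a token filter named hyphenation_decompounder that takes the same two inputs: a hyphenation patterns file and a word list. The sketch below assumes the module builds on something similar, which this article does not confirm. The filter names, analyzer name, file paths, and parameter values are hypothetical; ElasticSuite generates its own analysis configuration, so this only illustrates how the pieces could fit together, including the choice of placing the decompounder before or after the stemmer.

```python
import json


def build_analysis_settings(decompound_before_stemmer=True):
    """Build an illustrative Elasticsearch `analysis` section (a sketch, not ElasticSuite's output)."""
    decompounder = {
        "type": "hyphenation_decompounder",
        # Hypothetical paths, relative to the Elasticsearch config directory.
        "hyphenation_patterns_path": "analysis/hyphenation_patterns_de.xml",
        "word_list_path": "analysis/decompound_whitelist_de.txt",
        # Assumed values, chosen for illustration only.
        "min_subword_size": 4,
        "only_longest_match": False,
    }
    stemmer = {"type": "stemmer", "language": "light_german"}

    # The position of the decompounder relative to the stemmer is configurable.
    if decompound_before_stemmer:
        chain = ["lowercase", "de_decompounder", "de_stemmer"]
    else:
        chain = ["lowercase", "de_stemmer", "de_decompounder"]

    return {
        "analysis": {
            "filter": {"de_decompounder": decompounder, "de_stemmer": stemmer},
            "analyzer": {
                "standard_de": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": chain,
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_analysis_settings(), indent=2))
```

Placing the decompounder before the stemmer means each subword is stemmed separately; placing it after means the whitelist is checked against already-stemmed tokens, so the dictionary would need to contain stemmed forms.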

Configuration is available at store scope in:

Elasticsuite → Analyzers Settings → Hyphenation Words Decompounder

Real-World Results on a German Ecommerce Site

A customer recently deployed the module to staging and performed a full reindex.

Previously, they relied on partial search using n-grams. After enabling the decompounder, they switched the product name field to the standard analyzer.

Here are some comparative results, after vs. before enabling the decompounder:

Search term | After                                | Before
schall      | 4 results (all Ultraschall products) | 60 results (n-gram noise)
ultraschall | 104 results                          | 3 results
reithose    | 3,000 results                        | 1,014 results
hosen       | 1,046 results                        | 906 results
futter      | 645 results                          | 300 results

What This Shows

Precision improvement

Searching for “schall” previously returned 60 results due to character-level n-gram matches.
With the decompounder enabled, it now returns only 4 results — all genuinely related to Ultraschall products.

Noise is dramatically reduced.
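
The kind of noise that character-level n-grams introduce can be illustrated with a small Python sketch. The trigram size and the example words are assumptions made for illustration; they are not taken from the customer's catalog or from ElasticSuite's actual n-gram configuration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of the lowercased text."""
    lowered = text.lower()
    return {lowered[i:i + n] for i in range(len(lowered) - n + 1)}


def shared_ngrams(query, term, n=3):
    """Return the n-grams that the query and the indexed term have in common."""
    return char_ngrams(query, n) & char_ngrams(term, n)


if __name__ == "__main__":
    query = "schall"
    # Hypothetical indexed words: only the first one is about ultrasound.
    for term in ("Ultraschallgerät", "Schal", "Halle"):
        print(term, "->", sorted(shared_ngrams(query, term)))
```

In this sketch "Schal" (scarf) shares three of the four trigrams of "schall", and "Halle" (hall) shares two, so an n-gram based match can surface both even though neither has anything to do with sound.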

Recall improvement

Searching for “ultraschall” previously returned only 3 results.
After enabling the decompounder, it returns 104 results.

The engine now understands compound structure instead of treating it as a single opaque word.

Natural compound matching

Searching for “reithose” now correctly matches products such as:

  • Silikonvollbesatzreithose
  • Winterreithose

No synonyms were added.
No manual rule maintenance was required.

Why This Matters

German users naturally search with:

  • Short meaningful words
  • Full compound names
  • Partial compound concepts

If the search engine cannot decompose compounds properly, relevance suffers. Merchants compensate with complex synonym lists or overly permissive analyzers, both of which increase maintenance and reduce precision.

By handling compounding at the analysis level, ElasticSuite improves both recall and precision while simplifying configuration.

Part of ElasticSuite Premium

The Advanced Analysis module is available as part of ElasticSuite Premium.

For merchants operating in German-speaking markets, this feature provides a structural improvement to search quality — not a workaround, but a linguistic solution built into the indexing layer.
