Supercharge Your Searches Using Lucene Queries in dotCMS | Part 1: Fuzzy Searches

This is the first article in a series where I will show you how to get the most out of your content searches using advanced Lucene queries. If you're like me, you've probably been using Lucene for exact matches and perhaps date ranges in more complex cases. Today, I will show you how fuzzy searches can deliver useful results in real-world scenarios.

What Are Fuzzy Searches?

Fuzzy searches allow you to find terms that are similar to a given search term, even if there are typos, misspellings, or slight variations. This is particularly useful for handling human errors and inconsistent data entry.

Example 1: Basic Fuzzy Search

Suppose you have a catalog of tech products and want to find all records containing "laptop" in the title. However, you know that some product titles contain misspellings such as “labtop”. How can you get all the desired results, including the misspelled ones? This basic fuzzy search covers those cases:

JSON Query

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "contentType": "Product"
          }
        },
        {
          "fuzzy": {
            "title": {
              "value": "labtop",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}

Equivalent dotCMS Query Tool Syntax

+contenttype:product +title:labtop~2

The query will match results such as:

  • Product 1: “Dell Laptop Computer”

  • Product 2: “HP Labtop Bundle with Mouse”

  • Product 3: “Microsoft Surface Laptopp”

  • Product 4: “ASUS laptop 2023 Edition”
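As a small illustration of the Query Tool syntax used above, the sketch below composes a fuzzy clause from its parts: field, term, an optional maximum edit count after the `~`, and an optional boost after `^`. The helper name `fuzzy_clause` is hypothetical and not part of dotCMS, and the sketch assumes the term contains no Lucene special characters that would need escaping.

```python
from typing import Optional


def fuzzy_clause(field: str, term: str,
                 max_edits: Optional[int] = None,
                 boost: Optional[float] = None) -> str:
    """Build one fuzzy clause in Lucene query-string syntax (hypothetical helper).

    Assumption: `term` needs no escaping of Lucene special characters.
    """
    clause = f"{field}:{term}~"
    if max_edits is not None:
        clause += str(max_edits)       # e.g. "~2" allows up to two edits
    if boost is not None:
        clause += f"^{boost:g}"        # e.g. "^2" doubles this clause's weight
    return clause


# Reproduces the Query Tool string from Example 1.
query = f"+contenttype:product +{fuzzy_clause('title', 'labtop', max_edits=2)}"
print(query)  # +contenttype:product +title:labtop~2
```

Leaving `max_edits` unset emits a bare `~`, which hands the choice of edit distance to the search engine's default.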

How Fuzzy Search Works

Notice that I used fuzziness=AUTO. This setting adapts the number of allowed edits to the length of the search term ("labtop" in this case), which in turn determines which results match.

Here's how fuzziness=AUTO works:

  1. For short terms (0-2 characters): No fuzziness is applied (must be an exact match)

  2. For medium terms (3-5 characters): Fuzziness of 1 is applied (1 edit allowed)

  3. For longer terms (>5 characters): Fuzziness of 2 is applied (2 edits allowed)
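The three length bands above can be written as a tiny function. This is only a sketch of the rule as described in this article; `auto_fuzziness` is an illustrative name, not an API of the search engine.

```python
def auto_fuzziness(term: str) -> int:
    """Return the maximum number of edits allowed for `term` under the AUTO rule above."""
    length = len(term)
    if length <= 2:
        return 0  # short terms: exact match only
    if length <= 5:
        return 1  # medium terms: one edit allowed
    return 2      # longer terms: two edits allowed


for term in ["pc", "dell", "labtop"]:
    print(term, auto_fuzziness(term))
```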

Since "labtop" is 6 characters long, OpenSearch automatically applies a fuzziness value of 2. This means:

  • Words that require up to 2 character edits (insertions, deletions, substitutions, or transpositions) to match "labtop" will be included in the results

  • Matches include:

      • "laptop" (1 edit: substitute 'p' for 'b')

      • "labptop" (1 edit: add 'p')

      • "laptops" (2 edits: substitute 'p' for 'b' + add 's')

If your search returns too many irrelevant results, you could explicitly set fuzziness: 1 to only allow 1 character edit, making the matching more strict. With fuzziness: 1, “Microsoft Surface Laptopp” would be excluded from the list (as it requires more than one edit).
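To make the edit counts concrete, here is a minimal sketch that computes the edit distance (insertions, deletions, substitutions, and adjacent transpositions) and applies it to the product titles from this example. It is an illustration only: the search engine does not evaluate fuzzy queries this way internally, and splitting titles on whitespace and lowercasing them is an assumed stand-in for the field's real analyzer.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and adjacent transpositions."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def title_matches(title: str, term: str, max_edits: int) -> bool:
    """True if any lowercased word of the title is within max_edits of the term."""
    return any(edit_distance(word, term) <= max_edits for word in title.lower().split())


titles = [
    "Dell Laptop Computer",
    "HP Labtop Bundle with Mouse",
    "Microsoft Surface Laptopp",
    "ASUS laptop 2023 Edition",
]

print(edit_distance("labtop", "laptop"))   # 1
print(edit_distance("labtop", "laptopp"))  # 2
print([t for t in titles if title_matches(t, "labtop", 1)])
```

With a limit of 2 all four titles match; with a limit of 1 the last line prints every title except “Microsoft Surface Laptopp”.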

Why This Matters

The impact on your search results is pretty significant:

  1. Broader matching: You'll get more results than with exact term matching

  2. Spelling variants: Captures common misspellings of "laptop", the term you are actually looking for

  3. Relevance scoring: Closer matches (fewer edits) will score higher in results

  4. Performance balance: AUTO provides a good balance between matching flexibility and query performance

Example 2: Fuzzy Search with Boost

Now let's look at something a bit more advanced. This approach is super helpful when you want to consider misspelled terms (fuzzy search) but at the same time, care about the relevance (boost) of the results.

For this example, we will use a “Blog” content type with a title and a searchable field whose variable name is “blogContent” and 4 pieces of content with the following entries:

| Blog | Blog Title | Blog Content |
|------|------------|--------------|
| 1 | “Digital Marketing Strategies for 2025” | “Our marketing team has developed new approaches …” |
| 2 | “Content Creation” | “A good content marketing approach can increase engagement.” |
| 3 | “SEO Best Practices” | “The best marketing strategy includes optimizing for search engines.” |
| 4 | “It shouldn't match” | “marketeers are doing a key job” |

With these entries in place, I can run a query that:

  • Filters “Blog” posts

  • Matches either of two misspelled terms: "markting" (boosted, so it carries higher relevance) OR "stratgy"

JSON Query

{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "contentType": "Blog"
          }
        },
        {
          "bool": {
            "should": [
              {
                "fuzzy": {
                  "blog.blogContent": {
                    "value": "markting",
                    "boost": 2
                  }
                }
              },
              {
                "fuzzy": {
                  "blog.blogContent": {
                    "value": "stratgy"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}

Equivalent dotCMS Query Tool Syntax

+contentType:Blog +(blog.blogcontent:markting~^2 blog.blogcontent:stratgy~)

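When you run this query, you'll notice something interesting: Blog 3 takes precedence in the results because its content matches both misspelled terms, "markting" (via "marketing") and "stratgy" (via "strategy"). Meanwhile, Blog 4 isn't included in the results at all, because "marketeers" is more than two edits away from either term.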
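The ordering can be reproduced with a toy model. The sketch below is not how Lucene scores documents: it simply adds a clause's boost whenever some word in the content falls within the allowed edit distance, which is enough to show why Blog 3 comes first and Blog 4 drops out. It assumes the default fuzziness follows the AUTO length rule and that the field is tokenized on whitespace and lowercased; all names are illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, substitutions and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def auto_fuzziness(term: str) -> int:
    """Allowed edits by term length (0-2 chars: 0, 3-5 chars: 1, longer: 2)."""
    return 0 if len(term) <= 2 else 1 if len(term) <= 5 else 2


BLOGS = {
    1: "Our marketing team has developed new approaches …",
    2: "A good content marketing approach can increase engagement.",
    3: "The best marketing strategy includes optimizing for search engines.",
    4: "marketeers are doing a key job",
}

# (misspelled term, boost) pairs mirroring the two "should" clauses.
CLAUSES = [("markting", 2.0), ("stratgy", 1.0)]


def toy_score(text: str) -> float:
    """Sum the boosts of the clauses that fuzzily match at least one word (toy model)."""
    words = [w.strip(".,…").lower() for w in text.split()]
    words = [w for w in words if w]
    total = 0.0
    for term, boost in CLAUSES:
        if any(edit_distance(w, term) <= auto_fuzziness(term) for w in words):
            total += boost
    return total


scores = {blog_id: toy_score(text) for blog_id, text in BLOGS.items()}
ranked = sorted((b for b, s in scores.items() if s > 0),
                key=lambda b: scores[b], reverse=True)
print(ranked[0])       # 3
print(4 in ranked)     # False
```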

Wrapping Up

With these examples, you can create more powerful searches that are resilient to user input errors, and you can tune them to produce better-ranked results that actually matter to your users.

Stay tuned! In the following articles, we will be covering other query examples such as wildcards, metadata fields, and relationships.
