This is the first article in a series where I will show you how to get the most out of your content searches using advanced Lucene queries. If you're like me, you've probably been using Lucene for exact matches and maybe date ranges in more complex cases. Today, I will show you how fuzzy searches can deliver powerful results in real-world scenarios.
What Are Fuzzy Searches?
Fuzzy searches allow you to find terms that are similar to a given search term, even if there are typos, misspellings, or slight variations. This is particularly useful for handling human errors and inconsistent data entry.
Example 1: Basic Fuzzy Search
Suppose you have a catalog of tech products and want to find all records containing "laptop" in the title. However, you know that some product titles may contain misspellings such as “labtop”. How can you get all the desired results, including the misspelled ones? This basic fuzzy search covers those cases:
JSON Query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "contentType": "Product"
          }
        },
        {
          "fuzzy": {
            "title": {
              "value": "labtop",
              "fuzziness": "AUTO"
            }
          }
        }
      ]
    }
  }
}
Equivalent dotCMS Query Tool Syntax
+contenttype:product +title:labtop~2
The query will match results such as:
Product 1: “Dell Laptop Computer”
Product 2: “HP Labtop Bundle with Mouse”
Product 3: “Microsoft Surface Laptopp”
Product 4: “ASUS laptop 2023 Edition”
How Fuzzy Search Works
Notice that I used fuzziness=AUTO. It means that the number of character edits allowed adapts to the length of your search term "labtop", which in turn determines which indexed terms count as matches.
Here's how fuzziness=AUTO works:
For short terms (0-2 characters): No fuzziness is applied (must be an exact match)
For medium terms (3-5 characters): Fuzziness of 1 is applied (1 edit allowed)
For longer terms (>5 characters): Fuzziness of 2 is applied (2 edits allowed)
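The length thresholds above can be sketched as a small Python function. This is only an illustration of the rule as described in this article, not the engine's actual code, and the name `auto_fuzziness` is made up for this example.

```python
def auto_fuzziness(term: str) -> int:
    """Return the number of edits AUTO would allow for a term (illustrative only)."""
    length = len(term)
    if length <= 2:
        return 0  # short terms: exact match required
    if length <= 5:
        return 1  # medium terms: one edit allowed
    return 2      # longer terms: two edits allowed


for term in ["tv", "dell", "labtop"]:
    print(term, auto_fuzziness(term))
```

Running it prints 0 for "tv", 1 for "dell", and 2 for "labtop".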
Since "labtop" is 6 characters long, OpenSearch automatically applies a fuzziness value of 2. This means:
Words that require up to 2 character edits (insertions, deletions, substitutions, or transpositions) to match "labtop" will be included in the results
Matches include:
"laptop" (1 edit: substitute 'p' for 'b')
"labptop" (1 edit: insert 'p')
"laptops" (2 edits: substitute 'p' for 'b' + append 's')
If your search returns too many irrelevant results, you can explicitly set fuzziness: 1 to allow only 1 character edit, making the matching stricter. With fuzziness: 1, “Microsoft Surface Laptopp” would be excluded from the list, because "laptopp" is two edits away from "labtop" (substitute 'p' for 'b', then insert a second 'p').
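To check these edit counts by hand, here is a rough Python sketch of an edit distance that counts insertions, deletions, substitutions, and adjacent transpositions as one edit each. It is a simplified stand-in for what the search engine does internally: the helper names are hypothetical, and splitting titles on whitespace is an assumption that only approximates how a real analyzer tokenizes the title field.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and adjacent swap each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def matching_titles(titles, term, max_edits):
    """Keep titles where at least one lowercased word is within max_edits of term."""
    return [
        title
        for title in titles
        if any(edit_distance(term, word) <= max_edits for word in title.lower().split())
    ]


titles = [
    "Dell Laptop Computer",
    "HP Labtop Bundle with Mouse",
    "Microsoft Surface Laptopp",
    "ASUS laptop 2023 Edition",
]

print(matching_titles(titles, "labtop", 2))  # all four titles
print(matching_titles(titles, "labtop", 1))  # "Microsoft Surface Laptopp" drops out
```

The second call shows the effect of tightening fuzziness to 1: only the titles whose closest word is at most one edit away from "labtop" remain.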
Why This Matters
The impact on your search results is pretty significant:
Broader matching: You'll get more results than with exact term matching
Spelling variants: Captures common misspellings of "laptop", the term you were actually looking for
Relevance scoring: Closer matches (fewer edits) will score higher in results
Performance balance: AUTO provides a good balance between matching flexibility and query performance
Example 2: Fuzzy Search with Boost
Now let's look at something a bit more advanced. This approach is super helpful when you want to tolerate misspelled terms (fuzzy search) while still controlling how much each term contributes to the relevance of the results (boost).
For this example, we will use a “Blog” content type with a title and a searchable field whose variable name is “blogContent”, plus four pieces of content with the following entries:
| Blog | Blog Title | Blog Content |
|---|---|---|
| 1 | “Digital Marketing Strategies for 2025” | “Our marketing team has developed new approaches …” |
| 2 | “Content Creation” | “A good content marketing approach can increase engagement.” |
| 3 | “SEO Best Practices” | “The best marketing strategy includes optimizing for search engines.” |
| 4 | “It shouldn't match” | “marketeers are doing a key job” |
With this content in place, I can run a query that:
Filters “Blog” posts
Looks for either of two misspelled terms: "markting" (boosted so it weighs more in relevance) OR "stratgy"
JSON Query
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "contentType": "Blog"
          }
        },
        {
          "bool": {
            "should": [
              {
                "fuzzy": {
                  "blog.blogContent": {
                    "value": "markting",
                    "boost": 2
                  }
                }
              },
              {
                "fuzzy": {
                  "blog.blogContent": {
                    "value": "stratgy"
                  }
                }
              }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
Equivalent dotCMS Query Tool Syntax
+contentType:Blog +(blog.blogcontent:markting~^2 blog.blogcontent:stratgy~)
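When you run this query, you'll notice something interesting: Blog 3 takes precedence in the results because its content contains fuzzy matches for both terms ("marketing" for "markting" and "strategy" for "stratgy"). Blogs 1 and 2 match only "markting". Meanwhile, Blog 4 isn't included in the results at all, because "marketeers" is more than two edits away from "markting".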
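To see why the results come out in that order, here is a deliberately simplified Python model of the query. It is not how the engine really scores documents (real relevance scoring also weighs term frequency, field length, and how close each fuzzy match is); it just adds up the boost of every clause that fuzzy-matches a word in blogContent. The names `toy_score`, `clauses`, and `MAX_EDITS` are invented for this sketch, and it assumes both clauses allow 2 edits.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Edit distance where insert, delete, substitute, and adjacent swap each cost 1."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


# blogContent values from the table above (Blog 1 is truncated in the article too).
posts = {
    1: "Our marketing team has developed new approaches …",
    2: "A good content marketing approach can increase engagement.",
    3: "The best marketing strategy includes optimizing for search engines.",
    4: "marketeers are doing a key job",
}

# (misspelled term, boost). Assumption: each clause tolerates up to 2 edits.
clauses = [("markting", 2.0), ("stratgy", 1.0)]
MAX_EDITS = 2


def toy_score(text: str) -> float:
    """Sum the boosts of the clauses that fuzzy-match at least one word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sum(
        boost
        for term, boost in clauses
        if any(edit_distance(term, word) <= MAX_EDITS for word in words)
    )


scores = {blog_id: toy_score(text) for blog_id, text in posts.items()}
# Mirrors minimum_should_match = 1: posts matching neither clause are dropped.
ranked = sorted(
    ((blog_id, score) for blog_id, score in scores.items() if score > 0),
    key=lambda item: (-item[1], item[0]),
)
print(ranked)  # [(3, 3.0), (1, 2.0), (2, 2.0)]
```

Blog 3 collects the boost from both clauses, Blogs 1 and 2 only from "markting", and Blog 4 matches nothing, so it never enters the ranking.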
Wrapping Up
With these examples, you can create more powerful searches that are resilient to user input errors. At the same time, you can customize them to produce better-ranked results that actually matter to your users.
Stay tuned! In the following articles, we will be covering other query examples such as wildcards, metadata fields, and relationships.