MB-19243: Detect fuzziness automatically based on term length #2060

Merged
merged 3 commits into from
Nov 20, 2024

@CascadingRadium CascadingRadium commented Aug 2, 2024

  • The following queries can now automatically detect fuzziness based on the length of the terms:
    • Match Query
    • Fuzzy Query
    • Match-Phrase Query
    • Multi-Phrase Query
    • Phrase Query
  • In these queries, each term (whether in a multi-term query like Match or Phrase, or in a single-term query like Fuzzy) can have its own edit distance based on its length. The edit distance is calculated as follows:
    • For terms with 1 or 2 characters: edit distance = 0 (exact match)
    • For terms with 3, 4, or 5 characters: edit distance = 1 (fuzzy match)
    • For terms with more than 5 characters: edit distance = 2 (fuzzy match)
  • This feature can be enabled using the <query>.SetAutoFuzziness(<bool>) API.
  • Additionally, the feature is supported when parsing queries from JSON. You can specify fuzziness as either "auto" or a static value in the JSON query. Both formats are valid:
  1. With auto fuzziness:
{
  "match" : "lorem",
  "field" : "bleve",
  "fuzziness" : "auto"
}
  2. With static fuzziness:
{
  "match" : "lorem",
  "field" : "bleve",
  "fuzziness" : 2
}

When unmarshalled, the query will correctly apply the chosen fuzziness method.

  • Fixed a bug where requesting a fuzzy searcher with fuzziness = 0 incorrectly returned an error saying that fuzziness exceeds the maximum. A term searcher is now returned in this case instead.

@CascadingRadium CascadingRadium added this to the v2.4.3 milestone Aug 2, 2024
@CascadingRadium CascadingRadium self-assigned this Aug 2, 2024
@CascadingRadium CascadingRadium changed the title MB-19243: Auto Fuzzy support MB-19243: Detect fuzziness automatically based on term length Aug 2, 2024
@abhinavdangeti abhinavdangeti modified the milestone: v2.4.3 Aug 5, 2024
@abhinavdangeti abhinavdangeti added this to the v2.5.0 milestone Sep 18, 2024
@abhinavdangeti abhinavdangeti removed the request for review from moshaad7 October 17, 2024 17:49
abhinavdangeti previously approved these changes Nov 13, 2024
CascadingRadium commented Nov 14, 2024

Force-pushed a rebase. Please review again, thanks.

@CascadingRadium CascadingRadium merged commit 3a21667 into master Nov 20, 2024
9 checks passed
@CascadingRadium CascadingRadium deleted the autoFuzz branch November 20, 2024 17:43