Wildcard searching with ngrams
I have read a lot of posts from people who want to run wildcard searches with Sunspot and get stuck, simply because the DisMax query parser does not (yet) support wildcards (e.g. "sun*" should find "sunspot").
A simple solution is to add an extra filter factory to the text field in your schema.xml. Out of the box, the text field is defined as:
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
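The filter you should add is the EdgeNGramFilterFactory. Note that it goes into the index analyzer only; the query analyzer stays as it was, so search terms themselves are not broken into n-grams: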
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
You can read a complete explanation of EdgeNGramFilterFactory here. Basically, it takes every token and breaks it down into multiple shorter tokens called "n-grams", each starting at the front of the word and between minGramSize and maxGramSize characters long. In the above configuration "sunspot" is broken down into "su", "sun", "suns", "sunsp", "sunspo", "sunspot".
This means that if your search term is "sun", then a document containing "sunspot" will be matched, since this word has also generated the token "sun".
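To make the matching behaviour concrete, here is a minimal Python sketch of what front edge n-gramming does to a single token. It is only an illustration, not Solr's actual implementation; the function name `edge_ngrams` and its parameters are made up for this example and simply mirror the `minGramSize` and `maxGramSize` attributes above.

```python
def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the front edge n-grams of a token, shortest first.

    Hypothetical helper for illustration; it mimics
    minGramSize/maxGramSize with side="front".
    """
    token = token.lower()  # stands in for the LowerCaseFilterFactory step
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


# Tokens that would end up in the index for the word "sunspot"
indexed = edge_ngrams("sunspot")
print(indexed)  # ['su', 'sun', 'suns', 'sunsp', 'sunspo', 'sunspot']

# The query term is not n-grammed; it only has to equal one indexed token.
print("sun" in indexed)   # True: prefix of "sunspot"
print("spot" in indexed)  # False: not a prefix
```

Note that a one-letter query such as "s" would not match here, because the smallest gram has two characters.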
One can also use NGramFilterFactory for substring search instead of just prefix/suffix search.
<filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="15"/>
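As a rough illustration of the difference, the following Python sketch generates every substring of a token within the configured size range. Again, this is an assumption-laden sketch rather than Solr's code: the name `ngrams` is hypothetical, and the order in which Solr emits the grams may differ.

```python
def ngrams(token, min_gram=2, max_gram=15):
    """Return all substrings of a token with length min_gram..max_gram.

    Hypothetical helper for illustration; it mimics the
    minGramSize/maxGramSize attributes of the filter.
    """
    token = token.lower()
    longest = min(len(token), max_gram)
    grams = []
    for size in range(min_gram, longest + 1):
        # Slide a window of the current size across the token
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


indexed = ngrams("sunspot")
print("spot" in indexed)  # True: substrings in the middle or at the end match too
print(len(indexed))       # 21 tokens, versus 6 for the edge n-gram variant
```

The larger token count is the trade-off: substring search indexes considerably more terms per word than prefix-only search.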
Remember to restart the Solr server and reindex after applying this filter.
2011-01-27: The first post in this discussion should be noted: https://groups.google.com/d/topic/ruby-sunspot/9yTr00NCbxc/discussion