Maybe we could somehow get the serialization to work for the JVM, but I doubt it would be portable to JS and Native.
It's really just a road we don't want to go down.
Why is Analyzer a part of the index at all?
It ended up there during the work to add support for lower casing queries.
In order for "bad" to match "Bad" we use an Analyzer with lowerCase = true at both index time and query time.
This is easy to accomplish if the Analyzer is just baked right into the index.
The path forward is likely to:
- remove `Analyzer` from the index
- continue to require an `Analyzer` at index build time, just don't save it in the index
- require queries be analyzed outside of the index before executing them
On this last point, requiring queries to be analyzed gets us closer to what Lucene does, which is to require an Analyzer for query parsing. I think this is probably how we should do things as well. The end solution here would mean that Lucille needs some notion of analysis as well. Today it basically just does whitespace tokenization, baked right in: `"fast cast"` gets parsed as two `TermQ`s.
I think that's good default behaviour for Lucille. So we really just need some way to provide a different analyzer if desired.
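To make the proposal concrete, here is a minimal sketch of an index that needs an analyzer to be built but does not store it, with the query analyzed by the caller before it reaches the index. The `Analyzer` and `Index` shapes below are simplified stand-ins invented for illustration, not the project's actual types.

```scala
// Hypothetical, simplified types; not the project's real Analyzer or index.
final case class Analyzer(lowerCase: Boolean) {
  // Whitespace tokenization, optionally followed by lower casing.
  def analyze(text: String): List[String] = {
    val tokens = text.trim.split("\\s+").toList.filter(_.nonEmpty)
    if (lowerCase) tokens.map(_.toLowerCase) else tokens
  }
}

// The index holds only postings (term -> doc ids). No Analyzer, no functions,
// so everything in it is plain data.
final case class Index(postings: Map[String, Set[Int]]) {
  // Expects terms that were already analyzed by the caller.
  def search(terms: List[String]): Set[Int] =
    terms.flatMap(t => postings.getOrElse(t, Set.empty[Int])).toSet
}

object Index {
  // The Analyzer is required at build time but is not saved in the result.
  def build(analyzer: Analyzer, docs: List[String]): Index = {
    val pairs = for {
      (doc, id) <- docs.zipWithIndex
      term <- analyzer.analyze(doc)
    } yield term -> id
    Index(pairs.groupMap(_._1)(_._2).view.mapValues(_.toSet).toMap)
  }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val analyzer = Analyzer(lowerCase = true)
    val index = Index.build(analyzer, List("Bad cat", "fast cast"))
    // Analyzed query: "bad" matches the document containing "Bad".
    println(index.search(analyzer.analyze("bad")))
    // Unanalyzed query: "Bad" was never indexed in that form, so no match.
    println(index.search(List("Bad")))
  }
}
```

The trade-off is that callers become responsible for using the same analysis at index time and query time, which is the consistency that baking the `Analyzer` into the index gave for free.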
As an intermediate step I am thinking of writing some sort of QueryAnalyzer that we configure and then it ultimately provides the String => Query function we use at query time.
Something like:
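A rough sketch of that idea follows. Everything in it is an assumption made for illustration: the `Query` hierarchy is a toy stand-in for Lucille's types (only the name `TermQ` comes from the discussion above), and `MultiQ` and the `lowerCase` setting are hypothetical.

```scala
// Toy stand-in for a query AST; MultiQ is a hypothetical wrapper.
sealed trait Query
final case class TermQ(term: String) extends Query
final case class MultiQ(queries: List[Query]) extends Query

// Configured once, then hands out the String => Query function used at query time.
final case class QueryAnalyzer(lowerCase: Boolean) {
  def parse: String => Query = { (input: String) =>
    // Default behaviour: whitespace tokenization, one TermQ per token.
    val tokens = input.trim.split("\\s+").toList.filter(_.nonEmpty)
    val terms = if (lowerCase) tokens.map(_.toLowerCase) else tokens
    MultiQ(terms.map(t => TermQ(t)))
  }
}

object QueryAnalyzerDemo {
  def main(args: Array[String]): Unit = {
    val parse = QueryAnalyzer(lowerCase = true).parse
    println(parse("Fast Cast"))
  }
}
```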
- we could try following in Elasticsearch's path and serialize a description of the analyzer
  - with the analyzer builders, users could use the builder to construct the analyzer and then we could serialize the description (so the user doesn't have to write JSON)
- we can ~~steal~~ leverage the work / naming decisions Elasticsearch has already made here
  - with some consideration for making clear what keys are special and which ones are just random words
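One way to picture serializing a description instead of the analyzer itself: keep a plain-data value that names the steps, and interpret it into a function only when needed. The sketch below is an assumption throughout. `AnalyzerDescription` is a hypothetical type, and the `tokenizer` / `filter` keys and the `whitespace` / `lowercase` values are borrowed from Elasticsearch's custom analyzer settings purely as naming inspiration.

```scala
// Hypothetical description type: plain data only, so it holds no functions
// and should serialize the same way on JVM, JS, and Native.
final case class AnalyzerDescription(tokenizer: String, filters: List[String]) {

  // Hand-rolled JSON so the sketch needs no library. It does not escape
  // special characters, which a real codec would have to handle.
  def toJson: String = {
    val fs = filters.map(f => "\"" + f + "\"").mkString("[", ",", "]")
    s"""{"tokenizer":"$tokenizer","filter":$fs}"""
  }

  // Interpret the description into the actual analysis function.
  // Unknown names are reported rather than silently ignored.
  def build: Either[String, String => List[String]] = {
    val unknown = filters.filterNot(_ == "lowercase")
    if (tokenizer != "whitespace") Left(s"unknown tokenizer: $tokenizer")
    else if (unknown.nonEmpty) Left(s"unknown filters: ${unknown.mkString(", ")}")
    else {
      val lower = filters.contains("lowercase")
      Right { (text: String) =>
        val tokens = text.trim.split("\\s+").toList.filter(_.nonEmpty)
        if (lower) tokens.map(_.toLowerCase) else tokens
      }
    }
  }
}

object DescriptionDemo {
  def main(args: Array[String]): Unit = {
    val desc = AnalyzerDescription("whitespace", List("lowercase"))
    println(desc.toJson)
    println(desc.build.map(analyze => analyze("Fast Cast")))
  }
}
```

Restricting the description to a fixed vocabulary of names is also what makes the "special keys versus random words" concern checkable: anything outside the vocabulary fails in `build`.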
I went to write the `Codec` for `MultiIndex` and ran into a problem: `MultiIndex` currently includes an `Analyzer`. We can't really serialize `Analyzer` because it contains a function.