With its versatile support for hybrid search (combining "classic" Lucene-like term search with vector search), Vespa is becoming popular in many contexts. It would be great to be able to use OCR highlighting with it, at least for term search.
Vespa supports using `CharFilter` implementations from Lucene, so at least the indexing side should be a simple matter of writing the appropriate wrappers to expose the functionality to it.
For rendering the responses, Vespa supports custom "Result Renderers". With these it should be simple to add an `ocrHighlighting` field to the response, assuming the `Result` object has offset information associated with it. I haven't yet found out how to access this information, but it is definitely available at least internally, since the highlighting feature for fields and summaries (called "bolding" in Vespa) relies on it. Hopefully there's a way to access it from the `Renderer` implementation.
Looks like it's going to be more complicated: the Java side of Vespa does not have access to offset information. Matching and highlighting are handled entirely in the C++ backend, which passes the result on to the Java side as a text sequence with highlight markers, i.e. the offset information is lost. From what I could gather from the documentation, the Java side of Vespa is the only place where we can add extra functionality, so a straight 1:1 port of the approach used for Solr won't work in Vespa.
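To make the limitation concrete, here is a minimal sketch of what the Java side can still recover from such a marked-up text sequence. It assumes the markers are `<hi>` and `</hi>` (passed in as parameters, so any other pair works too); the `HighlightMarkers` class and its `Span` type are hypothetical helpers for illustration, not part of Vespa's API. Note that the offsets it produces are relative to the returned snippet only, not to the original field content, which is exactly why they cannot be mapped back to OCR coordinates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Vespa: recovers highlight spans from a
// snippet that only carries inline highlight markers.
final class HighlightMarkers {

    // A highlighted span; start/end are offsets into the marker-free snippet.
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    // Scans `marked` for open/close marker pairs and returns the spans they
    // enclose. The marker strings are an assumption about the backend output.
    static List<Span> extractSpans(String marked, String open, String close) {
        List<Span> spans = new ArrayList<>();
        StringBuilder plain = new StringBuilder();
        int pos = 0;
        while (pos < marked.length()) {
            int openIdx = marked.indexOf(open, pos);
            int closeIdx = openIdx < 0 ? -1 : marked.indexOf(close, openIdx + open.length());
            if (openIdx < 0 || closeIdx < 0) {
                // No further complete marker pair: keep the rest as plain text.
                plain.append(marked, pos, marked.length());
                break;
            }
            plain.append(marked, pos, openIdx);
            int start = plain.length();
            String inner = marked.substring(openIdx + open.length(), closeIdx);
            plain.append(inner);
            spans.add(new Span(start, plain.length(), inner));
            pos = closeIdx + close.length();
        }
        return spans;
    }
}
```

For a snippet such as `the <hi>quick</hi> brown <hi>fox</hi>`, this yields two spans positioned within `the quick brown fox`. Nothing in the snippet says where that text sat in the indexed document, so any Vespa port would need another way to carry position information through to the response.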