0.9.0: Major Performance Improvements
This release brings major performance and stability improvements; upgrading is highly recommended.
Changed:
- Added support for multithreaded highlighting. It uses all available logical CPU cores by default and can be tuned with the `numHighlightingThreads` and `maxQueuedPerThread` attributes on the `OcrHighlightComponent` in `solrconfig.xml`.
- Removed `PageCacheWarmer`, which is no longer needed due to multithreading support.
- Completely refactored, simplified and optimized the I/O stack to reduce the number of file system reads and allocations/data copies during highlighting. This accounts for a significant performance improvement over previous versions (4-8 times faster in a synthetic benchmark that was not I/O-bound).
- We no longer memory-map files for reading. Benchmarking revealed that it did not improve performance with the new I/O stack (probably due to the reduced number of actual reads); on the contrary, dropping it improved performance for many concurrent queries. A huge drawback of the memory-mapped approach was that in the presence of I/O errors like disappearing mounts, truncated files, etc., the JVM could simply crash (due to the kernel sending a `SIGBUS` signal when encountering an I/O error).
- When locating breaks in the forward direction, we used to put the break point at the end of the limiting element's opening tag. With the new implementation, the break point is at the start of the limiting element's opening tag, i.e. no part of the limiting element is contained in the created section. This leads to a small change in the scores assigned to passages (since BM25 uses the length of the scored content in its calculations).
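
To illustrate the move away from memory-mapped reads, here is a minimal Java sketch of reading a byte range with positional channel reads. It is not the plugin's actual I/O code: the class `RangeReadSketch` and the method `readRange` are hypothetical names chosen for this example. It shows that a truncated file turns into a catchable `IOException` rather than a fault on a mapped page.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper for illustration only; not part of the plugin's API. */
final class RangeReadSketch {
  /** Reads exactly {@code length} bytes starting at {@code offset}. */
  static byte[] readRange(Path file, long offset, int length) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(length);
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      while (buf.hasRemaining()) {
        // Positional read: does not modify the channel's own position.
        int n = channel.read(buf, offset + buf.position());
        if (n < 0) {
          // The file is shorter than expected. With a plain read this is an
          // ordinary exception the caller can handle.
          throw new EOFException(
              "File ended before " + length + " bytes could be read: " + file);
        }
      }
    }
    return buf.array();
  }
}
```

A caller would pass the file path together with the byte offset and length of the region it needs, and handle `IOException` like any other request failure.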
Fixed:
- When using source pointers with multiple files, the plugin no longer leaks file descriptors. We previously didn't close the currently opened file when opening the next one.
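
The general pattern behind this kind of fix can be sketched in Java as follows. This is an illustration only, not the plugin's code: `MultiFileReadSketch` and `readAll` are hypothetical names. Each file is opened in its own try-with-resources block, so its descriptor is released before the next file is opened.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the plugin's API. */
final class MultiFileReadSketch {
  /** Concatenates the contents of the given files in order. */
  static byte[] readAll(List<Path> files) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] chunk = new byte[8192];
    for (Path file : files) {
      // try-with-resources closes the stream (and its file descriptor)
      // before the loop moves on to the next file, even if a read fails.
      try (InputStream in = Files.newInputStream(file)) {
        int n;
        while ((n = in.read(chunk)) != -1) {
          out.write(chunk, 0, n);
        }
      }
    }
    return out.toByteArray();
  }
}
```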