Implement concurrent highlighting on multiple threads (#64) #429

Merged: 8 commits merged on May 24, 2024

jbaiter (Member) commented on May 10, 2024

  • Concurrency is per-(doc, field) combination, i.e. highlighting 2 fields in 10 documents creates 20 highlighting tasks that can run on up to 20 threads (subject to the configured thread limit)
  • The thread limit and the maximum number of queued tasks per thread can be configured with the `numHighlightingThreads` and `maxQueuedPerThread` parameters
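A minimal sketch of the per-(doc, field) fan-out described above, assuming a plain fixed-size `ExecutorService`. `highlight()`, `highlightAll()` and the field names are hypothetical stand-ins, not the plugin's actual API. Each (doc, field) pair becomes one task, and the pool size caps how many of them run at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerDocFieldHighlighting {
    // Hypothetical stand-in for highlighting a single field of a single document.
    static String highlight(int docId, String field) {
        return "doc" + docId + ":" + field;
    }

    // Submits one task per (doc, field) pair to a fixed-size pool and collects
    // the results in submission order.
    static List<String> highlightAll(int[] docIds, String[] fields, int numThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int docId : docIds) {
                for (String field : fields) {
                    futures.add(pool.submit(() -> highlight(docId, field)));
                }
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks until this task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] docIds = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        String[] fields = {"title_ocr", "body_ocr"}; // hypothetical field names
        // 10 documents x 2 fields = 20 tasks, at most 4 running at once here.
        System.out.println(highlightAll(docIds, fields, 4).size()); // prints 20
    }
}
```

Results are collected in submission order, so the output order does not depend on which thread finishes first.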

Performance Gains ✨ on a machine with 24 logical cores and enough RAM to hold the index and OCR data in the page cache (i.e. I/O does not play a relevant role):

[Figure: perfplot_threading]

jbaiter linked an issue on May 10, 2024 that may be closed by this pull request
jbaiter marked this pull request as draft on May 10, 2024 07:26
jbaiter changed the title from "Draft: Implement concurrent highlighting on multiple threads (#64)" to "Implement concurrent highlighting on multiple threads (#64)" on May 10, 2024
jbaiter force-pushed the concurrent-highlighting branch from c2fca29 to 51bd67e on May 10, 2024 09:19
jbaiter force-pushed the concurrent-highlighting branch from 51bd67e to cd48938 on May 10, 2024 09:54
- Concurrency is per-(doc, field) combination, i.e. highlighting
  2 fields in 10 documents creates 20 tasks that can run on up to
  20 threads
- The number of threads and the maximum number of queued highlighting
  tasks can be controlled with the new `numHighlightingThreads` and
  `maxQueuedPerThread` attributes on the `OcrHighlightComponent` config.
- Includes a benchmarking suite that runs against the example index
  (`example/bench.py`)
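How the two new attributes might translate into a thread pool, as a sketch: the attribute names `numHighlightingThreads` and `maxQueuedPerThread` come from this PR, but the mapping below (a fixed pool of `numHighlightingThreads` workers with a bounded queue of `numHighlightingThreads * maxQueuedPerThread` slots, falling back to `CallerRunsPolicy` when the queue is full) is an assumption, not the plugin's confirmed implementation. `HighlightExecutorFactory` is a hypothetical name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class HighlightExecutorFactory {
    // Assumed mapping of the two config attributes onto a thread pool:
    // exactly numHighlightingThreads workers, plus a bounded queue holding
    // at most numHighlightingThreads * maxQueuedPerThread waiting tasks.
    static ThreadPoolExecutor create(int numHighlightingThreads, int maxQueuedPerThread) {
        return new ThreadPoolExecutor(
                numHighlightingThreads,    // core pool size
                numHighlightingThreads,    // maximum pool size (fixed-size pool)
                0L, TimeUnit.MILLISECONDS, // keep-alive for threads above core size (none here)
                new ArrayBlockingQueue<>(numHighlightingThreads * maxQueuedPerThread),
                // Assumption: once the queue is full, run the task on the submitting
                // thread instead of throwing RejectedExecutionException.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 8);
        System.out.println(pool.getMaximumPoolSize());           // prints 4
        System.out.println(pool.getQueue().remainingCapacity()); // prints 32
        pool.shutdown();
    }
}
```

A bounded queue keeps a burst of highlighting requests from piling up unbounded work in memory; `CallerRunsPolicy` applies back-pressure by slowing down the submitting thread rather than failing the request.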
jbaiter force-pushed the concurrent-highlighting branch from cd48938 to 527f91d on May 10, 2024 11:10
jbaiter force-pushed the concurrent-highlighting branch from 527f91d to 886d31a on May 10, 2024 12:50
jbaiter added 5 commits on May 17, 2024 17:25
We were storing the number of matches in a non-threadsafe `HashMap`,
which led to a race condition when writing to it from multiple threads.
We now use a threadsafe `ConcurrentHashMap` to avoid this.
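To illustrate the race and the fix: on a `HashMap`, a read-then-write update from several threads can lose increments, whereas `ConcurrentHashMap.merge()` performs the whole update for a key atomically. This is a minimal sketch; the `MatchCounter` class and the `"doc1"` key are hypothetical, and the plugin's actual bookkeeping may differ.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class MatchCounter {
    // Hypothetical per-key match counts (e.g. keyed by document ID).
    private final ConcurrentHashMap<String, Integer> numMatches = new ConcurrentHashMap<>();

    void addMatches(String key, int n) {
        // ConcurrentHashMap.merge() runs atomically, so concurrent increments
        // are never lost. A get()-then-put() sequence on a plain HashMap can
        // interleave between threads and drop updates.
        numMatches.merge(key, n, Integer::sum);
    }

    int get(String key) {
        return numMatches.getOrDefault(key, 0);
    }

    // Increments the same key from several threads at once.
    static MatchCounter countConcurrently(int threads, int incrementsPerThread) {
        MatchCounter counter = new MatchCounter();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.addMatches("doc1", 1);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("counting did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 10_000).get("doc1")); // prints 80000
    }
}
```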

Additionally, we now use `CompletableFuture`s for waiting on our
asynchronous highlighting threads instead of raw `Future`s.
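A sketch of waiting on asynchronous tasks with `CompletableFuture`, assuming a caller-supplied `ExecutorService`; `WaitForHighlighting` and the task body are hypothetical. `CompletableFuture.allOf(...).join()` waits for all tasks at once and surfaces a failure as an unchecked `CompletionException`, whereas looping over raw `Future.get()` calls means handling checked exceptions for each one.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class WaitForHighlighting {
    // Starts one asynchronous task per key on the given pool and waits for all of them.
    static List<String> highlightAll(List<String> keys, ExecutorService pool) {
        List<CompletableFuture<String>> futures = keys.stream()
                .map(key -> CompletableFuture.supplyAsync(() -> "highlighted:" + key, pool))
                .collect(Collectors.toList());
        // allOf() completes when every task has completed; join() waits for it and
        // throws an unchecked CompletionException if any task failed.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        // Every future is already done here, so join() returns immediately.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(highlightAll(List.of("a", "b"), pool)); // prints [highlighted:a, highlighted:b]
        } finally {
            pool.shutdown();
        }
    }
}
```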
Now that we support concurrent highlighting, it's no longer needed.
jbaiter marked this pull request as ready for review on May 23, 2024 06:29
jbaiter merged commit 8ea2eb7 into main on May 24, 2024
6 checks passed
jbaiter deleted the concurrent-highlighting branch on May 24, 2024 06:57
Successfully merging this pull request may close this issue: Multi-threaded highlighting
2 participants