forked from nmslib/hnswlib
[ENH] Dynamic ef_search #15
Open
atroyn wants to merge 15 commits into master from anton/dynamic-ef-search-again
This will require additional work to integrate into Chroma before we release, since it incurs an API change. I think the right approach is to pull this into the monorepo under a cpp directory, which could then also house our inference execution environment if we use C++ for it. Alternatively, we can keep the original API calls and redirect them to the new ones.
`ef_search` is a query-time parameter that determines how many candidates are kept during the search, and therefore how much of the HNSW graph is traversed to find approximate nearest neighbors. Setting this hyperparameter lets HNSW trade recall for speed at query time.
The way HNSWlib implemented this involved some poor design decisions. It was a property of each index, changed by calling `set_ef` on the index. Besides being cumbersome, this creates a data race under concurrent execution when multiple threads want to run queries with different `ef_search` values.
Additionally, it was not written out with the index, and when the index was loaded again it was reset to `10`, creating an unnecessary footgun.
This PR fixes all of the above. It:
- changes the index parameter name to `ef_search_default_` to make its purpose clear, and renames the functions that set it accordingly.
- adds a shared mutex, which is locked exclusively when writing but allows parallel reads.
- stores the default when the index is written out, and reads it back when the index is loaded.
- adds an argument to the query path allowing each query to set it independently. It is passed by value, so it lives on the stack of the function call rather than on the heap. If it is not passed, the index default is read.
- updates all tests and examples.
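For reviewers who want the shape of the change without reading the diff, here is a minimal sketch of the pattern described in the list above. It is not the code in this PR: `ToyIndex`, `resolve_ef`, `save`, and `load` are hypothetical names chosen for illustration, and only `ef_search_default_` and the initial value of `10` come from the description. It assumes C++17 for `std::shared_mutex` and `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <mutex>
#include <optional>
#include <ostream>
#include <shared_mutex>
#include <sstream>  // only needed for the in-memory round trip in the usage note

// Hypothetical stand-in for an HNSW index; NOT the real hnswlib class.
class ToyIndex {
 public:
  // Writers take an exclusive lock on the default.
  void set_ef_search_default(std::size_t ef) {
    assert(ef > 0 && "ef_search must be positive");
    std::unique_lock<std::shared_mutex> lock(ef_mutex_);
    ef_search_default_ = ef;
  }

  // Readers take a shared lock, so concurrent queries do not block each other.
  std::size_t ef_search_default() const {
    std::shared_lock<std::shared_mutex> lock(ef_mutex_);
    return ef_search_default_;
  }

  // Per-query resolution: the override is passed by value and never touches
  // shared state. Only when it is absent do we read the index default.
  std::size_t resolve_ef(std::optional<std::size_t> ef_search = std::nullopt) const {
    return ef_search ? *ef_search : ef_search_default();
  }

  // Persist the default alongside the (omitted) index data.
  void save(std::ostream& out) const {
    const std::size_t ef = ef_search_default();
    out.write(reinterpret_cast<const char*>(&ef), sizeof(ef));
  }

  // Restore the default instead of silently falling back to a constant.
  void load(std::istream& in) {
    std::size_t ef = 0;
    in.read(reinterpret_cast<char*>(&ef), sizeof(ef));
    if (in) set_ef_search_default(ef);
  }

 private:
  mutable std::shared_mutex ef_mutex_;
  std::size_t ef_search_default_ = 10;  // illustrative initial value
};
```

In this sketch, two threads calling `resolve_ef(50)` and `resolve_ef(200)` never write to the index, so they cannot race with each other. A `save` into a `std::stringstream` followed by a `load` into a fresh `ToyIndex` carries the default across.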
Making this work required moving up to C++17 and raising the macOS deployment target to 10.12 or later. Both are ancient, and we compile this for our users anyway.