What happened?
When a vector segment is accessed (e.g. with include=["embeddings"] or through a query), its file handles are opened and the segment is loaded into memory. The loaded vector index is then added to the local segment manager's _vector_instances_file_handle_cache via self._vector_instances_file_handle_cache.set(collection_id, instance). The LRU cache has a callback that evicts items when its capacity overflows. The capacity is bound to the NOFILE kernel parameter (the allowed number of open files): each cached item (vector segment) keeps 5 files open, so the cache can hold up to NOFILE / 5 segments. Items are only evicted from the cache on overflow; there is no way to evict them manually.
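A rough sketch of the mechanism described above (the class and constant names are illustrative, not Chroma's exact internals):

```python
import resource
from collections import OrderedDict

FILE_HANDLES_PER_SEGMENT = 5  # each persistent vector segment keeps ~5 files open


class EvictOnlyOnOverflowLRU:
    """Illustrative LRU cache: entries only leave when capacity is exceeded."""

    def __init__(self, capacity, callback):
        self._capacity = capacity
        self._callback = callback          # e.g. lambda _, v: v.close_persistent_index()
        self._items = OrderedDict()

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        while len(self._items) > self._capacity:
            k, v = self._items.popitem(last=False)
            self._callback(k, v)           # the only exit path for cached segments


# Capacity is derived from the NOFILE soft limit; with NOFILE = 209715
# this allows 41943 segments to sit in the cache indefinitely.
soft_nofile, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
capacity = soft_nofile // FILE_HANDLES_PER_SEGMENT
```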
This eviction strategy is not a memory leak in and of itself, but it becomes a problem when NOFILE is high. On a regular Docker container (tested on macOS) the limit is 209715, which divided by 5 gives 41943 items (vector segments) that can persist in the cache, each keeping a reference to its segment and preventing it from being garbage collected. Looking further into the eviction callback, callback=lambda _, v: v.close_persistent_index(), and tracing it through the persistent HNSW segment: closing the index closes the HNSW file handles and releases the memory held by HNSW. Unfortunately, that is not enough to free all the memory the vector segment occupies, because the subscription to the embedding queue remains active and keeps a reference to the segment and its HNSW index. While that alone might not fully qualify as a memory leak, the fact that deleting a collection leaves hanging references in the embedding queue subscriptions and in the LRU cache (especially when the cache is very large) is a proper memory leak. As an added bonus, everything inside the BF index is leaked as well.
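A minimal sketch of why closing the persistent index is not enough: the segment object stays reachable through the embedding queue's subscriber list, so neither it nor its in-memory BF index can be garbage collected (all names below are hypothetical, not Chroma's actual classes):

```python
class EmbeddingQueue:
    def __init__(self):
        self._subscribers = []           # strong references to subscribed segments

    def subscribe(self, segment):
        self._subscribers.append(segment)

    def unsubscribe(self, segment):      # never called on collection delete
        self._subscribers.remove(segment)


class PersistentVectorSegment:
    def __init__(self, queue):
        self._bf_index = {}              # in-memory brute-force index
        self._hnsw_handles_open = True
        queue.subscribe(self)            # keeps `self` alive as long as the queue lives

    def close_persistent_index(self):
        # Releases only the HNSW file handles; the segment object, its BF index,
        # and the queue subscription all remain in memory.
        self._hnsw_handles_open = False


queue = EmbeddingQueue()
segment = PersistentVectorSegment(queue)
segment.close_persistent_index()
del segment    # still referenced by queue._subscribers -> never collected
```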
The leak is simple to reproduce by continuously creating new collections, adding some vectors (e.g. 199, just to keep things interesting with both the HNSW file handles and the BF index), deleting the collection, and observing Chroma's memory consumption over a moderate period of time, e.g. 1h.
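A reproduction sketch using the public chromadb client API; the host/port are assumptions for a locally running Chroma server, and the comment about HNSW vs. BF placement assumes default collection settings:

```python
import random
import time
import uuid

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)

while True:
    name = f"leak-repro-{uuid.uuid4().hex[:8]}"
    collection = client.create_collection(name)
    # 199 vectors so that (with default batching) some end up in the persistent
    # HNSW index and some remain in the BF index, as described above.
    collection.add(
        ids=[str(i) for i in range(199)],
        embeddings=[[random.random() for _ in range(1536)] for _ in range(199)],
    )
    collection.get(include=["embeddings"])   # forces the HNSW file handles open
    client.delete_collection(name)           # segment stays referenced -> leak
    time.sleep(1)
```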
As an added bonus, the LRU cache is not thread safe, similar to #3334, which is easily demonstrated by running the above reproduction scenario in parallel with a small NOFILE.
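A sketch of the parallel variant, reusing the client and helpers from the previous snippet; the worker and iteration counts are arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor


def cycle(worker_id, iterations=50):
    for _ in range(iterations):
        name = f"leak-repro-{worker_id}-{uuid.uuid4().hex[:8]}"
        col = client.create_collection(name)
        col.add(
            ids=[str(i) for i in range(199)],
            embeddings=[[random.random() for _ in range(1536)] for _ in range(199)],
        )
        col.get(include=["embeddings"])   # open the HNSW file handles
        client.delete_collection(name)


with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(cycle, wid) for wid in range(8)]
    for f in futures:
        f.result()   # surface any errors from the racing LRU cache
```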
The screenshots below demonstrate the effect of the defect when running Chroma (latest main as of 19-Dec-2024). With the instance limited to 4GB, creating and deleting collections of 199 docs (1536-dim embeddings) with 1 sec of sleep between cycles took about 25 minutes to run out of memory.
Versions
Chroma 0.4.0+, any Python version, any OS
Relevant log output
No response