What happened?

A user on Discord brought up the following error while discussing memory management strategies and the use of the LRU cache:

Ref: https://discord.com/channels/1073293645303795742/1318044900134092920/1318478281468805152

After some digging and experimentation, it turns out this is a thread-safety issue in the LRU segment cache implementation (it also applies to the Basic cache, although with the current implementation there isn't much that can go wrong there).

As it turns out, the LRU cache never had proper test coverage.

A simple test reproduces the issue:
```python
import threading
import uuid
from typing import Dict

import numpy as np
import pytest

from chromadb.segment.impl.manager.cache.cache import SegmentLRUCache
from chromadb.types import Segment, SegmentScope


def new_segment(collection_id: uuid.UUID) -> Segment:
    return Segment(
        id=uuid.uuid4(),
        type="test",
        scope=SegmentScope.VECTOR,
        collection=collection_id,
        metadata=None,
        file_paths={},
    )


@pytest.fixture
def cache_setup():
    def _get_segment_disk_size(key: uuid.UUID) -> int:
        return np.random.randint(1, 100)

    def callback_cache_evict(segment: Segment) -> None:
        pass

    class SetupData:
        def __init__(self):
            self.cache = SegmentLRUCache(
                capacity=1000,
                callback=lambda k, v: callback_cache_evict(v),
                size_func=lambda k: _get_segment_disk_size(k),
            )
            self.iterations = 10000
            self.num_threads = 50
            self.errors: Dict[str, int] = {"concurrency_error": 0}
            self.lock = threading.Lock()

    return SetupData()


def test_thread_safety(cache_setup):
    """Test that demonstrates thread safety issues in the LRU cache"""

    def worker():
        """Worker that performs multiple cache operations"""
        iter = 0
        try:
            while cache_setup.errors["concurrency_error"] <= 0:
                iter += 1
                key = uuid.uuid4()
                segment = new_segment(key)
                cache_setup.cache.set(key, segment)
                if iter >= cache_setup.iterations:
                    print(f"Stopping due to max iterations: {iter} reached")
                    break
        except Exception as e:
            if "dictionary changed size during iteration" in str(e):
                with cache_setup.lock:
                    cache_setup.errors["concurrency_error"] += 1

    # Create and start threads
    threads = []
    for _ in range(cache_setup.num_threads):
        t = threading.Thread(target=worker)
        threads.append(t)
        t.start()

    # Wait for all threads to complete
    for t in threads:
        t.join()

    # Assert that we found thread safety issues
    assert (
        cache_setup.errors["concurrency_error"] > 0
    ), "Expected to find thread safety issues but none were detected"
```
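For context, the "dictionary changed size during iteration" message that the test looks for is the RuntimeError CPython raises whenever a dict is mutated while it is being iterated. A minimal, single-threaded sketch (independent of chromadb) triggers the same error deterministically:

```python
# Inserting into a dict while iterating over it raises the same
# RuntimeError that the threaded test above detects in the cache.
d = {i: str(i) for i in range(10)}

try:
    for key in d:           # iteration is in progress
        d[key + 100] = "x"  # insertion changes the dict's size mid-iteration
except RuntimeError as e:
    print(e)  # -> dictionary changed size during iteration
```

In the cache the same pattern plays out across threads: one thread iterates over the cache's internal dict (for example while choosing an eviction victim) while another thread's set() inserts or removes an entry. One conventional mitigation is to serialize every read-modify-write path behind a single threading.Lock. The sketch below is illustrative only; it is not chromadb's SegmentLRUCache and, for brevity, it evicts by entry count rather than by the size_func-based budget the real cache uses:

```python
import threading
from collections import OrderedDict
from typing import Callable, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class LockedLRUCache(Generic[K, V]):
    """Illustrative count-based LRU cache whose operations are serialized by a lock."""

    def __init__(
        self, capacity: int, callback: Optional[Callable[[K, V], None]] = None
    ) -> None:
        self._capacity = capacity
        self._callback = callback
        self._data: "OrderedDict[K, V]" = OrderedDict()
        self._lock = threading.Lock()

    def set(self, key: K, value: V) -> None:
        # The lock ensures no other thread iterates over or mutates
        # self._data while we insert and (possibly) evict.
        with self._lock:
            self._data[key] = value
            self._data.move_to_end(key)
            while len(self._data) > self._capacity:
                evicted_key, evicted_value = self._data.popitem(last=False)
                if self._callback is not None:
                    self._callback(evicted_key, evicted_value)

    def get(self, key: K) -> Optional[V]:
        with self._lock:
            if key not in self._data:
                return None
            self._data.move_to_end(key)
            return self._data[key]
```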
Versions
Chroma 0.4.23+, any Python version, any OS
Relevant log output
No response