fix: Reduce memory usage when publishing prediction log to kafka #525
Description
We've seen a gradual increase in memory usage when model observability is enabled for a model.
Our first hypothesis was that this is caused by asyncio, because we pass the prediction input and output to an async function. To test this, we reduced the sampling rate to 0: since the async function being called publishes to Kafka, setting the sampling rate to 0 isolates the asyncio overhead on its own. After setting the sampling rate to 0, memory usage was stable and there was no gradual increase.
Since the first hypothesis was not correct, our new hypothesis was that the increase comes from publishing the data to Kafka, so we ran a memory profiler against the model.
PS: We use memray as the profiler: https://github.com/bloomberg/memray
The profile shows that memory usage keeps increasing and that producing messages to Kafka contributes to this.
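For reference, a minimal sketch of how the publishing path could be profiled with memray's Python API; the output file name and the workload driver function here are illustrative, not part of this change:

```python
# Illustrative only: capture an allocation profile of the publishing path with
# memray's Python API, then inspect it with `memray flamegraph <output file>`.
import memray

with memray.Tracker("prediction_log_publish.bin"):
    run_prediction_logging_workload()  # hypothetical driver for the publish path
```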
Modifications
To solve this problem, the Kafka producer must call poll after publishing a message. This is necessary so that the ack buffer from the producer is drained and memory usage won't gradually increase, ref: 1, 2

After the changes
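For illustration, the produce-then-poll pattern described above looks roughly like the sketch below, assuming the publisher uses the confluent-kafka Producer; the broker address, topic name, and function name are illustrative and not taken from this PR:

```python
# Minimal sketch (not the exact production code): the key change is calling
# poll() after every produce() so that delivery reports are served and the
# producer's internal buffer is drained.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # illustrative config

def publish_prediction_log(payload: bytes, topic: str = "prediction-log") -> None:
    # Asynchronously enqueue the message for delivery.
    producer.produce(topic, value=payload)
    # Serve delivery callbacks without blocking. Without this call the queue of
    # pending delivery reports keeps growing, which shows up as a gradual
    # memory increase in the publishing process.
    producer.poll(0)
```

On shutdown, producer.flush() can additionally be called so that any messages still queued in the producer are delivered before the process exits.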
Tests
Checklist
Release Notes