Stream attribute reporter values to avoid OOM errors. #107
Problem
OOM errors occur because we build giant in-memory maps of attribute values keyed by primary key, even though there is currently no use case for looking values up by key.
When the reproducible OOM occurs, the attribute value map is holding ~915M of memory across ~2.6 million key/value pairs. The individual values in this case are comma-separated lists of IDs.
Example Value:
Proposed Solution
Stream the attribute value query results instead of collecting them into an intermediary map.
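To illustrate the difference, here is a minimal sketch contrasting the two approaches. It assumes the query results can be read one row at a time through an `Iterator`, which stands in for the real result set; `AttributeValueStreaming`, `Row`, `collectToMap`, and `streamValues` are hypothetical names, not the project's actual classes or methods.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class AttributeValueStreaming {

    // Hypothetical row type standing in for one row of the attribute value query.
    record Row(long primaryKey, String value) {}

    // Before: every row is retained in a map keyed by primary key,
    // so memory use grows with the total number of rows returned.
    static Map<Long, String> collectToMap(Iterator<Row> rows) {
        Map<Long, String> values = new HashMap<>();
        while (rows.hasNext()) {
            Row row = rows.next();
            values.put(row.primaryKey(), row.value());
        }
        return values;
    }

    // After: each value is handed to the consumer as soon as it is read,
    // so this method never holds more than the current row.
    static void streamValues(Iterator<Row> rows, Consumer<String> consumer) {
        while (rows.hasNext()) {
            consumer.accept(rows.next().value());
        }
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1L, "12,34"), new Row(2L, "56"));
        // Prints each value without building an intermediary map.
        streamValues(rows.iterator(), System.out::println);
    }
}
```

Note that with JDBC, `Statement#setFetchSize` is only a hint; whether rows are actually streamed from the database rather than buffered in full depends on the driver and its configuration.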
Additional Notes
Going by code searches performed by Steve, Dave, and me, it appears that the `AbstractAttributeReporter#getAttributeValue` method is only called by `WordCloudAttributeReporter`, and that reporter disregards the keys and simply iterates through the values.
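Because the only caller ignores the keys, a consumer that receives values one at a time is sufficient. The sketch below assumes a word-cloud-style tally of comma-separated tokens; `WordCloudSketch` and `TokenCounter` are hypothetical and are not the actual `WordCloudAttributeReporter` implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.stream.Stream;

public class WordCloudSketch {

    // Hypothetical consumer that tallies comma-separated tokens. It never
    // needs the primary key, only each value as it arrives.
    static final class TokenCounter implements Consumer<String> {
        private final Map<String, Integer> counts = new HashMap<>();

        @Override
        public void accept(String value) {
            for (String token : value.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    counts.merge(trimmed, 1, Integer::sum);
                }
            }
        }

        Map<String, Integer> counts() {
            return counts;
        }
    }

    public static void main(String[] args) {
        TokenCounter counter = new TokenCounter();
        Stream.of("12,34", "34,56", "34").forEach(counter);
        System.out.println(counter.counts().get("34")); // prints 3
    }
}
```

The tally map still grows with the number of distinct tokens, but not with the number of rows, so it no longer scales with the ~2.6 million key/value pairs described above.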