Stream attribute reporter values to avoid OOM errors. #107

Merged
Foxcapades merged 1 commit into master from stream-word-cloud on Oct 30, 2024

Conversation

Foxcapades
Member

Problem

OOM errors occur when building giant in-memory maps of attribute values keyed by primary key, even though there is no current use case for lookups.

When the reproducible OOM happens, the attribute value map is holding ~915M of memory across ~2.6 million k/v pairs. The individual values in this case are comma-separated lists of IDs.

Example Value:

PF00630, PF18199, PF18198, PF17857, PF17852, PF13964, PF12781, PF12780, PF12777, PF12775, PF12774, PF08393, PF07646, PF03028, PF01833, PF01344

Proposed Solution

Stream the attribute value query results instead of collecting them into an intermediary map.
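
A minimal sketch of the idea, not the actual WDK change: the class and method names here (`StreamingSketch`, `forEachValue`, `countTokens`) are hypothetical, and a plain `Iterator<String[]>` of `{primaryKey, value}` rows stands in for the real query result. Each value is handed to a consumer as it is read, so no map keyed by primary key is ever built.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

class StreamingSketch {

    // Hypothetical stand-in for iterating a query result: each row is
    // {primaryKey, value}. The value is passed to the consumer and the
    // row is not retained afterwards.
    static void forEachValue(Iterator<String[]> rows, Consumer<String> consumer) {
        while (rows.hasNext()) {
            consumer.accept(rows.next()[1]);
        }
    }

    // Example consumer: tally the comma-separated tokens across all values.
    // Only the token counts are held in memory, not the per-row values.
    static Map<String, Integer> countTokens(Iterator<String[]> rows) {
        Map<String, Integer> counts = new HashMap<>();
        forEachValue(rows, value -> {
            for (String token : value.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    counts.merge(trimmed, 1, Integer::sum);
                }
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; the primary keys are ignored.
        List<String[]> rows = List.of(
            new String[] {"gene1", "PF00630, PF18199"},
            new String[] {"gene2", "PF18199"});
        System.out.println(countTokens(rows.iterator()));
    }
}
```

In this sketch, memory use grows with the number of distinct tokens rather than with the number of rows.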

Additional Notes

Going by code searches performed by Steve, Dave, and me, it appears that the AbstractAttributeReporter#getAttributeValue method is only called by WordCloudAttributeReporter, and that reporter disregards the keys and simply iterates through the values.

Foxcapades added the bug label on Oct 29, 2024
Foxcapades self-assigned this on Oct 29, 2024
pkValues.put(pkColumn, resultSet.getObject(pkColumn));

return Optional.of(new Tuples.TwoTuple<>(
mapException(() -> new PrimaryKeyValue(pkDef, pkValues), WdkRuntimeException::new),
Member

This line can be shortened to wrapException(() -> new PrimaryKeyValue(pkDef, pkValues))

Member Author

That would throw RuntimeException instead of WdkRuntimeException.
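
To illustrate the distinction being discussed, here is a rough sketch under stated assumptions. The two helpers below are hypothetical re-implementations written for this example; the real `wrapException` and `mapException` signatures may differ. `AppRuntimeException` is a made-up stand-in for `WdkRuntimeException`.

```java
import java.util.concurrent.Callable;
import java.util.function.Function;

class ExceptionWrappingSketch {

    // Made-up stand-in for a project-specific runtime exception.
    static class AppRuntimeException extends RuntimeException {
        AppRuntimeException(Throwable cause) {
            super(cause);
        }
    }

    // Assumed behavior: rethrow any exception as a plain RuntimeException.
    static <T> T wrapException(Callable<T> action) {
        try {
            return action.call();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Assumed behavior: let the caller choose the runtime exception type.
    static <T> T mapException(Callable<T> action,
                              Function<Exception, RuntimeException> mapper) {
        try {
            return action.call();
        } catch (Exception e) {
            throw mapper.apply(e);
        }
    }

    // Always fails with a checked exception, for demonstration only.
    static String failing() throws Exception {
        throw new Exception("bad primary key");
    }

    public static void main(String[] args) {
        try {
            mapException(ExceptionWrappingSketch::failing, AppRuntimeException::new);
        } catch (AppRuntimeException e) {
            System.out.println("caught " + e.getClass().getSimpleName());
        }
    }
}
```

Under these assumptions, a caller that catches the project-specific type would miss the plain `RuntimeException` thrown by the shorter form, which is why the longer call is kept.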

Foxcapades merged commit 29ce5da into master on Oct 30, 2024
1 check passed
Foxcapades deleted the stream-word-cloud branch on October 30, 2024 17:27
Foxcapades added a commit that referenced this pull request Nov 8, 2024