
Bound the size of cache in deprecation logger #16724

Open · wants to merge 1 commit into base: main
Conversation

@andrross (Member) commented Nov 26, 2024

The current implementation of the map used to de-duplicate deprecation log
messages can grow without bound. This change adds a simple fixed limit to the
data structure tracking existing loggers. Once the limit is reached, new loggers
will no longer be deduplicated. I also added a check to skip the tracking
entirely when the deprecation logger is disabled.
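The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual PR code: the class name, the `shouldLogMessage` method, and the value of `MAX_DEDUPE_CACHE_ENTRIES` are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a bounded de-duplication structure for a deprecation
// logger. All names and the limit value are illustrative, not from the PR.
public class BoundedDedupeSketch {
    static final int MAX_DEDUPE_CACHE_ENTRIES = 16_384; // assumed limit

    private final ConcurrentHashMap<String, Boolean> seenKeys = new ConcurrentHashMap<>();
    private final boolean deprecationLoggingEnabled;

    public BoundedDedupeSketch(boolean deprecationLoggingEnabled) {
        this.deprecationLoggingEnabled = deprecationLoggingEnabled;
    }

    /** Returns true if the message should be emitted. */
    public boolean shouldLogMessage(String key) {
        if (!deprecationLoggingEnabled) {
            return false; // logger disabled: skip tracking entirely
        }
        if (seenKeys.containsKey(key)) {
            return false; // duplicate of an already-tracked message
        }
        if (seenKeys.size() >= MAX_DEDUPE_CACHE_ENTRIES) {
            return true; // limit reached: new keys are no longer deduplicated
        }
        // putIfAbsent returns null only for the first racing inserter,
        // so exactly one thread logs the first occurrence
        return seenKeys.putIfAbsent(key, Boolean.TRUE) == null;
    }
}
```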

Related Issues

Resolves #16702

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dblock (Member) left a comment


Is not using a concurrent hash map a premature optimization? After all, even at a high volume of requests, you're not going to hit a lock often (concurrent maps are very fast and use atomic swaps, as far as I remember).


❌ Gradle check result for 1913d2b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross (Member, Author)

Is not using a concurrent hash map a premature optimization?

@dblock I don't think using the OpenSearch Cache is an optimization. It's definitely slower! The code is simple though, which is why I chose it.

What alternative do you have in mind with a concurrent hash map? The choices I see are: (1) stop deduplicating when the map hits a certain capacity, or (2) track insertion order to know which ones to evict. I could maybe be convinced that option 1 is good enough, but option 2 is what I implemented using the existing Cache data structure.

@reta (Collaborator) commented Nov 27, 2024

The choices I see are: (1) stop deduplicating when the map hits a certain capacity,

I would agree with @dblock here: the Cache does much more (expiration tracking, etc.) than we need, plus we actually introduce more boxing operations by using the hash code (int -> Integer). Maybe a ConcurrentHashMap is not a bad idea:

    // Returns true when the key was newly inserted, i.e. this message has not been logged yet
    public boolean isAlreadyLogged() {
        return keys.size() < MAX_DEDUPE_CACHE_ENTRIES && keys.putIfAbsent(keyWithXOpaqueId, Boolean.TRUE) == null;
    }

@andrross (Member, Author) commented Nov 27, 2024

stop deduplicating when the map hits a certain capacity

@reta Agreed that Cache does a lot more than we need (mostly around stats; time-based expiration is a no-op if you don't configure it), but it does give us the LRU semantics, which seem beneficial here. The pattern I worry about with a simple capacity limit: if you have clients that periodically rotate their opaque IDs, then an OpenSearch server left running long enough will eventually just stop de-duplicating deprecation warnings.

Also, the hashcode change was to make memory utilization deterministic, as every entry is the same size (a boxed integer). I'm happy to change this to a limited-capacity concurrent set of strings if you think that is good enough. That solution is definitely simpler and faster.
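The limited-capacity concurrent set of strings mentioned here could look something like the following. This is a hedged sketch under assumptions: the class name, the `markSeen` helper, and the `MAX_ENTRIES` value are illustrative, not from the PR.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a capacity-limited concurrent set of de-duplication keys.
public class DedupeKeySet {
    static final int MAX_ENTRIES = 16_384; // assumed cap, not from the PR

    private final Set<String> keys = ConcurrentHashMap.newKeySet();

    /**
     * Returns true on the first sighting of a key. Once the cap is reached,
     * no new keys are tracked, so new messages always count as "first".
     */
    public boolean markSeen(String key) {
        if (keys.contains(key)) {
            return false; // already tracked: suppress the duplicate
        }
        return keys.size() >= MAX_ENTRIES || keys.add(key);
    }
}
```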

@reta (Collaborator) commented Nov 28, 2024

@reta Agreed that Cache does a lot more than we need (mostly around stats; time-based expiration is a no-op if you don't configure it), but it does give us the LRU semantics, which seem beneficial here.

Thanks @andrross, I would have expected us to use LRU semantics, but Cache is configured without any expiration policies set (please correct me if I am wrong), so LRU is not used. I am also very much aligned with @dblock on the simplicity part: this flow is far from critical, and it looks to me that logging the deprecation warning 65k times (at worst, and very likely for the same deprecated feature triggered over and over again) is enough to be noticeable.

@andrross (Member, Author)

I would have expected us to use LRU semantics, but Cache is configured without any expiration policies set (please correct me if I am wrong), so LRU is not used

@reta Without expiration policies the Cache will track insertion order, so you're correct that it is not LRU, but the basic point is that the Cache will still allow new entries to be deduplicated after the limit is reached. However, I can totally buy the argument that this is not a critical code path, and that it is better to be simple and fast than to add overhead to handle edge cases.
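For contrast, the eviction-based alternative discussed here (keeping de-duplication working after the limit by evicting old entries) can be approximated with an access-ordered `LinkedHashMap` from the standard library. This is a sketch of the general idea only, not what the PR or the OpenSearch `Cache` actually does; the class name, `firstSighting` method, and tiny `MAX_ENTRIES` value are illustrative.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// LRU-style key tracker: when the map is full, the least recently used key is
// evicted, so new keys can still be deduplicated after the limit is reached.
public class LruDedupeSketch {
    static final int MAX_ENTRIES = 4; // tiny limit for illustration

    private final Map<String, Boolean> keys = Collections.synchronizedMap(
        new LinkedHashMap<String, Boolean>(16, 0.75f, true) { // accessOrder = true -> LRU
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > MAX_ENTRIES; // evict the LRU entry past the cap
            }
        });

    /** Returns true on the first sighting of the key (since its last eviction). */
    public boolean firstSighting(String key) {
        return keys.put(key, Boolean.TRUE) == null;
    }
}
```

The trade-off matches the discussion above: eviction keeps de-duplication working for rotating keys, at the cost of synchronization and bookkeeping that a plain capacity cap avoids.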

Signed-off-by: Andrew Ross <[email protected]>
@andrross (Member, Author) commented Nov 29, 2024

@reta I pushed a simpler version that just enforces a size limit on the map. Let me know what you think.

I also removed the hashcode() change, which means we no longer have control over the amount of memory used, as the sizes of the keys are controlled by the caller. In practice, though, I think this map would stay under 10MB unless unreasonably large keys are used.

@reta (Collaborator) commented Nov 29, 2024

@dblock I think you would like it ;-)


✅ Gradle check result for 68e9bfc: SUCCESS


codecov bot commented Nov 29, 2024

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 72.11%. Comparing base (c82cd2e) to head (68e9bfc).
Report is 5 commits behind head on main.

Files with missing lines                              | Patch %  | Lines
...g/opensearch/common/logging/DeprecatedMessage.java | 83.33%   | 0 Missing and 1 partial ⚠️
...g/opensearch/common/logging/DeprecationLogger.java | 75.00%   | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #16724      +/-   ##
============================================
- Coverage     72.51%   72.11%   -0.41%     
+ Complexity    65562    65173     -389     
============================================
  Files          5318     5318              
  Lines        303945   303948       +3     
  Branches      43976    43978       +2     
============================================
- Hits         220413   219182    -1231     
- Misses        65798    66803    +1005     
- Partials      17734    17963     +229     


Labels: backport 2.x (Backport to 2.x branch), bug (Something isn't working), Other

Successfully merging this pull request may close these issues.

[BUG] DeprecationLogger - Unbounded memory usage
6 participants