fix: inventory canary fails because state is too large #1362

rix0rrr · 2023-11-22T13:56:56Z

The Inventory Canary (a Lambda function that calculates how many packages, docsets, etc. we have) has been failing for a long while.

The reason is that it accumulates data into an intermediary JSON object. That object is saved to S3 whenever the Lambda approaches its 15-minute timeout, then reloaded by the next Lambda invocation. However, past a certain point the JSON payload exceeds 512MB (the maximum string size that V8 will serialize), and the Lambda fails.
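The checkpoint-and-resume pattern described above can be sketched as follows. This is a minimal illustration, not the canary's actual code: `runSlice`, `ScanState` and the `step` callback are hypothetical names, and the S3 write is left to the caller. In a real Lambda the time budget would come from `context.getRemainingTimeInMillis()`. The weak point is `acc`: if it records every package it has seen, its serialized form grows without bound.

```typescript
// Hypothetical resumable scan; names are illustrative, not the PR's code.
interface ScanState<A> {
  cursor: number; // index of the next key to process
  acc: A;         // accumulated data carried across invocations
}

interface SliceResult<A> {
  state: ScanState<A>;
  done: boolean; // false means "persist state and continue in a new invocation"
}

function runSlice<A>(
  keys: readonly string[],
  state: ScanState<A>,
  step: (acc: A, key: string) => A,
  remainingMs: () => number, // e.g. Lambda's context.getRemainingTimeInMillis
  marginMs: number,          // time reserved for writing the checkpoint
): SliceResult<A> {
  let { cursor, acc } = state;
  while (cursor < keys.length) {
    if (remainingMs() < marginMs) {
      // Out of time: the caller serializes `state` and saves it (e.g. to S3)
      return { state: { cursor, acc }, done: false };
    }
    acc = step(acc, keys[cursor]);
    cursor++;
  }
  return { state: { cursor, acc }, done: true };
}

// Usage: a fake clock that runs out after three items, then a resumed run.
const keys = ["a", "b", "c", "d", "e"];
let budget = 3;
const first = runSlice<string[]>(keys, { cursor: 0, acc: [] }, (acc, k) => [...acc, k], () => budget--, 1);
// Round-trip through JSON, as a checkpoint to S3 would
const restored: ScanState<string[]> = JSON.parse(JSON.stringify(first.state));
const second = runSlice(keys, restored, (acc, k) => [...acc, k], () => 1000, 1);
```

Because this `acc` keeps every key, the string passed to `JSON.stringify` grows linearly with the catalog, which is exactly the failure mode described above.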

Ultimately, we use this information for 2 purposes:

- Emit metrics about the total counts of packages, submodules, and docsets for each.
- Write detailed reports about missing documentation sets, corrupted assemblies, etc.

The first one is used on the dashboard (which has not been showing data for a while); the second one we ignore and never look at.

Solve this problem by replacing the counters with a [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) structure. This is no longer an exact counter: it is allowed to deviate by about 1% from the actual number we are looking for. In return, its size is constant instead of ever-growing, so we no longer run the risk of growing our state object too large to serialize.
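To illustrate why a HyperLogLog keeps the state constant-size, here is a minimal sketch in TypeScript. It is not the PR's implementation: the hash (32-bit FNV-1a followed by MurmurHash3's `fmix32` finalizer), the precision `p = 14`, and the method names are all assumptions. With m = 2^14 = 16384 registers, the standard error is about 1.04/√m = 1.04/128 ≈ 0.8%, in line with the roughly 1% deviation mentioned above.

```typescript
// Illustrative HyperLogLog; hash choice and p = 14 are assumptions.
function hash32(s: string): number {
  let h = 0x811c9dc5; // FNV-1a offset basis
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // FNV prime
  }
  // MurmurHash3 fmix32 finalizer: spreads bits so every register is used evenly
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HyperLogLog {
  private readonly m: number;
  private readonly registers: Uint8Array;

  constructor(private readonly p = 14, registers?: Uint8Array) {
    this.m = 1 << p;
    if (registers && registers.length !== this.m) {
      throw new Error(`expected ${this.m} registers, got ${registers.length}`);
    }
    this.registers = registers ?? new Uint8Array(this.m);
  }

  add(value: string): void {
    const x = hash32(value);
    const idx = x >>> (32 - this.p); // top p bits select a register
    const w = (x << this.p) >>> 0;   // remaining 32 - p bits, left-aligned
    // Rank = 1-based position of the first 1-bit; capped when w is all zeros
    const rank = Math.min(Math.clz32(w) + 1, 32 - this.p + 1);
    if (rank > this.registers[idx]) this.registers[idx] = rank;
  }

  count(): number {
    const m = this.m;
    const alpha = 0.7213 / (1 + 1.079 / m);
    let sum = 0;
    let zeros = 0;
    for (let i = 0; i < m; i++) {
      sum += Math.pow(2, -this.registers[i]);
      if (this.registers[i] === 0) zeros++;
    }
    const estimate = (alpha * m * m) / sum;
    // Small-range correction: fall back to linear counting
    if (estimate <= 2.5 * m && zeros > 0) {
      return m * Math.log(m / zeros);
    }
    return estimate;
  }

  // Serialized form is always m entries, however many values were added
  toJSON(): number[] {
    return Array.from(this.registers);
  }

  static fromJSON(registers: number[], p = 14): HyperLogLog {
    return new HyperLogLog(p, Uint8Array.from(registers));
  }
}

// Usage: count 100,000 distinct names, each added twice, then checkpoint.
const hll = new HyperLogLog();
for (let i = 0; i < 100000; i++) hll.add(`pkg-${i}`);
for (let i = 0; i < 100000; i++) hll.add(`pkg-${i}`); // duplicates leave the sketch unchanged
const saved = JSON.stringify(hll.toJSON());
const restored = HyperLogLog.fromJSON(JSON.parse(saved));
console.log(Math.round(restored.count()));
```

The checkpoint is a fixed 16384-entry array regardless of how many packages are counted, and duplicates cost nothing. A production version would more likely use a 64-bit hash or an existing library than this sketch.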

This drops most of the "failed packages" reports we used to create: we now only collect a list of uninstallable packages. The other reports (missing documentation and corrupt assemblies per package version) are no longer created.

We can add those back if we ever see the need, this time collecting only the affected packages instead of all data on all packages in the catalog, in triplicate.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license


madeline-k

> Solve this problem by replacing the counters with a HyperLogLog structure. This is now no longer an accurate counter -- it is allowed to have a 1% deviation from the actual number we are looking for. In return, its size is constant instead of ever-growing. We now no longer run the risk of growing our state object too large to serialize.

This is very cool! TIL!

Approving, since the code looks good to me. Thanks for putting in plenty of explanatory comments! (However, the build is failing.)


rix0rrr requested a review from a team November 22, 2023 13:56

cdklabs-automation enabled auto-merge November 22, 2023 13:57

rix0rrr changed the title ~~fix: inventory canary's state is too large~~ fix: inventory canary fails because state is too large Nov 22, 2023

madeline-k approved these changes Nov 22, 2023


rix0rrr and others added 2 commits November 23, 2023 14:38

Thorough refactoring in order to be able to test better

9d55e7d

chore: self mutation

8f74522

Signed-off-by: github-actions <[email protected]>

cdklabs-automation added this pull request to the merge queue Nov 23, 2023

Merged via the queue into main with commit 8c31ff9 Nov 23, 2023
6 checks passed

cdklabs-automation deleted the huijbers/inventory-canary branch November 23, 2023 13:58
