fix: inventory canary fails because state is too large (#1362)
The Inventory Canary (a Lambda function that calculates how many packages, docsets, etc. we have) has been failing for a long while.

The reason is that it accumulates data into an intermediary JSON object. That object is saved to S3 whenever the Lambda approaches its 15 minute timeout, then reloaded into the next Lambda instance. After a certain point, however, the JSON payload exceeds 512MB (the maximum string size that V8 will serialize), and the Lambda fails.

Ultimately, we use this information for 2 purposes:

- Emit metrics about total counts of packages, submodules, docsets for each.
- Write detailed reports about missing documentation sets, corrupted assemblies, etc.

The first one is used on the dashboard (which has not been showing data for a while); the second one we ignore and never look at.

Solve this problem by replacing the counters with a [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) structure. This is no longer an exact counter: it is allowed to deviate by 1% from the actual number we are looking for. In return, its size is constant instead of ever-growing, so we no longer run the risk of the state object growing too large to serialize.

This drops support for most of the "failed packages" reports we used to create; we only collect a list of uninstallable packages. The other reports we used to create (missing documentation and corrupt assemblies per package version) are no longer created. We can add those back if we ever see the need, and then collect only the affected packages instead of all data on all packages in the catalog in triplicate.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*

---------

Signed-off-by: github-actions <[email protected]>
Co-authored-by: github-actions <[email protected]>
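
To illustrate why a HyperLogLog keeps the state size constant, here is a minimal sketch in TypeScript. It is not the code from this commit: the `HyperLogLog` class, the `hash32` helper, and the choice of 2^14 registers are assumptions made for illustration. With 16384 one-byte registers, the expected standard error is about 1.04 / sqrt(16384), or roughly 0.8%, which is in line with the 1% deviation mentioned above.

```typescript
// Illustrative HyperLogLog sketch; all names and parameters here are hypothetical.
const P = 14;        // number of hash bits used as the register index
const M = 1 << P;    // number of registers (16384), fixed regardless of input size

// 32-bit FNV-1a hash followed by the murmur3 finalizer for better bit mixing.
function hash32(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

class HyperLogLog {
  // One byte per register: the whole state is always M bytes (16 KiB).
  public readonly registers: Uint8Array;

  constructor(registers?: Uint8Array) {
    this.registers = registers ? registers : new Uint8Array(M);
  }

  // Record a value. Adding the same value again never changes the state.
  public add(value: string): void {
    const h = hash32(value);
    const index = h >>> (32 - P);      // top P bits select the register
    const rest = (h << P) >>> 0;       // remaining 32 - P bits, left-aligned
    // Rank = 1-based position of the first 1 bit in the remaining bits.
    const rank = rest === 0 ? 32 - P + 1 : Math.clz32(rest) + 1;
    if (rank > this.registers[index]) {
      this.registers[index] = rank;
    }
  }

  // Estimate the number of distinct values added so far.
  public count(): number {
    let sum = 0;
    let zeros = 0;
    for (let i = 0; i < M; i++) {
      sum += Math.pow(2, -this.registers[i]);
      if (this.registers[i] === 0) zeros++;
    }
    const alpha = 0.7213 / (1 + 1.079 / M);
    const raw = (alpha * M * M) / sum;
    // Small-range correction: fall back to linear counting while registers are empty.
    if (raw <= 2.5 * M && zeros > 0) {
      return Math.round(M * Math.log(M / zeros));
    }
    // No large-range correction: this sketch assumes counts far below 2^32.
    return Math.round(raw);
  }
}
```

Because the registers array has a fixed length, persisting it between Lambda invocations costs the same number of bytes whether the catalog holds a thousand packages or several million. The trade-off is that the sketch can only answer "how many distinct items", not "which items", which is why the detailed per-package reports cannot be derived from it.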