Memory performance option: Validate one release at a time #56

Open
jpmckinney opened this issue Oct 6, 2020 · 2 comments
Labels
performance schema validation Relating to JSON Schema validation

Comments

@jpmckinney
Member

Presently, the entire package must be loaded into memory to be validated, which consumes a lot of memory for larger files. See open-contracting/lib-cove-oc4ids#23.

An alternative is to read the input twice: once to rebuild the package metadata without releases/records/etc., and a second time to iteratively yield each release/record for validation.
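A minimal sketch of the two-pass idea, using only the standard library so it can be run as-is. Everything named here (`_Reader`, `iter_package`, `two_pass`, `open_input`) is hypothetical and not part of lib-cove-ocds; in practice a streaming parser library would likely replace the hand-rolled reader. It assumes the input is a single JSON object with a top-level `releases` array.

```python
import io
import json
import re

_decoder = json.JSONDecoder()
_WS = " \t\r\n"
# A decoded value is only trusted once a JSON delimiter follows it.
_AFTER = re.compile(r"[ \t\r\n]*[,:\]}]")


class _Reader:
    """Hypothetical incremental reader over a text file object."""

    def __init__(self, fp, chunk_size):
        self.fp, self.chunk_size = fp, chunk_size
        self.buf, self.eof = "", False

    def _fill(self):
        chunk = self.fp.read(self.chunk_size)
        if chunk:
            self.buf += chunk
        else:
            self.eof = True

    def peek(self):
        """Return the next non-whitespace character ('' at end of input)."""
        while True:
            self.buf = self.buf.lstrip(_WS)
            if self.buf or self.eof:
                return self.buf[:1]
            self._fill()

    def take(self, expected):
        """Consume one character, which must be one of `expected`."""
        char = self.peek()
        if not char or char not in expected:
            raise ValueError(f"expected one of {expected!r}, got {char!r}")
        self.buf = self.buf[1:]
        return char

    def value(self):
        """Decode one complete JSON value from the front of the buffer."""
        self.peek()
        while True:
            try:
                obj, end = _decoder.raw_decode(self.buf)
                # "1" may be the start of "1.5": wait for a delimiter or EOF.
                if self.eof or _AFTER.match(self.buf, end):
                    self.buf = self.buf[end:]
                    return obj
            except json.JSONDecodeError:
                if self.eof:
                    raise
            self._fill()  # value is truncated: read more and retry


def iter_package(fp, array_key="releases", chunk_size=65536):
    """Yield ("meta", key, value) per top-level member and
    ("item", index, value) per element of the `array_key` array."""
    reader = _Reader(fp, chunk_size)
    reader.take("{")
    if reader.peek() == "}":
        return
    while True:
        key = reader.value()
        reader.take(":")
        if key == array_key and reader.peek() == "[":
            reader.take("[")
            if reader.peek() == "]":
                reader.take("]")
            else:
                index = 0
                while True:
                    yield "item", index, reader.value()
                    index += 1
                    if reader.take(",]") == "]":
                        break
        else:
            yield "meta", key, reader.value()
        if reader.take(",}") == "}":
            return


def two_pass(open_input, array_key="releases"):
    """Pass 1 keeps only package metadata; pass 2 lazily yields releases."""
    with open_input() as fp:
        events = iter_package(fp, array_key)
        metadata = {k: v for kind, k, v in events if kind == "meta"}

    def releases():
        with open_input() as fp:
            for kind, index, value in iter_package(fp, array_key):
                if kind == "item":
                    yield index, value

    return metadata, releases()


# Usage: `open_input` is any callable returning a fresh text file object.
text = '{"uri": "x", "releases": [{"id": 1}, {"id": 2.5}], "version": "1.1"}'
metadata, releases = two_pass(lambda: io.StringIO(text))
```

In this sketch, pass 1 still decodes each release but discards it immediately, so the buffer holds roughly one release plus one chunk at a time. Other top-level members are decoded whole, on the assumption that package metadata is small.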

To avoid rewriting a lot of code, we could perhaps stitch the results for individual releases/records back together, so that errors are still reported as being about releases/0, releases/1, etc. even though each was validated separately.
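The stitching step could look roughly like the sketch below. The error shape (a dict with a release-relative `path` and a `message`), the `stitch` function, and the toy `validate_release` validator are all assumptions for illustration, not the library's actual API or error format.

```python
def validate_release(release):
    """Toy stand-in validator (hypothetical): returns errors whose paths
    are relative to the release being validated."""
    errors = []
    if "ocid" not in release:
        errors.append({"path": "", "message": "'ocid' is a required property"})
    if not isinstance(release.get("tag", []), list):
        errors.append({"path": "tag", "message": "'tag' is not a JSON array"})
    return errors


def stitch(indexed_releases, validate, array_key="releases"):
    """Validate each (index, release) pair separately, then rewrite each
    error path as if the whole package had been validated at once."""
    for index, release in indexed_releases:
        prefix = f"{array_key}/{index}"
        for error in validate(release):
            # An empty relative path means the error is about the release itself.
            path = f"{prefix}/{error['path']}" if error["path"] else prefix
            yield {**error, "path": path}


# Usage: any iterable of (index, release) pairs works, including a lazy one.
sample = [{"ocid": "ocds-213czf-1", "tag": "planning"}, {"tag": []}]
errors = list(stitch(enumerate(sample), validate_release))
paths = [error["path"] for error in errors]
```

Because `stitch` is a generator over a lazy input, only one release and its errors need to be in memory at a time, while the reported paths keep the `releases/0`, `releases/1` form.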

In any case, this is the only way for memory usage to not scale with input size.

@jpmckinney
Member Author

This would reduce memory usage but not running time. We don't presently have an issue with memory (except in rare cases when someone uploads a huge file to the DRT).

@jpmckinney jpmckinney closed this as not planned Jul 9, 2023
@jpmckinney
Member Author

Re-opening, as we actually do have an issue with memory in Kingfisher Process, if we were to attempt to validate packages rather than individual releases/records (open-contracting/kingfisher-process#392).

@jpmckinney jpmckinney reopened this Jul 10, 2023
@jpmckinney jpmckinney changed the title from "Performance option: Validate one release at a time" to "Memory performance option: Validate one release at a time" Jul 10, 2023