Externalize BOM ingestion pipeline #794
Looks good to go to me.
Signed-off-by: nscuro <[email protected]>
Description
This PR moves the processing of uploaded BOMs from the internal, in-memory eventing system to Kafka.
This allows BOM processing to be distributed across multiple instances of the API server. Before this PR, the instance that received an upload request was also the one processing the BOM, which could lead to a very uneven load distribution.
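To make the distribution argument concrete, here is a minimal, dependency-free sketch of how keyed Kafka records spread work across instances. It assumes, as the concurrency paragraph of this description implies, that upload events are keyed by project. The class and method names are hypothetical, `String.hashCode()` stands in for the murmur2 hash Kafka's default partitioner actually uses, and the round-robin partition assignment stands in for the consumer group protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BomDistributionSketch {
    // Simplification: Kafka's default partitioner hashes the serialized key with murmur2.
    // String.hashCode() is used here only to keep the sketch dependency-free.
    static int partitionFor(String projectKey, int numPartitions) {
        return Math.floorMod(projectKey.hashCode(), numPartitions);
    }

    // Hypothetical round-robin assignment of partitions to instances; in practice the
    // Kafka consumer group protocol decides which instance owns which partition.
    static Map<Integer, List<Integer>> assignPartitions(int numPartitions, int numInstances) {
        Map<Integer, List<Integer>> assignment = new TreeMap<>();
        for (int partition = 0; partition < numPartitions; partition++) {
            assignment.computeIfAbsent(partition % numInstances, k -> new ArrayList<>()).add(partition);
        }
        return assignment;
    }

    // Looks up the instance that owns a given partition.
    static int instanceFor(int partition, Map<Integer, List<Integer>> assignment) {
        for (Map.Entry<Integer, List<Integer>> entry : assignment.entrySet()) {
            if (entry.getValue().contains(partition)) {
                return entry.getKey();
            }
        }
        throw new IllegalArgumentException("Unassigned partition: " + partition);
    }

    public static void main(String[] args) {
        int partitions = 6;
        int instances = 3;
        Map<Integer, List<Integer>> assignment = assignPartitions(partitions, instances);
        System.out.println("instance -> partitions: " + assignment);

        // Which instance received the HTTP upload no longer matters: the BOM is
        // processed by whichever instance owns the partition of its project key.
        for (String project : List.of("project-a", "project-b", "project-c", "project-a")) {
            int partition = partitionFor(project, partitions);
            System.out.printf("%s -> partition %d -> instance %d%n",
                    project, partition, instanceFor(partition, assignment));
        }
    }
}
```

The first line printed is `instance -> partitions: {0=[0, 3], 1=[1, 4], 2=[2, 5]}`. Repeated uploads for the same project always map to the same partition, which is what keeps per-project processing ordered while different projects spread across instances.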
To allow processing to be shared, uploaded BOMs need to be stored in a location that all instances can access. At the moment, BOMs are written to the `/tmp` directory, which is local to each instance and thus clearly doesn't work for the goal at hand.

This PR provides three new storage extensions for this purpose:

- Database (`BOM_UPLOAD` table)

The default is Database, since it requires no additional setup. Database storage is not expected to be a good fit for large deployments with very frequent uploads.
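The storage extensions share a simple contract: store a BOM under an upload token, read it back on whichever instance processes it, and delete it afterwards. Below is a minimal sketch of such a contract, assuming a token-keyed API. The interface `BomUploadStorage` and its method names are hypothetical and not necessarily what this PR defines, and the in-memory implementation is only a stand-in for the Database extension: it is not shared between instances.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class BomUploadStorageDemo {
    public static void main(String[] args) {
        BomUploadStorage storage = new InMemoryBomUploadStorage();
        UUID token = UUID.randomUUID();

        // 1. Upload passes validation -> the BOM is stored under its upload token.
        storage.storeBom(token, "{\"bomFormat\":\"CycloneDX\"}".getBytes(StandardCharsets.UTF_8));

        // 2. Whichever instance picks up the event reads the BOM back by token.
        byte[] bom = storage.getBomByToken(token).orElseThrow();
        System.out.println("read back: " + new String(bom, StandardCharsets.UTF_8));

        // 3. Processing succeeded -> the BOM is deleted, so storage stays temporary.
        System.out.println("deleted: " + storage.deleteBomByToken(token));
        System.out.println("still present: " + storage.getBomByToken(token).isPresent());
    }
}

// Hypothetical extension point; names and signatures are illustrative only.
interface BomUploadStorage {
    void storeBom(UUID token, byte[] bom);

    Optional<byte[]> getBomByToken(UUID token);

    boolean deleteBomByToken(UUID token);
}

// In-memory stand-in for a storage extension. Unlike the Database extension it is NOT
// shared between instances; it only illustrates the store/read/delete contract.
class InMemoryBomUploadStorage implements BomUploadStorage {
    private final Map<UUID, byte[]> boms = new ConcurrentHashMap<>();

    @Override
    public void storeBom(UUID token, byte[] bom) {
        boms.put(token, bom.clone()); // defensive copy
    }

    @Override
    public Optional<byte[]> getBomByToken(UUID token) {
        return Optional.ofNullable(boms.get(token)).map(byte[]::clone);
    }

    @Override
    public boolean deleteBomByToken(UUID token) {
        return boms.remove(token) != null;
    }
}
```

Keeping the contract this narrow is what would let a database table, a shared file system, or an object store be swapped in behind the same interface.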
Note
With the work done in PR #805, the idea is to empower users to plug in their own, potentially proprietary, storage solutions.
To reduce the volume of data transmitted to and from storage, as well as the storage size requirements, we compress BOMs using zstd. Currently, the compression level is hardcoded to `3` (`22` is the maximum), but the plan is to make it configurable.

BOMs are stored after successful validation and deleted again after successful processing. The storage is thus only temporary and will not replace the eventual adoption of the CycloneDX Transparency Exchange API.
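The compress-before-store, decompress-on-read round trip can be sketched as follows. Because zstd is not part of the JDK, this sketch swaps in `java.util.zip.Deflater`/`Inflater` as a stand-in: the level `3` mirrors the PR's hardcoded zstd level, but Deflater levels range only from 0 to 9, and compression ratios will differ from zstd. The class name and the synthetic payload are illustrative assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class BomCompressionSketch {
    // Mirrors the PR's hardcoded level 3; note that Deflater (unlike zstd) only supports 0-9.
    static final int COMPRESSION_LEVEL = 3;

    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    static byte[] decompress(byte[] input) throws DataFormatException {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("Truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        // Synthetic, highly repetitive BOM-like payload (not real data).
        StringBuilder sb = new StringBuilder("{\"bomFormat\":\"CycloneDX\",\"components\":[");
        for (int i = 0; i < 500; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"type\":\"library\",\"name\":\"component-").append(i)
              .append("\",\"version\":\"1.0.0\"}");
        }
        sb.append("]}");

        byte[] bom = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = compress(bom, COMPRESSION_LEVEL);
        byte[] restored = decompress(compressed);
        System.out.printf("original=%d bytes, compressed=%d bytes, roundTripOk=%b%n",
                bom.length, compressed.length, Arrays.equals(bom, restored));
    }
}
```

With zstd itself, the same shape applies: compress with the configured level before writing to storage, and decompress after reading it back.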
By default, the number of BOMs each instance processes in parallel is governed by `alpine.kafka.processor.bom.upload.max.concurrency=-1`, where `-1` means matching the number of topic partitions. Because concurrency is key-based, it can be increased beyond the number of partitions; the maximum parallelism is bounded by the number of unique projects that BOMs are uploaded to.

Addressed Issue
Closes DependencyTrack/hyades#633
Additional Details
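The key-based concurrency rule from the Description (`-1` resolves to the partition count; parallelism can exceed the partition count but is capped by the number of distinct projects) can be illustrated with the following simplified sketch. It models processing as a batch, with one sequential task per key on a fixed thread pool, rather than the streaming consumer the PR actually uses; all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

class KeyOrderedConcurrencySketch {
    // -1 means "match the number of topic partitions", per the PR description.
    static int resolveMaxConcurrency(int configured, int partitionCount) {
        return configured == -1 ? partitionCount : configured;
    }

    // Records sharing a key are processed one after another, so the effective
    // parallelism is capped by the number of distinct keys (i.e. projects).
    static int effectiveParallelism(int maxConcurrency, List<String> keys) {
        return Math.min(maxConcurrency, new HashSet<>(keys).size());
    }

    // Simplified batch model of key-ordered processing: one sequential task per key,
    // with up to maxConcurrency keys processed at the same time.
    static Map<String, List<String>> process(List<Map.Entry<String, String>> records, int maxConcurrency)
            throws InterruptedException {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : records) {
            byKey.computeIfAbsent(record.getKey(), k -> new ArrayList<>()).add(record.getValue());
        }

        Map<String, List<String>> processed = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        for (Map.Entry<String, List<String>> entry : byKey.entrySet()) {
            pool.submit(() -> {
                List<String> done = new ArrayList<>();
                for (String bom : entry.getValue()) {
                    done.add(bom); // placeholder for actually processing the BOM
                }
                processed.put(entry.getKey(), done);
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("Processing did not finish in time");
        }
        return processed;
    }

    public static void main(String[] args) throws InterruptedException {
        int partitions = 2;
        List<Map.Entry<String, String>> records = List.of(
                Map.entry("project-a", "bom-1"),
                Map.entry("project-b", "bom-2"),
                Map.entry("project-a", "bom-3"),
                Map.entry("project-c", "bom-4"));
        List<String> keys = records.stream().map(Map.Entry::getKey).collect(Collectors.toList());

        int defaultConcurrency = resolveMaxConcurrency(-1, partitions);
        int raisedConcurrency = resolveMaxConcurrency(8, partitions);
        System.out.println("default: " + effectiveParallelism(defaultConcurrency, keys));
        System.out.println("raised to 8: " + effectiveParallelism(raisedConcurrency, keys));
        System.out.println("processed: " + process(records, raisedConcurrency));
    }
}
```

With two partitions and three distinct projects, the default yields a parallelism of 2 (bounded by partitions), while raising the setting to 8 yields 3 (bounded by projects). BOMs for `project-a` are still handled in upload order.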
Checklist
This PR fixes a defect, and I have provided tests to verify that the fix is effective
This PR introduces new or alters existing behavior, and I have updated the documentation accordingly