
Externalize BOM ingestion pipeline #794

Draft · nscuro wants to merge 5 commits into main from issue-633

Conversation

@nscuro (Member) commented Jul 22, 2024

Description

This PR moves the processing of uploaded BOMs from the internal, in-memory eventing system to Kafka.

This allows BOM processing to be distributed across multiple instances of the API server. Before this PR, the instance that received an upload request was also the one processing the BOM, which could lead to very uneven load distribution.

To allow processing to be shared, uploaded BOMs need to be stored in a location that all instances can access. At the moment, BOMs are written to the /tmp directory, which clearly doesn't work for the goal at hand.

This PR provides three new storage extensions for this purpose:

  1. Local filesystem (could also be NAS)
  2. Database (new BOM_UPLOAD table)
  3. S3

The default is Database, since it requires no additional setup. Database storage is, however, not expected to be a good fit for large deployments with very frequent uploads.
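
For illustration, the sketch below shows what such a pluggable storage contract could look like. The interface name and method signatures are assumptions made for this example, not the actual extension SPI:

```java
import java.io.IOException;

/**
 * Hypothetical contract for pluggable BOM storage.
 * Names and signatures are illustrative only; the real
 * extension points are introduced in PR #805.
 */
public interface BomUploadStorage {

    /** Stores the compressed BOM, keyed by its correlation token. */
    void storeBom(String correlationToken, byte[] compressedBom) throws IOException;

    /** Retrieves the compressed BOM for the given correlation token. */
    byte[] getBom(String correlationToken) throws IOException;

    /** Deletes the BOM after processing completed successfully. */
    void deleteBom(String correlationToken) throws IOException;
}
```

Each of the three extensions listed above (local filesystem, database, S3) would then be one implementation of this contract.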

Note

With the work done in PR #805, the idea is to empower users to plug in their own, potentially proprietary, storage solutions.

To reduce the volume of data transmitted to and from storage, as well as the storage size requirements, BOMs are compressed using zstd. Currently the compression level is hardcoded to 3 (22 being the maximum), but the plan is to make this configurable.
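
As a rough illustration of the compression step, here is a minimal sketch assuming the zstd-jni binding (com.github.luben:zstd-jni); whether the actual implementation uses this particular library is an assumption of this example:

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;

public class BomCompressionSketch {

    // Compression level is currently hardcoded to 3; 22 is the maximum.
    private static final int COMPRESSION_LEVEL = 3;

    public static void main(String[] args) {
        byte[] bom = "{\"bomFormat\": \"CycloneDX\"}".getBytes(StandardCharsets.UTF_8);

        // Compress before uploading to storage.
        byte[] compressed = Zstd.compress(bom, COMPRESSION_LEVEL);

        // Decompress after fetching from storage. zstd frames record the
        // original content size, so it can be read back from the frame header.
        byte[] decompressed = Zstd.decompress(
                compressed, (int) Zstd.getFrameContentSize(compressed));

        System.out.printf("original=%d bytes, compressed=%d bytes%n",
                decompressed.length, compressed.length);
    }
}
```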

BOMs are stored after successful validation, and deleted again after successful processing. The storage being used is thus only temporary and will not replace the eventual adoption of the CycloneDX Transparency Exchange API.

By default, each instance processes at most `alpine.kafka.processor.bom.upload.max.concurrency` BOMs in parallel (the default value of -1 means matching the number of topic partitions). Because the concurrency is key-based, it can be increased beyond the number of partitions. The maximum parallelism is bounded by the number of unique projects that BOMs are uploaded to, as illustrated in the config sketch below.
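
A hypothetical configuration example; the second value is purely illustrative:

```properties
# Default: -1, i.e. concurrency matches the number of topic partitions.
alpine.kafka.processor.bom.upload.max.concurrency=-1

# Because concurrency is key-based, it may exceed the partition count,
# e.g. a topic with 6 partitions could still be processed 12 ways:
# alpine.kafka.processor.bom.upload.max.concurrency=12
```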

Addressed Issue

Closes DependencyTrack/hyades#633

Additional Details

```mermaid
sequenceDiagram
    Client->>+API Server: Upload BOM
    API Server->>API Server: Validate BOM
    API Server->>API Server: Generate correlation token (UUID)
    API Server->>API Server: Compress BOM (zstd)
    API Server->>Storage: Upload compressed BOM
    Note over API Server, Storage: Keyed by correlation token
    API Server->>Kafka: Publish event to dtrack.event.bom-uploaded topic
    Note over API Server, Kafka: Key=Project UUID<br/>Value=org.dependencytrack.event.v1alpha1.BomUploadedEvent proto
    API Server->>Client: Return correlation token
    loop continuously
        API Server->>Kafka: Consume from dtrack.event.bom-uploaded topic
        loop for each event
            API Server->>Storage: Get compressed BOM by correlation token
            API Server->>API Server: Decompress BOM
            API Server->>API Server: Process BOM
            alt processing failed
                API Server->>API Server: Update status of upload in DB to "failed"
                API Server->>Kafka: Publish event to "BOM Processing failed" topic
            else processing succeeded
                API Server->>Storage: Delete BOM by correlation token
                API Server->>API Server: Update status of upload in DB to "successful"
                API Server->>Kafka: Publish event to "BOM Processed" topic
                API Server->>API Server: Trigger vuln analysis etc.
            end
        end
    end
```
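To make the publish step of the diagram concrete, here is a minimal sketch using the plain Kafka Java client. The topic name and key are taken from the diagram above; the producer setup and the empty placeholder bytes standing in for a serialized BomUploadedEvent proto are assumptions of this sketch, not the API server's actual code:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.UUID;

public class BomUploadedEventPublisherSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        // Key = project UUID, so all uploads for the same project land on the
        // same partition and are processed in order.
        String projectUuid = UUID.randomUUID().toString();

        // Placeholder; the real value would be the serialized
        // org.dependencytrack.event.v1alpha1.BomUploadedEvent proto.
        byte[] eventBytes = new byte[0];

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("dtrack.event.bom-uploaded", projectUuid, eventBytes));
        }
    }
}
```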

Checklist

  • I have read and understand the contributing guidelines
  • This PR fixes a defect, and I have provided tests to verify that the fix is effective
  • This PR implements an enhancement, and I have provided tests to verify that it works as intended
  • This PR introduces changes to the database model, and I have updated the migration changelog accordingly
  • This PR introduces new or alters existing behavior, and I have updated the documentation accordingly

@nscuro nscuro added the enhancement (New feature or request) label Jul 22, 2024
@nscuro nscuro force-pushed the issue-633 branch 6 times, most recently from 1bfd177 to 81206ad on July 24, 2024 at 20:28
codacy-production bot commented Jul 24, 2024

Coverage summary from Codacy

See diff coverage on Codacy

| Coverage variation | Diff coverage |
| --- | --- |
| +0.00% (target: -1.00%) | 84.93% (target: 70.00%) |

Coverage variation details:

| | Coverable lines | Covered lines | Coverage |
| --- | --- | --- | --- |
| Common ancestor commit (8b43f77) | 21700 | 17882 | 82.41% |
| Head commit (caef8f5) | 21895 (+195) | 18042 (+160) | 82.40% (+0.00%) |

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details:

| | Coverable lines | Covered lines | Diff coverage |
| --- | --- | --- | --- |
| Pull request (#794) | 272 | 231 | 84.93% |

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

Codacy stopped sending the deprecated coverage status on June 5th, 2024.

@nscuro nscuro added this to the 5.6.0 milestone Jul 24, 2024
@nscuro nscuro force-pushed the issue-633 branch 10 times, most recently from 2577957 to 111bb27 on July 27, 2024 at 19:13
@nscuro nscuro force-pushed the issue-633 branch 9 times, most recently from 14b0484 to c65b55d on July 29, 2024 at 22:04
@nscuro nscuro force-pushed the issue-633 branch 3 times, most recently from 0b7c377 to 5babe60 on August 5, 2024 at 11:01
@hoggmania commented

Looks good to go to me.
