docs: Clarify H2 database caching strategy (#6220)
chadlwilson authored Dec 6, 2023
1 parent ccd753d commit f5bd492
Showing 5 changed files with 42 additions and 24 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -203,7 +203,7 @@ See the full listing of [changes](https://github.com/jeremylong/DependencyCheck/
- The `gradle` and `maven` plugins now have the capability to scan the build plugins ([#4035](https://github.com/jeremylong/DependencyCheck/issues/4035)).
- The `gradle` and `maven` plugins, for transitive dependencies, will report the root dependency in the project that included the transitive dependency ([#5001](https://github.com/jeremylong/DependencyCheck/pull/5001)).
- Added `properties.security-severity` to SARIF report for better integration with GitHub Security Code scanning ([#5277](https://github.com/jeremylong/DependencyCheck/pull/5227)).
- Allow for HTTP auth settings for Retire JS respository ([#5209](https://github.com/jeremylong/DependencyCheck/pull/5209)).
- Allow for HTTP auth settings for Retire JS repository ([#5209](https://github.com/jeremylong/DependencyCheck/pull/5209)).
- New schema for the XML report was added to support some of the above additions ([#5296](https://github.com/jeremylong/DependencyCheck/pull/5296)).
- Added missing gradle option to only warn on remote errors from the OSS Index Analyzer ([gradle #303](https://github.com/dependency-check/dependency-check-gradle/pull/303)).

6 changes: 3 additions & 3 deletions cli/src/main/java/org/owasp/dependencycheck/CliParser.java
@@ -413,11 +413,11 @@ private void addAdvancedOptions(final Options options) {
.addOption(newOption(ARGUMENT.RETIRE_JS_FORCEUPDATE, "Force the RetireJS Analyzer to update "
+ "even if autoupdate is disabled"))
.addOption(newOptionWithArg(ARGUMENT.RETIREJS_URL, "url",
"The Retire JS Respository URL"))
"The Retire JS Repository URL"))
.addOption(newOptionWithArg(ARGUMENT.RETIREJS_URL_USER, "username",
"The password to authenticate to Retire JS Respository URL"))
"The password to authenticate to Retire JS Repository URL"))
.addOption(newOptionWithArg(ARGUMENT.RETIREJS_URL_PASSWORD, "password",
"The password to authenticate to Retire JS Respository URL"))
"The password to authenticate to Retire JS Repository URL"))
.addOption(newOption(ARGUMENT.RETIREJS_FILTER_NON_VULNERABLE, "Specifies that the Retire JS "
+ "Analyzer should filter out non-vulnerable JS files from the report."))
.addOption(newOptionWithArg(ARGUMENT.ARTIFACTORY_PARALLEL_ANALYSIS, "true/false",
@@ -216,7 +216,7 @@ protected void prepareFileTypeAnalyzer(Engine engine) throws InitializationExcep
try {
ds.update(engine);
} catch (UpdateException ex) {
throw new InitializationException("Unable to initialize the Retire JS respository", ex);
throw new InitializationException("Unable to initialize the Retire JS repository", ex);
}
}

54 changes: 36 additions & 18 deletions src/site/markdown/data/cacheh2.md
@@ -2,31 +2,49 @@ Caching ODC's H2 Database
=========================================

Many users of dependency-check ensure that ODC runs as fast as possible by caching
the `data` directory (or in some cases just the H2 database). Where the `data`
directory exists is different for each integration (cli, maven, gradle, etc.).
However, each integration allows users to configure the location of the data directory.
the entire `data` directory, including the H2 database (`odc.mv.db`). The location of the `data`
directory is different for each integration (cli, maven, gradle, etc.); however, each
allows users to configure this location.

Within the data directory there is a cache directory that contains temporary caches
Within the `data` directory there is a `cache` directory that contains temporary caches
of data requested that is not stored in the database and is generally build specific
- but can be re-used. There are two primary stratigies used:
- but can be re-used.

1. Cache the H2 database
There are two primary strategies used:

1. Single node database updater with multiple node "readers"

Use a single node to build the database using the integration in update only mode
(e.g., `--updateOnly` for the cli) and specify the data directory location (see
the configuration documentation for each integrgations configuration). The data
directory is then archived. Subsequent nodes that perform scanning will then
download the archived database and configure the scan to occur and in general,
the node would be configured with `--noupdate` (or the releated configuration to
disable the updates in each configuration). The database is generally updated daily
in this use case - but could be designed with a more frequent update.
the configuration documentation for each integration's configuration).

The `data` directory is then archived somewhere accessible to all nodes.
Subsequent nodes that perform scanning will download the archived database before
scanning. These "reader" nodes would be configured with `--noupdate` (or the related
configuration to disable the updates in each integration) so they are not reliant
on outgoing calls.

The cached `data` directory (and H2 database) is generally updated by the single
node/process daily in this use case - but could be designed with a more frequent update.

2. Cache the H2 database and the cache
2. Multiple node database updaters collaborating on a common cache location

Some users have a slightly modified version of the above caching strategy. Instead
of only having a single update node - they allow all nodes to update. However,
the data directory is zipped and stored in an common location. Each node will execute
a scan (with updates enabled) and if succesful the updated data directory is zipped
and uploaded to the common location. This has the small advantage of being updated
faster and will store the cache between executions which can improve the performance
on some builds.
the entire `data` directory is zipped and stored in a common location, including the H2
database, `cache`, and in some cases cached data from multiple upstream sources.

Each node will execute a scan (with updates enabled) and if successful the updated
`data` directory is zipped and uploaded to the common location for use by other nodes.
This has the small advantage of being updated faster and will store the cache between
executions which can improve the performance on some builds, with the disadvantage of
needing to allow all nodes to update the common cache, and thus requiring some degree of
consistency in how they configure ODC.

Additional Notes
----------------

The `data` directory may also contain cached data from other upstream sources, dependent
on which analyzers are enabled. Ensuring that file modification times are retained during
archiving and un-archiving will make these safe to cache, which is especially important in
a multi-node update strategy.
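
The note above about retaining file modification times can be illustrated with a minimal sketch. It is not part of this commit or of dependency-check itself: the class `DataDirArchive` and its `zip`/`unzip` methods are hypothetical names, and the sketch assumes the `data` directory is archived as a plain ZIP file using only the JDK. It records each file's modification time in the archive and restores it on extraction.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper (not part of dependency-check): archives a data directory, keeping mtimes. */
final class DataDirArchive {

    /** Zips every regular file under dataDir and records each file's modification time. */
    static void zip(Path dataDir, Path archive) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dataDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(archive);
             ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Path file : files) {
                // ZIP entry names always use forward slashes.
                String name = dataDir.relativize(file).toString().replace('\\', '/');
                ZipEntry entry = new ZipEntry(name);
                // ZIP timestamps have limited resolution, so sub-second precision may be lost.
                entry.setLastModifiedTime(Files.getLastModifiedTime(file));
                zos.putNextEntry(entry);
                Files.copy(file, zos);
                zos.closeEntry();
            }
        }
    }

    /** Extracts the archive into targetDir and restores each file's recorded modification time. */
    static void unzip(Path archive, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (InputStream in = Files.newInputStream(archive);
             ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Refuse entries that would be written outside the target directory.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                    continue;
                }
                Files.createDirectories(dest.getParent());
                Files.copy(zis, dest, StandardCopyOption.REPLACE_EXISTING);
                FileTime mtime = entry.getLastModifiedTime();
                if (mtime != null) {
                    Files.setLastModifiedTime(dest, mtime);
                }
            }
        }
    }
}
```

Under these assumptions, an updater node could call `DataDirArchive.zip(dataDir, archive)` after a successful update, and each scanning node could call `DataDirArchive.unzip(archive, dataDir)` before its scan. Many existing archive tools can also preserve modification times, so a custom helper like this may not be needed.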
2 changes: 1 addition & 1 deletion src/site/markdown/data/mirrornvd.md
@@ -14,7 +14,7 @@ Then configure dependency-check to use the NVD Datafeed URL.

Mirroring Retire JS Repository
------------------------------------------------------------
The Retire JS Respository is located at:
The Retire JS Repository is located at:

```
https://raw.githubusercontent.com/Retirejs/retire.js/master/repository/jsrepository.json