Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: enable http timeout #1777

Merged
merged 4 commits into from
Apr 4, 2024
Merged

fix: enable http timeout #1777

merged 4 commits into from
Apr 4, 2024

Conversation

willmurphyscode
Copy link
Contributor

Otherwise grype db commands can hang if the CDN is having issues.

Related to #1731

Some notes

We don't want commands like grype db update to hang indefinitely, and we certain don't want grype <some-source> to hang indefinitely.

Grype is already intended to continue on error if db updates are not available: https://github.com/anchore/grype/blob/fix%2Foverall-http-timeout/grype/db/curator.go#L142, so after this change, a timeout error when checking for update or downloading the database will result in a logged warning.

30 seconds overall timeout seems be reasonable. The compressed grype db as of this writing is ~152MB .tar.gz file when downloaded. I've done a few time curl -o /dev/null "${DB_URL}" for grype dbs published on different days, and the highest I've seen is about 8 seconds.

Questions

  1. Do we want to expose the timeout as a config variable?
  2. Do we want to use separate timeouts for downloading listing.json (small JSON doc) vs database (much larger tar.gz), maybe making a shorter timeout for the listing.json and a longer timeout for the database?

Manual testing done:

With this change:

( run a server on localhost:5000 that accepts connections but never replies)
$ GRYPE_DB_UPDATE_URL=http://localhost:5000/listing.json go run cmd/grype/main.go alpine:latest
[0000]  INFO grype version: [not provided]
[0060]  WARN unable to check for vulnerability database update << prints warning about failed update, but continues
[0060]  INFO found 12 vulnerability matches across 15 packages
NAME           INSTALLED   FIXED-IN  TYPE  VULNERABILITY   SEVERITY
busybox        1.36.1-r15            apk   CVE-2023-42366  Medium
busybox        1.36.1-r15            apk   CVE-2023-42365  Medium
busybox        1.36.1-r15            apk   CVE-2023-42364  Medium
... snip ...

Otherwise grype db commands can hang if the CDN is having issues.

Signed-off-by: Will Murphy <[email protected]>
kzantow
kzantow previously approved these changes Apr 1, 2024
@kzantow kzantow dismissed their stale review April 1, 2024 21:26

From an offline discussion, we probably want have this configurable

@willmurphyscode
Copy link
Contributor Author

I think there's some legitimate concern that 30 seconds is not always long enough to download the 150 mb grype-db, and the timeouts should definitely be configurable. I'll push an update to:

  1. Put the default timeout for fetching the listing file to be much lower than for fetching the database, since it's a much smaller file, and it gets fetched every time (to see whether there's a newer database available)
  2. Expose both as config and environment variables, so that users can make their own decision on how long it's reasonable to wait for the download.

The particular complaint in #1731 is that the download starts promptly but is very slow. This symptom means that we can't use a dial timeout or a response headers timeout, since the delay is in actually sending the bytes of the response body. However, it might be worth also trying to tune these finer grained timeouts a bit, with the goal that if the network or the database or listing file downloads are just unreachable, grype would block for as little time as possible.

return
case err := <-errs:
t.Error(err)
case <-timeout:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're doing a test that combines the calls of IsUpdateAvailable and UpdateTo is there a chance this can race incorrectly?

I guess the gap is so large from the configured timeouts and failAfter (more than 10x) that it doesn't matter.

LGTM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@spiffcs I wanted to make sure I understand what you mean by "race incorrectly." I think you're saying that since (1) the test fails on a timeout and (2) the assertion expects a different timeout, we could get the wrong timeout. Is that correct?

I think that since we're effectively racing a 0.8 second timer against a 10 second timer, the test should be pretty stable. I put in the 10 second timer because otherwise removing one of the timeouts causes the test suite to hang for 10 minutes while the overall testing timer runs out, which seems like worse behavior.

Please let me know if I've addressed these concerns adequately and then I'll merge.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! That's exactly what I considered (0.8 vs 10). Test looks good and I don't think it will give us trouble in the future. LGTM

@willmurphyscode willmurphyscode merged commit 7c849c3 into main Apr 4, 2024
10 checks passed
@willmurphyscode willmurphyscode deleted the fix/overall-http-timeout branch April 4, 2024 14:54
@spiffcs spiffcs added the enhancement New feature or request label Apr 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants