-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gcp/saver: Only return errors.KindAlreadyExists if all three exist #1957
Conversation
@dwbuiten thanks for your contribution to Athens! I'm doing a release today, but I'll get to reviewing this and we'll see if we can get it into one of the fix releases. Just off the bat, one thing that'd help show what this is fixing is if you're able to write tests to it in E2E or unit tests. |
@matt0x6F Sure - adding a unit test to |
b637dc4
to
2da5d26
Compare
In gomods#1124, a GCP lock type was added as a singleflight backend. As part of this work, the GCP backend's Save() was made serial, likely because moduploader.Upload requires a call to Exists() before it, rendering the GCP lock less useful, by doubling the calls to GCS. However, by doing this, the existence check was now only checking the existence of the mod file, and not the info or zip. This meant that if during a Save, the zip or info uploads failed, on subsequent rquests, that when using the GCP singleflight backend, Athens would assume everything had been stashed and saved properly, and then fail to serve up the info or zip that had failed upload, meaning the cache was in an unhealable broklen state, requiring a manual intervention. To fix this, without breaking the singleflight behavior, introduce a metadata key that is set on the mod file during its initial upload, indicating that a Stash is still in progress on subsequent files, which gets removed once all three files are uploaded successfully, which can be checked if it it is determined that the mod file already exists. That way we can return a errors.KindAlreadyExists if a Stash is in progress, but also properly return it when a Stash is *not* currently in progress if and only if all three files exist on GCS, which prevents the cache from becoming permanently poisoned. One note is that it is possible the GCS call to remove the metadata key fails, which would mean it is left on the mod object forever. To avoid this, consider it stale after 2 minutes. Signed-off-by: Derek Buitenhuis <[email protected]>
This is the proper way to make sure we do not write a partial file. Signed-off-by: Derek Buitenhuis <[email protected]>
Signed-off-by: Derek Buitenhuis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I have an outstanding question regarding user configurability, but overall I think the PR looks great. I'm also glad we wrote some tests out. It makes me wonder if the GCP Storage ever reliably worked without these changes.
2b75213
to
839eda3
Compare
This is useful if, for example, the user is on a slow network. Signed-off-by: Derek Buitenhuis <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for adding the configuration. We'll get this out in our next feature release.
[![Mend Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com) This PR contains the following updates: | Package | Update | Change | |---|---|---| | [gomods/athens](https://togithub.com/gomods/athens) | patch | `v0.14.0` -> `v0.14.1` | --- ### Release Notes <details> <summary>gomods/athens (gomods/athens)</summary> ### [`v0.14.1`](https://togithub.com/gomods/athens/releases/tag/v0.14.1) [Compare Source](https://togithub.com/gomods/athens/compare/v0.14.0...v0.14.1) #### What's Changed - CI dependency updates in [https://github.com/gomods/athens/pull/1962](https://togithub.com/gomods/athens/pull/1962) - Fix for AWS default credentials by [@​yongzhang](https://togithub.com/yongzhang) in [https://github.com/gomods/athens/pull/1963](https://togithub.com/gomods/athens/pull/1963) - Fix for E2E test runs by [@​joeymhills](https://togithub.com/joeymhills) in [https://github.com/gomods/athens/pull/1966](https://togithub.com/gomods/athens/pull/1966) - Feature gcp/saver: Only return errors.KindAlreadyExists if all three exist by [@​dwbuiten](https://togithub.com/dwbuiten) in [https://github.com/gomods/athens/pull/1957](https://togithub.com/gomods/athens/pull/1957) - Fix to set correct content type and send only once by [@​matt0x6F](https://togithub.com/matt0x6F) in [https://github.com/gomods/athens/pull/1965](https://togithub.com/gomods/athens/pull/1965) #### New Contributors - [@​yongzhang](https://togithub.com/yongzhang) made their first contribution in [https://github.com/gomods/athens/pull/1963](https://togithub.com/gomods/athens/pull/1963) - [@​joeymhills](https://togithub.com/joeymhills) made their first contribution in [https://github.com/gomods/athens/pull/1966](https://togithub.com/gomods/athens/pull/1966) - [@​dwbuiten](https://togithub.com/dwbuiten) made their first contribution in [https://github.com/gomods/athens/pull/1957](https://togithub.com/gomods/athens/pull/1957) **Full Changelog**: gomods/athens@v0.14.0...v0.14.1 </details> --- ### Configuration 📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined). 🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied. ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox. 🔕 **Ignore**: Close this PR and you won't be reminded about this update again. --- - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box --- This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/gomods/athens-charts). <!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zNzcuOCIsInVwZGF0ZWRJblZlciI6IjM3LjM3Ny44IiwidGFyZ2V0QnJhbmNoIjoibWFpbiIsImxhYmVscyI6W119--> Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
What is the problem I am trying to address?
In #1124, a GCP lock type was added as a singleflight backend. As part of this work, the GCP storage backend's
Save()
was made serial, likely becausemoduploader.Upload
requires a call toExists()
before it, rendering the GCP lock less useful, by doubling the calls to GCS.However, by doing this, the existence check was now only checking the existence of the mod file, and not the info or zip. This meant that if during a
Save()
, the zip or info uploads failed, on subsequent rquests, that when using the GCP singleflight backend, Athens woul assume everything had been stashed and saved properly, and then fail to serve up the info or zip that had failed upload, meaning the cache was in an unhealable broklen state, requiring a manual intervention.How is the fix applied?
To fix this, without breaking the singleflight behavior, introduce a metadata key that is set on the mod file during its initial upload, indicating that a
Stash
is still in progress on subsequent files, which gets removed once all three files are uploaded successfully, which can be checked if it it is determined that the mod file already exists. That way we can return aerrors.KindAlreadyExists
if aStash
is in progress, but also properly return it when aStash
is not currently in progress if and only if all three files exist on GCS, which prevents the cache from becoming permanently poisoned.One note is that it is possible the GCS call to remove the metadata key fails, which would mean it is left on the mod object forever. To avoid this, consider it stale after 2 minutes.
What GitHub issue(s) does this PR fix or close?
N/A
Extra Notes
This is my first time touching the Athens codebase, so apologies for any faux pas. We hit this at
$dayjob
, so thought I would take a stab at fixing it.It is a more complex than I would like, so comments very welcome.
This also contains a small fix I found when writing the test: if
io.Copy
failed during an upload to GCS, it would callClose()
on the writer, which would leave a partial/broken file. Thestorage
docs say to cancel the context to prevent this.