Availability vs. correctness in list API #1532
Comments
If storage is down, pretty much nothing will work. Athens always serves from the storage, never directly from
If you're okay with all of the above, I think the only thing we should focus on is issue number For this, Athens chose "consistency" over "availability". With availability > consistency:
With consistency > availability:
Solutions:
For 2, is there a good case for this? Is your VCS down so frequently that you want to just ignore it? For 3, Athens already has a lot of configuration options, and that makes Athens a bit difficult and scary to approach because users see a lot of knobs and are not sure what to do with them. We do our best to document the config file, but it's pretty big and, I would imagine, scary to a lot of people. @linzhp I'm curious to hear your thoughts on 2 and 3. I don't feel strongly either way 👍
It sounds like this could be solved by offering a configuration option that lets the operator choose which behavior they prefer. It will increase code complexity a bit, but it lets operators specify the behavior they want. Thoughts?
@twexler reiterating my point 3 above:
I'm hesitant to add another configuration option since our config file is already gigantic. But if people think it's definitely nice to have, then I'm okay with that. Also, the original issue assumes that if storage is down, Athens would continue to work, which is not true. Therefore, I'd like to get a second opinion on whether the current behavior still makes sense or whether the configuration is necessary. Thanks ✌️
Oops, sorry @marwan-at-work. I was speed reading while having my first coffee of the day and missed that. I can see both sides of the argument from my previous experience (having dealt with VCS outages partially breaking my builds and having internal caches hide breakage from me). I think there may be a reasonable middle ground. I see a few approaches to realize that middle ground:
I realize my original description of "storage down" is very confusing, and I didn't provide our context. My apologies. We experienced a partially down storage earlier this week. Most of the storage APIs worked except storage.List, so even when Athens got a pretty good list of versions from the VCS, users still got 500 errors. Let me rephrase the two scenarios of the list API:
Scenario 1 can be a bit confusing for users who run
Scenario 2 means users cannot get the latest version using
In either scenario, whether users see a 500 or a 200, they are not able to use
At bigger companies like Uber, we have centralized developer experience teams to discover and fix issues in dev infra. We can easily monitor and alert on these types of events from the Athens logs. These types of failures are often outside the scope of a developer experience team, so while we work with other teams to fix them, the availability > consistency approach keeps the business running as much as possible in the meantime.
Dev infra is too intimidating for most users to dig into. Instead, they paged us in the middle of the night, and we had to patch Athens so that it ignores storage.List errors...
Currently Athens is a little hard to use in offline environments (such as the one described in the 'pre-filling disk storage' docs scenario) because there's no way to find out which module versions Athens has available, aside from looking at the actual files in storage. A list API that works without connecting to the upstream VCS would be a great help in that situation. Furthermore, in this case a storage list API isn't a question of availability vs. correctness, because storage is the only truth Athens will ever see. I do agree that this should be a new configuration option so that correctness can still be ensured in online environments.
@praseodym would the
I think this is a really important issue along with the better offline support in #1506 and #1532 (comment) above. I also think these two issues are related. I don't think an Athens running with default configuration should work if there's a storage outage, even if some of the APIs still work. That behavior would violate the primary and original goal of deterministic builds. I'm always open to being convinced otherwise though 😄. At the moment, we've applied this determinism goal to projects with a complete @marwan-at-work I think we can accomplish the additional configuration with a single new variable in
@linzhp @marwan-at-work @praseodym I'm trying to solve all the problems at once here, which comes with the danger of solving none of them. What do you think?
I like the idea of
Glad to hear it, @linzhp. I'll try to solicit some more opinions, and hopefully we can come up with a good solution and build it.
I tweeted a request for people to come comment here.
Also note that the Go team settled on a similar behavior for their public proxy as what
I'm late to this topic and my use case may not help. But I'm currently using Athens in an offline network and it is painful (of course, it's better than the alternative). I ended up setting up an FTP server just so people could look up which versions were available and manually update their go.mod files. Originally I thought by setting
To be fair, I don't know the details behind the magic that makes go mod tidy work. But from a
But to be clear, I don't really care what the implementation is as long as
Thanks @nathanhack! We have two major features to do: external storage and offline mode. We just finished the former, so we're going to start tackling this as soon as we can.
Just checking in on the
Is your feature request related to a problem? Please describe.
When Athens serves the list API, it issues requests to both storage and the VCS and merges the two lists. It fails if there is a storage error. I understand this is to guarantee that tags deleted from the VCS but cached locally can still be returned.
However, this approach compromises the availability of the Athens server: if either the VCS or storage is down, the Athens list API will be down too.
Describe the solution you'd like
Specifically for the list API, I think availability is more important than correctness. Given that the Go toolchain relies on the list API to get the latest version of a module, we can think about two scenarios:
go get -u or go get <something>@latest to get that version, but they can still use go get <something>@<cached tag>. If the deleted tag is not the latest, there is no impact.

Describe alternatives you've considered
With the current Athens implementation, Go commands will break with 500 errors in the above scenarios. I agree that neither situation is ideal, but when that happens, instead of leaving people unable to run go mod or go get at all, I would prefer limiting only their ability to get the latest version.
@marwan-at-work @arschles thoughts?
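To illustrate the trade-off, here is a simplified Go sketch of picking the latest version from a list. It only handles plain vMAJOR.MINOR.PATCH release tags; the go command applies full semantic-versioning rules (pre-releases, pseudo-versions), so parseRelease, less, and latest are hypothetical helpers, not the toolchain's logic. If a tag cached in storage but deleted from the VCS is the newest, serving the VCS-only list moves the latest version back to an older release, while the merged list still reports it.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease accepts only plain "vMAJOR.MINOR.PATCH" tags. Pre-releases,
// build metadata, and pseudo-versions are rejected to keep the sketch short.
func parseRelease(v string) ([3]int, bool) {
	var out [3]int
	if !strings.HasPrefix(v, "v") {
		return out, false
	}
	parts := strings.Split(v[1:], ".")
	if len(parts) != 3 {
		return out, false
	}
	for i, p := range parts {
		// Digits only, so signs like "+1" are rejected before Atoi sees them.
		if p == "" || strings.Trim(p, "0123456789") != "" {
			return out, false
		}
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, false
		}
		out[i] = n
	}
	return out, true
}

// less reports whether release a sorts before release b.
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// latest returns the highest plain release in list, or "" if there is none.
func latest(list []string) string {
	best, bestParts := "", [3]int{}
	for _, v := range list {
		p, ok := parseRelease(v)
		if ok && (best == "" || less(bestParts, p)) {
			best, bestParts = v, p
		}
	}
	return best
}

func main() {
	vcs := []string{"v1.0.0", "v1.1.0"}
	cached := []string{"v1.0.0", "v1.1.0", "v1.2.0"} // v1.2.0 was deleted from the VCS
	merged := append(append([]string{}, cached...), vcs...)

	fmt.Println(latest(merged)) // v1.2.0: storage and VCS merged
	fmt.Println(latest(vcs))    // v1.1.0: storage.List ignored
}
```

In the second case, requesting v1.2.0 explicitly by tag would still be possible, which is the limited degradation the issue argues is preferable to a 500.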