Use cache before running "go list" #1511
Comments
@linzhp that sounds good to me. A couple of implementations I have in mind:
Option 1 is the most straightforward but requires updating our storage interface and all of its implementations. Option 2 sounds the easiest and also logical, since list data is discardable. However, memory management becomes an issue. Option 3 is probably my least favorite, although I'd be curious to know if anyone thinks it has some benefits that are not obvious.
One downside of this is that it won't work in a distributed environment. Because the memory cache doesn't consult peers, there's a chance that users see different results for the same request.
That's a good point. Also, Option 2 means each node needs to request a copy of the list from the VCS, which involves more VCS traffic than Option 1. However, updating the storage interface and all its implementations sounds scary...
One thing to realize is the following: the "go list" command only runs for two endpoints, the version-list and latest-version lookups. In other words, if you have a resolved go.mod file and you run a regular build, those endpoints are not hit; they are only ever called when you are onboarding a new module to your system. In a CI/CD environment with a fully resolved go.mod, they should not be called at all. Therefore, the remaining use case, as far as I can see, is when you're adding a new module during development, and in that case you might want the most up-to-date list and not an outdated-but-cached list. That has been the main reason this call isn't cached today.

Furthermore, the TTL on go list has to be considerably short (nothing more than 5 minutes), because it can be a really annoying experience to introduce a new version of your module and not have it show up for more than 5 minutes. I do recognize that it's a slow operation and that caching it can lead to a nicer experience, but I'd love to hear a compelling reason for why speeding up this endpoint is a significant win.
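For reference, the two endpoints in question correspond to these routes of the Go module proxy protocol (the version list is what "go list -m -versions" ultimately backs):

```
GET $GOPROXY/<module>/@v/list   # plain-text list of known versions
GET $GOPROXY/<module>/@latest   # info about the latest known version
```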
A bit of context: we generate Go code on the fly at build time. When we generate Go packages from the proto files of Apache Mesos, the import path will become With Go 1.13,
A new commit will take quite a while, maybe an hour, to be available from proxy.golang.org though. We can also make this cache feature configurable, so users can turn it off in their Athens instance.
@linzhp thanks for the context. But to stay on topic: yes, I imagine the cache will be configurable and most likely opt-in. What we can do is introduce it as an optional interface, similar to how an http.ResponseWriter might also be an http.Flusher. This way, we can check whether a storage backend implements a ListCacher and, if so, use it. If not, and the user has opted in, we can fail or warn.
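A rough sketch of that optional-interface idea, assuming a hypothetical ListCacher whose method set is purely illustrative (not an agreed-upon Athens API):

```go
// Sketch only: ListCacher is a hypothetical optional interface, not an
// existing Athens type; Backend stands in for the real storage interface.
package storage

import (
	"context"
	"time"
)

// Backend is a stand-in for the existing storage interface.
type Backend interface {
	Exists(ctx context.Context, module, version string) (bool, error)
}

// ListCacher is the optional interface a backend may additionally implement.
type ListCacher interface {
	CachedList(ctx context.Context, module string) (versions []string, updatedAt time.Time, err error)
	SetList(ctx context.Context, module string, versions []string) error
}

// cachedVersions checks the optional interface at runtime, the same way an
// HTTP handler checks whether its http.ResponseWriter is also an http.Flusher.
func cachedVersions(ctx context.Context, b Backend, module string) ([]string, bool) {
	lc, ok := b.(ListCacher)
	if !ok {
		return nil, false // backend has no list cache; caller falls back to "go list"
	}
	vs, _, err := lc.CachedList(ctx, module)
	if err != nil {
		return nil, false
	}
	return vs, true
}
```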
Another consideration: a completely offline environment.
@marwan-at-work I think that even without implementing the ListCacher interface-to-be, these endpoints should return the local catalog of versions instead of an error when the VCS is unavailable, or at least have an optional configuration to do so.
Sorry everyone, work has been busy. I'll get back to reading and answering as soon as I can. But I just thought of an idea that would also unblock anyone here who needs "offline mode" or "partial responses": you can create a side-car GOPROXY that you set "GoBinaryEnvVars" to point to. The GOPROXY would do one of two things:
Now the question becomes: is that solution too much to implement for users? @linzhp this should definitely unblock you from needing an Athens fork, but would you find that too much to maintain and prefer a config option instead? The implementation of the side-car GOPROXY should be fairly trivial as far as I can tell.
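A minimal, hedged sketch of what such a side-car GOPROXY could look like (the upstream address, port, and the fixed version list are placeholders; the actual behavior of the two options mentioned above is up to the user):

```go
// Hypothetical side-car GOPROXY: it answers the module proxy's /@v/list
// route itself and forwards everything else to an upstream proxy.
// All addresses and the hard-coded version are illustrative only.
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	// Assumed upstream; in a fully offline setup this could be removed.
	upstream, err := url.Parse("https://proxy.golang.org")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if strings.HasSuffix(r.URL.Path, "/@v/list") {
			// Serve a locally cached (possibly partial or offline) version list.
			fmt.Fprintln(w, "v1.0.0") // placeholder: replace with your own catalog lookup
			return
		}
		proxy.ServeHTTP(w, r) // everything else goes to the real upstream
	})
	log.Fatal(http.ListenAndServe(":9999", nil))
}
```

GoBinaryEnvVars would then point Athens's go binary at this side-car, for example by setting GOPROXY to its address.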
After upgrading to Go 1.13 and properly handling packages like
@linzhp are you now able to get onto the mainline Athens branch?
Not yet. We still have to fork in order to:
Got it. For #1450, I'll have another look and see if we can fix it faster. For metrics, if we can help, let me know. For your internal storage API: in the past we've tried to implement a generic storage driver; one attempt was based on gRPC and the other was an HTTP-based implementation. Would either of those help?
Our internal storage API comes with its own Go client, so we implemented our own storage backend around it. I will work with @xytan0056 to identify the gaps on metrics.
We currently use tally to emit metrics data directly to an m3db server. However, the latter doesn't seem to have a compatible exporter. We can probably do some tweaks around our infra, though. We need some time to research this.
@linzhp we have had some folks try to build new and more generic backends so they could run Athens with their specific backend without recompiling Athens. Tons of reading on this if you're interested: #1110, #1131, #1459, #1130, #1353. Somewhere in one of those PRs, we discussed a gRPC-based API that Athens could use to talk to a storage server performantly. If that would work for you, we can try to make something like that happen. I know that one of those PR authors would also welcome it.
I think it's a good idea to have a plugin architecture to support custom storage backends without forking Athens. Keep me posted on the progress.
@linzhp will do. We would like some details on what specific functionality you would need to be able to use it at your company. For example, if we did a straightforward HTTP API that matches the Go proxy download API, would that be enough for you?
@arschles #1131 is actually what we considered previously. However, when calling our internal storage over raw HTTP, we must pass some specific params in headers or query parameters, which means modifying Go code with pieces specific to our company. It's possible to make the headers Athens sends configurable, but I'm not sure of the right way to do that (at the least, make them customizable in the toml).
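One possible shape for that, purely as a sketch (the header map and its config source are made up, not an existing Athens option): a small http.RoundTripper that injects headers read from configuration, so the company-specific pieces live in config rather than in forked code.

```go
// Sketch: inject configurable headers into every request to an internal
// storage API. The headers map and where it is loaded from (toml, env vars)
// are hypothetical.
package storageclient

import "net/http"

type headerInjector struct {
	base    http.RoundTripper
	headers map[string]string // e.g. loaded from a toml file or env vars
}

func (h headerInjector) RoundTrip(req *http.Request) (*http.Response, error) {
	r := req.Clone(req.Context())
	for k, v := range h.headers {
		r.Header.Set(k, v)
	}
	return h.base.RoundTrip(r)
}

// NewClient wraps the default transport with the configured headers.
func NewClient(headers map[string]string) *http.Client {
	return &http.Client{
		Transport: headerInjector{base: http.DefaultTransport, headers: headers},
	}
}
```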
@xytan0056 what if you used environment variables to configure the S3 credentials?
Is your feature request related to a problem? Please describe.
When Athens handles "list" or "latest" commands, it always runs "go list -m -versions" to get the list of versions (https://github.com/gomods/athens/blob/master/pkg/module/go_vcs_lister.go#L44). The "go list" command in turn reaches out to the VCS to get the list. This is not optimal, because "ls-remote" operations are often slow.
Describe the solution you'd like
We can have Athens cache the list of versions for each module. Upon receiving a list request for a specific module, Athens would always look up the version cache first and return the cached list if it is recent enough. Athens would only run "go list -m -versions" if there is no version cache for that module, or if that cache is older than a configurable age. Refreshing the list of versions could be configured as an async operation, i.e., it happens after returning the old list from the cache. When a VCS server is down, Athens could then still serve a reasonably up-to-date list of versions to the client, while logging the unavailability of the VCS.
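A minimal sketch of the flow described above, assuming hypothetical VersionCache and Lister types (none of these names are existing Athens APIs):

```go
// Hypothetical sketch of the proposed lookup flow; all types, fields, and
// method names below are illustrative, not actual Athens APIs.
package cache

import (
	"context"
	"time"
)

// Lister runs "go list -m -versions" (or equivalent) against the VCS.
type Lister interface {
	List(ctx context.Context, mod string) ([]string, error)
}

// VersionCache stores the last known version list per module.
type VersionCache interface {
	Get(ctx context.Context, mod string) (versions []string, updatedAt time.Time, ok bool)
	Put(ctx context.Context, mod string, versions []string) error
}

type CachingLister struct {
	Cache VersionCache
	VCS   Lister
	TTL   time.Duration // configurable max age before re-running "go list"
}

func (c *CachingLister) List(ctx context.Context, mod string) ([]string, error) {
	if vs, updatedAt, ok := c.Cache.Get(ctx, mod); ok {
		if time.Since(updatedAt) < c.TTL {
			// Fresh enough: serve from cache without touching the VCS.
			return vs, nil
		}
		// Stale: return the old list and refresh asynchronously.
		go func() {
			if fresh, err := c.VCS.List(context.Background(), mod); err == nil {
				_ = c.Cache.Put(context.Background(), mod, fresh)
			} // if the VCS is down, keep serving the stale list and log the failure
		}()
		return vs, nil
	}
	// No cache entry yet: fall back to the VCS directly.
	vs, err := c.VCS.List(ctx, mod)
	if err != nil {
		return nil, err
	}
	_ = c.Cache.Put(ctx, mod, vs)
	return vs, nil
}
```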
Describe alternatives you've considered
Alternatively, we can populate the module cache with all semantic versions of a module that "go list -m -versions" returns, and record the timestamp of the most recent update for the module. Then storage.List() would serve as the version cache. The downside is that the module cache may store versions that users never request.
Additional context
A privately hosted VCS may not be able to handle as much traffic as public hosting sites like GitHub. In addition, we need protection from any VCS issues.
We can help with implementing this feature.