-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a local cache module #884
Conversation
When the `file_cache` module will be added, the `launcher` module will depend on the `file_cache` module which depends on the client. That introduces a circular dependency since the tests are in the client module itself. This splits the pieces that need the launcher into a separate `test` module that breaks the dependency; it works as the launcher-integrated pieces only depend on the public interfaces of the client.
The file cache provides a simple caching interface. Only supporting full files - and minimal functionality - it's meant to support caching at worker ndoes. This commit is a work-in-progress, lacking tests and still needing integration with token-based authorization.
This adds the `file_cache` as a top-level module in Pelican that can be invoked as part of launchers. It also adds support for the client to accept caches of the form `unix://`.
If we want to keep most of Pelican cancellable, then any network calls need to take a context.
A necessary prereq to validating tokens given an issuer.
A resource-enabled scope is of the form: ``` authz:/resource ``` (example: `storage.read:/foo`). These resources are partially ordered and hierarchical, meaning some scopes "contain" the authorization for others. For example, `storage.read:/foo` implies access to `storage.read:/foo/bar` If a resource is not given, it is assumed to be `/`.
This adds support for authorization to the local cache and periodically synchronizes the local cache's authorization policy with the director.
Instead, provide a ID() method which returns a string. Mirrors what is done for the TransferJob object.
Includes fixes required to make the code work with the tests.
AddScopes now takes a variadic argument instead of a slice; further, added some generics so the same functions can be used for normal or resource-enabled scopes.
Includes corresponding bugfixes.
Includes corresponding bugfixes. The client can now work with the cache!
Also fixes the client's `DoStat` function to work without topology.
This implements an upper limit on the transfer rates; meant mainly to support unit tests.
The large file unit test ensures that partial progress is made; that is, data flows to the client while it is still being downloaded.
Marking as ready for review. Since original draft was posted, this has gained a reasonably test suite and the first round of bugfixes. (Due to OSG Hub outages, it's not possible for unit tests to pass currently -- will continue to clean those up but it should be ready for a first-pass of a review.) |
One thing I noticed when writing a unit test for StatHttp(): Lines 198 to 222 in 459b7a6
In this code snippet, you don't list |
Nice !!! Some quick questions: Looking at the PR, it appears one can also set the size of the cache and a high/low water mark for when to do garbage collection ... is this using LRU or? If this module is activated, how does it impact meta data reported back to the stash_plugin... e.g. are there new/different error codes? can the client report back if the file was pulled from the local cache or not? Also I am reading it correctly that all authorization rules are honored as before, i.e. user X cannot access files limited only to user Y? How is that enforced, i.e. what prevents user X from simply doing a chdir to the directory containing the data cache and reading files? |
Make sure that `DoGet` / `DoCopy` / `DoPut` goes through the transfer results and promote any errors. Also touch up exported and internal functions in the client to follow go naming practices.
Ensure that purging works on a full cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor changes requested. I'm not super familiar with client side logic so @joereuss12 might also want to take a look, just to ensure.
The client logic LGTM. I mentioned this in the created issue but the duplicate test is deleted in one of my PR's #878 but I'll probably have to redo some cleanup depending on if this PR gets merged before mine or not. Also with #890 tested off of this branch, it seems statHttp seems to be working how it should as well, but that will probably need a little sprucing up once this gets merged. |
Most significant is ensuring the `purge` method returns an error object so the corresponding REST API can have an appropriate status code.
Adds type-safety to various fields of `TokenConfig`: - Version can no longer be set to something invalid without generating an error - Audience "Any" is now a helper function with the correct behavior based on the profile - There are `New` methods that force setting a valid profile.
Removes some nearly-duplicate code between the two modules' tests.
If we cannot get a HEAD request through a cache within a second, it's a strong signal that it won't be responsive later on when we try to download. Do simultaneous HEAD queries and sort those without a response after 1 second to the end. The intent is to reduce the time cost of a totally unresponsive cache.
When we need to do stat, if there are only caches provided, then perform HEAD against multiple simultaneously. Meant to reduce the impact of one bad cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of tiny changes requested and some questions. I added the comments in the individual commits and didn't expect they are not showing up in the conversation tab. You should be able to find them in the commits history.
@bbockelm I figured that you'll need to hit the comment icon in the commit history above ⬆️ instead of in the commit tab. Sorry for the hassle! |
Subject: user, | ||
} | ||
loginCookieTokenCfg.AddScopes(scopes) | ||
loginCookieTokenCfg := token.NewWLCGToken() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy the comment from nested commit history:
Do we want to do it this way, or having NewWLCGToken to receive a TokenConfig param and write/overwrite the token profile and version then return the updated config? I feel like it would be less error prone if we ask user (ourselves) for a struct than having them assign the values one by one (and they are likely not aware of what are required to pass)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed with Brian, this should be a good follow-up ticket
client/handle_http.go
Outdated
idx int | ||
size uint64 | ||
err error | ||
}{idx, uint64(size), nil} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy comment from hidden commit history:
Do we want to return error or nil here if there is an error converting Content-Length header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. An error here will sort the cache toward the back (but not be fatal to the transfer).
} else { | ||
finished[result.idx] = true | ||
if result.idx == 0 { | ||
cancel() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy comment from hidden commit history:
Trying to understand the logic here. Are we cancelling the reminding request if we get back from the first attempt? Wouldn't that cause the remaining attempts (no matter they succeeded or not) to have finished[idx] = false? Not sure if this will cause unwanted side effect
If the first cache returns back successfully, sort all the error'd out cases to the back of the array and the pending (canceled) ones to the middle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the revision!
This adds a "local cache" module which provides simple caching functionality via a Unix domain socket.
Functionality:
PR is currently a draft; needs a good amount of testing and bugfixes.