Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a local cache module #884

Merged
merged 48 commits into from
Mar 11, 2024
Merged

Conversation

bbockelm
Copy link
Collaborator

@bbockelm bbockelm commented Mar 2, 2024

This adds a "local cache" module which provides simple caching functionality via a Unix domain socket.

Functionality:

  • The local cache invokes the built-in client and downloads to a dedicated directory.
  • Subsequent requests for the same object should come from the local directory.
  • Only full-file requests are supported.
  • Only supports token-based authorization; authorization policies are synchronized from the director.

PR is currently a draft; needs a good amount of testing and bugfixes.

bbockelm added 7 commits March 2, 2024 08:51
When the `file_cache` module will be added, the `launcher` module will
depend on the `file_cache` module which depends on the client.  That
introduces a circular dependency since the tests are in the client module
itself.

This splits the pieces that need the launcher into a separate `test` module
that breaks the dependency; it works as the launcher-integrated pieces only
depend on the public interfaces of the client.
The file cache provides a simple caching interface.  Only supporting
full files - and minimal functionality - it's meant to support caching
at worker ndoes.

This commit is a work-in-progress, lacking tests and still needing
integration with token-based authorization.
This adds the `file_cache` as a top-level module in Pelican that can
be invoked as part of launchers.

It also adds support for the client to accept caches of the form `unix://`.
If we want to keep most of Pelican cancellable, then any network
calls need to take a context.
A necessary prereq to validating tokens given an issuer.
A resource-enabled scope is of the form:

```
authz:/resource
```

(example: `storage.read:/foo`).  These resources are partially ordered
and hierarchical, meaning some scopes "contain" the authorization for
others.

For example, `storage.read:/foo` implies access to `storage.read:/foo/bar`

If a resource is not given, it is assumed to be `/`.
This adds support for authorization to the local cache and periodically
synchronizes the local cache's authorization policy with the director.
file_cache/simple_cache.go Fixed Show fixed Hide fixed
file_cache/simple_cache.go Fixed Show fixed Hide fixed
bbockelm added 11 commits March 2, 2024 22:05
Instead, provide a ID() method which returns a string.  Mirrors what
is done for the TransferJob object.
Includes fixes required to make the code work with the tests.
AddScopes now takes a variadic argument instead of a slice; further,
added some generics so the same functions can be used for normal or
resource-enabled scopes.
Includes corresponding bugfixes.  The client can now work with the cache!
Also fixes the client's `DoStat` function to work without topology.
This implements an upper limit on the transfer rates; meant mainly
to support unit tests.
The large file unit test ensures that partial progress is made; that is,
data flows to the client while it is still being downloaded.
@bbockelm bbockelm marked this pull request as ready for review March 4, 2024 04:15
@bbockelm
Copy link
Collaborator Author

bbockelm commented Mar 4, 2024

Marking as ready for review. Since original draft was posted, this has gained a reasonably test suite and the first round of bugfixes.

(Due to OSG Hub outages, it's not possible for unit tests to pass currently -- will continue to clean those up but it should be ready for a first-pass of a review.)

local_cache/local_cache.go Dismissed Show dismissed Hide dismissed
local_cache/local_cache.go Dismissed Show dismissed Hide dismissed
@joereuss12
Copy link
Contributor

One thing I noticed when writing a unit test for StatHttp():

pelican/client/main.go

Lines 198 to 222 in 459b7a6

understoodSchemes := []string{"osdf", "pelican", ""}
_, foundSource := Find(understoodSchemes, destUri.Scheme)
if !foundSource {
log.Errorln("Unknown schema provided:", destUri.Scheme)
return 0, errors.New("Unsupported scheme requested")
}
origScheme := destUri.Scheme
if config.GetPreferredPrefix() != "PELICAN" && origScheme == "" {
destUri.Scheme = "osdf"
}
if (destUri.Scheme == "osdf" || destUri.Scheme == "stash") && destUri.Host != "" {
destUri.Path = path.Clean("/" + destUri.Host + "/" + destUri.Path)
destUri.Host = ""
} else if destUri.Scheme == "pelican" {
federationUrl, _ := url.Parse(destUri.String())
federationUrl.Scheme = "https"
federationUrl.Path = ""
viper.Set("Federation.DiscoveryUrl", federationUrl.String())
err = config.DiscoverFederation()
if err != nil {
return 0, err
}
}

In this code snippet, you don't list stash as a understood scheme yet check for the stash scheme later on in the code. Do we want to allow stash as a valid scheme here?

@tannenba
Copy link

tannenba commented Mar 5, 2024

Nice !!! Some quick questions: Looking at the PR, it appears one can also set the size of the cache and a high/low water mark for when to do garbage collection ... is this using LRU or? If this module is activated, how does it impact meta data reported back to the stash_plugin... e.g. are there new/different error codes? can the client report back if the file was pulled from the local cache or not? Also I am reading it correctly that all authorization rules are honored as before, i.e. user X cannot access files limited only to user Y? How is that enforced, i.e. what prevents user X from simply doing a chdir to the directory containing the data cache and reading files?

bbockelm added 2 commits March 5, 2024 14:34
Make sure that `DoGet` / `DoCopy` / `DoPut` goes through the transfer
results and promote any errors.

Also touch up exported and internal functions in the client to follow
go naming practices.
Ensure that purging works on a full cache.
@bbockelm bbockelm requested a review from haoming29 March 7, 2024 14:15
@turetske turetske linked an issue Mar 7, 2024 that may be closed by this pull request
Copy link
Contributor

@haoming29 haoming29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes requested. I'm not super familiar with client side logic so @joereuss12 might also want to take a look, just to ensure.

client/fed_test.go Outdated Show resolved Hide resolved
client/fed_test.go Outdated Show resolved Hide resolved
client/fed_test.go Show resolved Hide resolved
client/fed_test.go Outdated Show resolved Hide resolved
client/fed_test.go Outdated Show resolved Hide resolved
local_cache/cache_authz.go Show resolved Hide resolved
local_cache/cache_authz.go Show resolved Hide resolved
token/token_create.go Outdated Show resolved Hide resolved
local_cache/cache_test.go Outdated Show resolved Hide resolved
local_cache/cache_test.go Outdated Show resolved Hide resolved
@joereuss12
Copy link
Contributor

joereuss12 commented Mar 7, 2024

The client logic LGTM. I mentioned this in the created issue but the duplicate test is deleted in one of my PR's #878 but I'll probably have to redo some cleanup depending on if this PR gets merged before mine or not.

Also with #890 tested off of this branch, it seems statHttp seems to be working how it should as well, but that will probably need a little sprucing up once this gets merged.

bbockelm added 12 commits March 9, 2024 07:35
Most significant is ensuring the `purge` method returns an error object
so the corresponding REST API can have an appropriate status code.
Adds type-safety to various fields of `TokenConfig`:
- Version can no longer be set to something invalid without generating
  an error
- Audience "Any" is now a helper function with the correct behavior
  based on the profile
- There are `New` methods that force setting a valid profile.
Removes some nearly-duplicate code between the two modules' tests.
If we cannot get a HEAD request through a cache within a second, it's
a strong signal that it won't be responsive later on when we try to
download.  Do simultaneous HEAD queries and sort those without a response
after 1 second to the end.

The intent is to reduce the time cost of a totally unresponsive cache.
@bbockelm bbockelm requested a review from haoming29 March 11, 2024 02:26
When we need to do stat, if there are only caches provided, then
perform HEAD against multiple simultaneously.  Meant to reduce the
impact of one bad cache.
Copy link
Contributor

@haoming29 haoming29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of tiny changes requested and some questions. I added the comments in the individual commits and didn't expect they are not showing up in the conversation tab. You should be able to find them in the commits history.

@haoming29
Copy link
Contributor

@bbockelm I figured that you'll need to hit the comment icon in the commit history above ⬆️ instead of in the commit tab. Sorry for the hassle!

Subject: user,
}
loginCookieTokenCfg.AddScopes(scopes)
loginCookieTokenCfg := token.NewWLCGToken()
Copy link
Contributor

@haoming29 haoming29 Mar 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy the comment from nested commit history:

Do we want to do it this way, or having NewWLCGToken to receive a TokenConfig param and write/overwrite the token profile and version then return the updated config? I feel like it would be less error prone if we ask user (ourselves) for a struct than having them assign the values one by one (and they are likely not aware of what are required to pass)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed with Brian, this should be a good follow-up ticket

idx int
size uint64
err error
}{idx, uint64(size), nil}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy comment from hidden commit history:

Do we want to return error or nil here if there is an error converting Content-Length header?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. An error here will sort the cache toward the back (but not be fatal to the transfer).

} else {
finished[result.idx] = true
if result.idx == 0 {
cancel()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy comment from hidden commit history:

Trying to understand the logic here. Are we cancelling the reminding request if we get back from the first attempt? Wouldn't that cause the remaining attempts (no matter they succeeded or not) to have finished[idx] = false? Not sure if this will cause unwanted side effect

haoming29 and others added 2 commits March 11, 2024 17:14
If the first cache returns back successfully, sort all the error'd
out cases to the back of the array and the pending (canceled) ones
to the middle.
@turetske turetske requested a review from haoming29 March 11, 2024 20:32
Copy link
Contributor

@haoming29 haoming29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the revision!

@haoming29 haoming29 merged commit b9cc955 into PelicanPlatform:main Mar 11, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add "local cache" component to Pelican
4 participants