Plug S3 plugin changes into multi-export S3 origins #1045
Conversation
@brianhlin This is the PR I indicated I'd like you to glance over with an eye toward "how do people actually configure S3 origins". The PR description should have the various setup bits, and I think you know how all of that gets converted to env vars. Let me know if you see any issues!
Nothing jumps out at me immediately but I think I need a little bit to digest the configs.
Seems potentially dangerous. Maybe we should have folks opt into this with a special config option?
What happens if this is done and a bucket at the endpoint isn't public?
There are no associated S3 credentials, so everything is prevented by lack of authentication.
This config makes it look like S3 and POSIX origins are going to be mutually exclusive. Is that intentional?
Even still, it feels like we're entering foot gun territory here.
(force-pushed from ef262f4 to b0cd921)
Yep, that's always been the case. We can either operate in S3 mode or POSIX mode.
Is there any way to get an end-to-end S3 test going with both a configuration-file setup and a command-line setup? Or is that really not available because we don't have a good way of setting up a test S3 origin?
I'm happy to write one, but I don't know the best way to set up an S3 endpoint in the process. I can take a peek at programmatically creating a bucket and S3 credentials in Minio, but wasn't able to get that working when I tried previously. One alternative is to spin up the origin and point it at an AWS open data bucket, but that sounds like a test that's asking for trouble. If you think it's still worth the risk, I'll set it up.
That's fair. I guess I would like confirmation from you that you've tested the following locally:
(force-pushed from 5383a4b to 484759c)
Okay, I got rid of the Minio dependency and pointed the origin tests at an AWS endpoint serving historical data (which, being historical, should never change). That allowed me to test various configuration setups and make sure the file pulled had the correct contents.
Mostly questions I want clarified.
(force-pushed from 0262eaa to 312dc9f)
(force-pushed from 312dc9f to 1030863)
Very tentatively approving, assuming all the local tests pass.
If everything fails after merging even with the dev-container changes, revert the PR.
Just adding breadcrumbs here in case we end up needing them -- we're merging even though tests aren't passing. The problem is that the upstream changes are breaking (even though we keep the configuration in Pelican the same), so merging the updated container will break lots of other stuff. All the tests for this PR are currently passing when I run them locally. Famous last words!
This PR plugs upstream changes in the S3 plugin for XRootD into Pelican. Under the new setup, here are a few example ways to configure an S3 origin:
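(The original config snippet wasn't preserved in this capture; the block below is only an illustrative sketch of a single-bucket export, and the key names are assumptions that may not exactly match Pelican's real config schema.)

```yaml
# Hypothetical sketch of a single-bucket S3 origin export.
# Key names are assumptions and may differ from Pelican's actual schema.
Origin:
  StorageType: s3
  S3Region: us-east-1
  S3ServiceUrl: https://s3.us-east-1.amazonaws.com
  S3UrlStyle: path
  Exports:
    - FederationPrefix: /my-prefix
      S3Bucket: my-bucket
      S3AccessKeyfile: /etc/pelican/s3-access-key
      S3SecretKeyfile: /etc/pelican/s3-secret-key
      Capabilities: ["PublicReads", "Listings"]
```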
And finally, from the command line:
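(The command-line snippet also wasn't preserved; the following is only a hypothetical sketch, and the flag names are assumptions rather than confirmed `pelican origin serve` flags.)

```shell
# Hypothetical command-line equivalent of the config above; flag names are
# assumptions and may not match the flags actually added by this PR.
pelican origin serve -m s3 \
  --service-url https://s3.us-east-1.amazonaws.com \
  --region us-east-1 \
  --url-style path \
  --bucket my-bucket \
  --federation-prefix /my-prefix \
  --bucket-access-keyfile /etc/pelican/s3-access-key \
  --bucket-secret-keyfile /etc/pelican/s3-secret-key
```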
Here's something funky I did (open to negative feedback on it):
Right now we have an origin that exports all of AWS public data, and to make something similar possible under this new setup, we need a way to tell Pelican to export an entire S3 endpoint, potentially without knowing all the buckets (many thousands in the case of AWS open data). I achieved this by deciding that if no bucket is provided, the export covers the entire endpoint and we assume ALL buckets at the endpoint are public. For example, this config will export all of AWS open data:
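(The snippet itself wasn't captured here; this is a sketch of the shape described above, with key names assumed: no `S3Bucket` and no credentials, so the whole endpoint is exported.)

```yaml
# Hypothetical sketch of exporting an entire S3 endpoint: no S3Bucket and no
# credentials, so every public bucket at the endpoint is reachable under the
# federation prefix. Key names are assumptions.
Origin:
  StorageType: s3
  S3Region: us-east-1
  S3ServiceUrl: https://s3.us-east-1.amazonaws.com
  S3UrlStyle: path
  Exports:
    - FederationPrefix: /aws-open-data
      Capabilities: ["PublicReads", "Listings"]
```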
The peculiarity in this setup is that unlike other S3 exports, where the bucket name is abstracted away from the user by `FederationPrefix`, here objects are accessed by `/aws-open-data/<bucket>/<object>`. For example, my usual test is to fetch the file `MD5SUMS` from the `noaa-wod-pds` bucket; in this setup it comes from `/aws-open-data/noaa-wod-pds/MD5SUMS`.
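(As an illustration of that access pattern, here is a hypothetical fetch through the federation; the federation host is a placeholder and the client syntax may differ slightly from the version of Pelican in this PR.)

```shell
# Hypothetical fetch of the test object via the whole-endpoint export.
# <federation-host> is a placeholder, not a real federation.
pelican object get pelican://<federation-host>/aws-open-data/noaa-wod-pds/MD5SUMS ./MD5SUMS
```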