Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE REQUEST #32] On-Premises S3 / S3 Compatible... #389

Open
wants to merge 14 commits into
base: main
Choose a base branch
from

Conversation

lefebsy
Copy link

@lefebsy lefebsy commented Oct 21, 2024

Description (edited) :

This is a S3 proposition of Polaris core storage implementation, copy of the aws + new parameters : endpoint, path style...

It is tested OK with local MinIO and also with Backblaze B2 (Thanks to @metadaddy).
This should works with many S3 compatible solutions like Ceph.io, Dell ECS, NetApp StorageGRID, etc...

  • By default it is trying to respect the same behavior about credentials than AWS (IAM/STS). The same dynamic policy is applied, limiting the scope to the data queried.

  • Otherwise if STS is not available 'skipCredentialSubscopingIndirection' = true will disabling Polaris "SubScoping" of the credentials

Let me know your opinion about this design proposal.
Thank you

curl -X POST -H "Authorization: Bearer ${SPARK_BEARER_TOKEN}" \
     -H 'Accept: application/json' -H 'Content-Type: application/json' \
     http://${POLARIS_HOST}:8181/api/management/v1/catalogs -d \
      '{
          "name": "my-s3compatible-catalog",
          "id": 100,
          "type": "INTERNAL",
          "readOnly": false,
          "properties": {
            "default-base-location": "${S3_LOCATION}"
          },
          "storageConfigInfo": {
            "storageType": "S3_COMPATIBLE",
            "allowedLocations": ["${S3_LOCATION}/"],
            "s3.endpoint": "https://localhost:9000"
          }
        }'
            # optional:
            "s3.pathStyleAccess": true / false
            "s3.region": "rack-1 or us-east-1"
            "s3.roleArn": "arn:xxx:xxx:xxx:xxxx"
            "s3.credentials.catalog.accessKeyId": "CATALOG_1_ACCESS_KEY_ENV_VARIABLE_NAME"
            "s3.credentials.catalog.secretAccessKey": "CATALOG_1_SECRET_KEY_ENV_VARIABLE_NAME"
            # optional in case STS/IAM is not available :
            "skipCredentialSubscopingIndirection": true
                # optional for skipCredentialSubscopingIndirection - served by catalog to client like spark or trino
                "s3.credentials.client.accessKeyId": "CLIENT_OF_CATALOG_1_ACCESS_KEY_ENV_VARIABLE_NAME"
                "s3.credentials.client.secretAccessKey": "CLIENT_OF_CATALOG_1_SECRET_KEY_ENV_VARIABLE_NAME"

Included Changes:

  • New type of storage "S3_COMPATIBLE".
  • Tested against MinIO with self-signed certificate
  • regtests/run_spark_sql_s3compatible.sh

Type of change:

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation update
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If adding new functionality, I have discussed my implementation with the community using the linked GitHub issue

@lefebsy lefebsy changed the title add s3 compatible storage - first commit [FEATURE REQUEST] On-Premise S3... #32 Oct 21, 2024
@lefebsy lefebsy changed the title [FEATURE REQUEST] On-Premise S3... #32 [FEATURE REQUEST #32] On-Premise S3... Oct 21, 2024
@lefebsy lefebsy changed the title [FEATURE REQUEST #32] On-Premise S3... [FEATURE REQUEST #32] On-Premises S3... Oct 21, 2024
@lefebsy

This comment was marked as outdated.

@jean-humann
Copy link

Hello everyone this PR seems to be blocked for a month now, is there anything we can do to make it to the end ? 🙏

@lefebsy
Copy link
Author

lefebsy commented Dec 11, 2024

Hello everyone this PR seems to be blocked for a month now, is there anything we can do to make it to the end ? 🙏

Sorry for the one-month break. I tried the approaches proposed in the comments.
Refactoring performed ;)

@lefebsy lefebsy force-pushed the s3compatible branch 2 times, most recently from eec522f to 342f911 Compare December 13, 2024 23:59
@lefebsy
Copy link
Author

lefebsy commented Dec 16, 2024

Refactored after many comments :

  • SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is replacing the initial 'Strategy options' as requested by @eric-maynard

  • Come back of RoleARN as optional parameter, because requested in the issue [FEATURE REQUEST] On-Premise S3 & Remote Signing #32

  • Added 'Region for client' to be aligned with last AWS implementation modifications

  • Removed the credentials parameters 'by value', only env var name are possible. If parameters are empty, it will fallback to default AWS credentials provider

  • @collado-mike comments not completly done

    • about 'at least a session token' :
      seems to be not available in the S3 compatible softwares I've tested, or I missed the information - only AssumeRole (with empty role or not) is often implemented
    • about the SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION should be in the catalog properties and not in the storage porperties :
      Polaris is not forwarding the catalog properties where it is needed. Maybe it can be adressed later in other PR ?
    • STS client created inside the core/storage classes, not pass via constructor :
      Looks similar to the Azure implementation, not the AWS one. I have avoided to modify the 'Service' code in this PR, modifications are limited to REST specs, core Storage and regTest...

Thank you

@lefebsy
Copy link
Author

lefebsy commented Dec 19, 2024

Ready for review.

@flyrain flyrain added this to the 1.0.0 milestone Dec 19, 2024
@Gerrit-K
Copy link

Not sure if this is out of scope for this PoC(?), but this PR doesn't contain the corresponding changes for the CLI, yet. This could break some CLI commands. For example, if a catalog with type S3_COMPATIBLE is created, the command polaris catalogs list won't work anymore, even if there are other catalogs with regular storage types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants