(Unofficial) Python SDK for Athena Federation

This is an unofficial Python SDK for Athena Federation.

Overview

The Python SDK makes it easy to create new Amazon Athena Data Source Connectors using Python. It is under active development, so the API may change between versions.

You can see an example implementation that queries Google Sheets using Athena.

Current Limitations

Partitions are not supported, so Athena will not parallelize the query using partitions.

Example Implementations

Athena data source connector for Minio

Local Development

Make sure you've installed the build module and the SDK dependencies.

pip install build
pip install -r requirements.txt

Now make a wheel.

python -m build

This will create a file in dist/: dist/unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl

Copy that file to your example repo and you can include it in your requirements.txt like so:

unoffical-athena-federation-sdk @ file:///unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl
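The `file:///` URL has to be an absolute path to the wheel. If you'd rather not type it by hand, here's a small sketch that builds the requirements line from wherever the wheel currently lives. `requirements_line` is a hypothetical helper, not part of the SDK, and it assumes a standard wheel filename (distribution name before the first `-`).

```python
from pathlib import Path


def requirements_line(wheel_path: str) -> str:
    """Build a PEP 508 direct-reference line for a local wheel file."""
    # file:// URLs must point at an absolute path, so resolve it first.
    wheel = Path(wheel_path).resolve()
    # In a wheel filename the distribution name comes before the first "-";
    # requirements files conventionally spell it with hyphens.
    dist_name = wheel.name.split("-", 1)[0].replace("_", "-")
    return f"{dist_name} @ {wheel.as_uri()}"
```

Run it from the directory you copied the wheel into, e.g. `print(requirements_line("unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl"))`, and paste the output into requirements.txt.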

Validating your connector

You can test your Lambda function locally using Lambda Docker images.

First, build our Docker image and run it.

docker build -t local/athena-python-example .
docker run --rm -p 9000:8080 local/athena-python-example

Then, we can execute a sample PingRequest.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"@type": "PingRequest", "identity": {"id": "UNKNOWN", "principal": "UNKNOWN", "account": "123456789012", "arn": "arn:aws:iam::123456789012:root", "tags": {}, "groups": []}, "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab"}'

{"@type": "PingResponse", "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab", "sourceType": "athena_python_sdk", "capabilities": 23}
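If `curl` isn't handy, the same request can be sent from Python using only the standard library. This is a sketch: `build_request` and `invoke` are illustrative names, and the identity block simply copies the placeholder values from the curl example.

```python
import json
import urllib.request

# Endpoint exposed by the Lambda container started with `docker run -p 9000:8080`.
LOCAL_URL = "http://localhost:9000/2015-03-31/functions/function/invocations"


def build_request(request_type: str,
                  catalog: str = "athena_python_sdk",
                  query_id: str = "1681559a-548b-4771-874c-2aa2ea7c39ab") -> dict:
    """Build a request body with the same placeholder identity as the curl example."""
    return {
        "@type": request_type,
        "identity": {
            "id": "UNKNOWN",
            "principal": "UNKNOWN",
            "account": "123456789012",
            "arn": "arn:aws:iam::123456789012:root",
            "tags": {},
            "groups": [],
        },
        "catalogName": catalog,
        "queryId": query_id,
    }


def invoke(payload: dict, url: str = LOCAL_URL) -> dict:
    """POST the payload to the local Lambda container and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, method="POST",
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

With the container running, `invoke(build_request("PingRequest"))` should return the PingResponse; pass `"ListSchemasRequest"` to list schemas instead.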

We can also list schemas.

curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"@type": "ListSchemasRequest", "identity": {"id": "UNKNOWN", "principal": "UNKNOWN", "account": "123456789012", "arn": "arn:aws:iam::123456789012:root", "tags": {}, "groups": []}, "catalogName": "athena_python_sdk", "queryId": "1681559a-548b-4771-874c-2aa2ea7c39ab"}'

{"@type": "ListSchemasResponse", "catalogName": "athena_python_sdk", "schemas": ["sampledb"], "requestType": "LIST_SCHEMAS"}
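When scripting these checks, it helps to assert on the response shape rather than eyeballing it. The sketch below assumes each `XRequest` is answered by an `XResponse` that echoes `catalogName`; that holds for the two sample responses shown here, but it is an assumption for other request types. Both function names are hypothetical.

```python
import json


def check_response(request: dict, response: dict) -> None:
    """Sanity-check a response against the request that produced it."""
    expected_type = request["@type"].replace("Request", "Response")
    if response.get("@type") != expected_type:
        raise ValueError(f"expected {expected_type}, got {response.get('@type')}")
    if response.get("catalogName") != request["catalogName"]:
        raise ValueError("catalogName was not echoed back")


def schema_names(response: dict) -> list:
    """Return the schema list from a ListSchemasResponse."""
    return list(response.get("schemas", []))


# The sample ListSchemasResponse from above.
sample = json.loads('{"@type": "ListSchemasResponse", "catalogName": "athena_python_sdk", '
                    '"schemas": ["sampledb"], "requestType": "LIST_SCHEMAS"}')
check_response({"@type": "ListSchemasRequest", "catalogName": "athena_python_sdk"}, sample)
print(schema_names(sample))  # ['sampledb']
```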

Creating your Lambda function

💁 Please note these are manual instructions until a serverless application can be built.

First, let's define some variables we need throughout.

export SPILL_BUCKET=<BUCKET_NAME>
export AWS_ACCOUNT_ID=123456789012
export AWS_REGION=us-east-1
export IMAGE_TAG=v0.0.1

Create an S3 bucket that this Lambda function will use for spill data.

aws s3 mb s3://${SPILL_BUCKET}
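If you prefer boto3 over the CLI, there is one gotcha: S3 expects no `LocationConstraint` for `us-east-1` but requires one for other regions. A minimal sketch, reading the same `SPILL_BUCKET` and `AWS_REGION` variables exported earlier; the helper names are hypothetical, and boto3 is only imported when `create_spill_bucket` is called.

```python
import os


def create_bucket_kwargs(bucket: str, region: str) -> dict:
    """Arguments for boto3's S3 create_bucket; us-east-1 must omit LocationConstraint."""
    kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_spill_bucket() -> None:
    """Create the spill bucket named by $SPILL_BUCKET in $AWS_REGION."""
    import boto3  # imported lazily so the helper above works without boto3

    region = os.environ["AWS_REGION"]
    s3 = boto3.client("s3", region_name=region)
    s3.create_bucket(**create_bucket_kwargs(os.environ["SPILL_BUCKET"], region))
```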

Create an ECR repository for this image

aws ecr create-repository --repository-name athena_example --image-scanning-configuration scanOnPush=true

Tag the image with the repository name and push it up

docker tag local/athena-python-example ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
aws ecr get-login-password | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
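The same image URI appears in the tag, push, create and update commands. Here's a tiny sketch of how it's composed, assuming a commercial AWS region (the `amazonaws.com` suffix differs in partitions such as China); `ecr_image_uri` is a hypothetical helper.

```python
import os


def ecr_image_uri(account_id: str, region: str, repo: str, tag: str) -> str:
    """Compose the ECR image URI used by docker tag/push and by the Lambda function."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    return f"{registry}/{repo}:{tag}"


# Falls back to the example values if the shell variables aren't exported.
uri = ecr_image_uri(
    os.environ.get("AWS_ACCOUNT_ID", "123456789012"),
    os.environ.get("AWS_REGION", "us-east-1"),
    "athena_example",
    os.environ.get("IMAGE_TAG", "v0.0.1"),
)
print(uri)
```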

Create an IAM role that will allow your Lambda function to execute

Note the ARN of the role that's returned

aws iam create-role \
    --role-name athena-example-execution-role \
    --assume-role-policy-document '{"Version": "2012-10-17","Statement": [{ "Effect": "Allow", "Principal": {"Service": "lambda.amazonaws.com"}, "Action": "sts:AssumeRole"}]}'
aws iam attach-role-policy \
    --role-name athena-example-execution-role \
    --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
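Hand-quoting JSON in a shell is easy to get wrong. As an alternative sketch, the trust policy can be built with `json.dumps` and passed to boto3's IAM client. The function names are illustrative, and boto3 is only imported when `create_execution_role` is actually called.

```python
import json


def lambda_trust_policy() -> str:
    """Trust policy letting the Lambda service assume the execution role."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    return json.dumps(policy)


def create_execution_role(role_name: str = "athena-example-execution-role") -> str:
    """Create the role, attach basic Lambda logging permissions, and return its ARN."""
    import boto3  # imported lazily so lambda_trust_policy works without boto3

    iam = boto3.client("iam")
    role = iam.create_role(RoleName=role_name,
                           AssumeRolePolicyDocument=lambda_trust_policy())
    iam.attach_role_policy(
        RoleName=role_name,
        PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole",
    )
    return role["Role"]["Arn"]
```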

Grant the IAM role access to your S3 bucket

aws iam create-policy --policy-name athena-example-s3-access --policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": ["arn:aws:s3:::'${SPILL_BUCKET}'"]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": ["arn:aws:s3:::'${SPILL_BUCKET}'/*"]
    }
  ]
}'
aws iam attach-role-policy \
    --role-name athena-example-execution-role \
    --policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/athena-example-s3-access
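The `'${SPILL_BUCKET}'` pattern in the command above works by closing and reopening the single-quoted string so the shell can expand the variable. Building the document in Python sidesteps that quoting. A sketch that produces the same two statements (`spill_bucket_policy` is a hypothetical helper):

```python
import json


def spill_bucket_policy(bucket: str) -> str:
    """S3 access policy for the spill bucket, matching the CLI example."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is granted on the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level actions are granted on keys inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

The result can be passed to boto3 as `iam.create_policy(PolicyName="athena-example-s3-access", PolicyDocument=spill_bucket_policy(bucket))`.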

Now create your function pointing to the created repository image

aws lambda create-function \
    --function-name athena-python-example \
    --role arn:aws:iam::${AWS_ACCOUNT_ID}:role/athena-example-execution-role \
    --code ImageUri=${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG} \
    --environment 'Variables={TARGET_BUCKET=<BUCKET_NAME>}' \
    --description "Example Python implementation for Athena Federated Queries" \
    --timeout 60 \
    --package-type Image
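For reference, here is a sketch of the equivalent boto3 call. The keyword arguments mirror the CLI flags one-to-one; `create_function_kwargs` and `create_function` are hypothetical helpers. The wait uses boto3's `function_active` waiter because new functions can start in a `Pending` state.

```python
def create_function_kwargs(account_id: str, image_uri: str, target_bucket: str) -> dict:
    """Arguments for boto3's Lambda create_function, mirroring the CLI call."""
    return {
        "FunctionName": "athena-python-example",
        "Role": f"arn:aws:iam::{account_id}:role/athena-example-execution-role",
        "Code": {"ImageUri": image_uri},
        "PackageType": "Image",
        "Environment": {"Variables": {"TARGET_BUCKET": target_bucket}},
        "Description": "Example Python implementation for Athena Federated Queries",
        "Timeout": 60,
    }


def create_function(account_id: str, image_uri: str, target_bucket: str) -> str:
    """Create the function, wait until it is Active, and return its ARN."""
    import boto3  # imported lazily so the kwargs helper works without boto3

    client = boto3.client("lambda")
    response = client.create_function(
        **create_function_kwargs(account_id, image_uri, target_bucket))
    client.get_waiter("function_active").wait(FunctionName=response["FunctionName"])
    return response["FunctionArn"]
```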

Connect with Athena!

Choose "Data sources" on the top navigation bar in the Athena console and then click "Connect data source"

Choose the Lambda function you just created and click Connect!
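The console flow registers the function as an Athena data catalog. If you'd rather script it, below is a sketch using boto3's Athena `create_data_catalog` with `Type="LAMBDA"`. The catalog name is your choice (the sample requests above use `athena_python_sdk`), and the helper names are hypothetical.

```python
def data_catalog_kwargs(catalog_name: str, lambda_arn: str) -> dict:
    """Arguments for boto3's Athena create_data_catalog with a Lambda connector."""
    return {
        "Name": catalog_name,
        "Type": "LAMBDA",
        # A single function serves both metadata and record requests.
        "Parameters": {"function": lambda_arn},
        "Description": "Python SDK example connector",
    }


def register_catalog(catalog_name: str, lambda_arn: str) -> None:
    """Register the Lambda function as an Athena data catalog."""
    import boto3  # imported lazily so the kwargs helper works without boto3

    boto3.client("athena").create_data_catalog(
        **data_catalog_kwargs(catalog_name, lambda_arn))
```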

Updating the Lambda function

If you change your connector code, re-run the build and push steps (with a new IMAGE_TAG value) and then update the Lambda function:

aws lambda update-function-code \
    --function-name athena-python-example \
    --image-uri ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
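Since this step asks for a fresh `IMAGE_TAG`, a small helper can compute the next one, and boto3 can apply the update and wait for it to finish. This sketch assumes tags end in a number, like `v0.0.1`; `next_tag` and `update_image` are hypothetical helpers, not something the SDK provides.

```python
import re


def next_tag(tag: str) -> str:
    """Bump the trailing number of a tag, e.g. v0.0.1 -> v0.0.2."""
    match = re.fullmatch(r"(.*?)(\d+)", tag)
    if match is None:
        raise ValueError(f"tag {tag!r} does not end in a number")
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1}"


def update_image(function_name: str, image_uri: str) -> str:
    """Point the function at a new image, wait for the update, return its status."""
    import boto3  # imported lazily so next_tag works without boto3

    client = boto3.client("lambda")
    client.update_function_code(FunctionName=function_name, ImageUri=image_uri)
    client.get_waiter("function_updated").wait(FunctionName=function_name)
    config = client.get_function_configuration(FunctionName=function_name)
    return config["LastUpdateStatus"]
```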

dacort/athena-federation-python-sdk

Folders and files

Latest commit

History

Repository files navigation

(Unofficial) Python SDK for Athena Federation

Overview

Current Limitations

Example Implementations

Local Development

Validating your connector

Creating your Lambda function

Connect with Athena!

Updating the Lambda function

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages