megalinter fixes
drernie committed Aug 12, 2024
1 parent 0575843 commit 13507cd
Showing 8 changed files with 29 additions and 28 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/mega-linter.yml
@@ -7,7 +7,7 @@ on: # yamllint disable-line rule:truthy
permissions: read-all
env: # Comment env block if you do not want to apply fixes
APPLY_FIXES: all # When active, APPLY_FIXES must also be defined as environment variable (in github/workflows/mega-linter.yml or other CI tool)
DISABLE_LINTERS: SPELL_CSPELL,COPYPASTE_JSCPD,PYTHON_BANDIT,PYTHON_PYRIGHT,PYTHON_PYLINT,REPOSITORY_GRYPE,REPOSITORY_SECRETLINT,REPOSITORY_TRIVY,REPOSITORY_TRUFFLEHOG
DISABLE_LINTERS: SPELL_CSPELL,COPYPASTE_JSCPD,PYTHON_BANDIT,PYTHON_PYRIGHT,PYTHON_PYLINT,REPOSITORY_GRYPE,REPOSITORY_SECRETLINT,REPOSITORY_TRIVY,REPOSITORY_TRUFFLEHOG,REPOSITORY_CHECKOV
MARKDOWN_MARKDOWNLINT_FILTER_REGEX_EXCLUDE: "tests/example.*ME\\.md" # Exclude example markdown files from markdownlint
concurrency:
group: ${{ github.ref }}-${{ github.workflow }}
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -3,5 +3,6 @@
"python.pythonPath": ".venv/bin/python",
"python.analysis.extraPaths": [
"src.*"
]
],
"python.analysis.typeCheckingMode": "basic"
}
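
For context, a hedged sketch of what the new `"python.analysis.typeCheckingMode": "basic"` setting buys (assuming Pylance/Pyright as the analysis engine, which the `python.analysis.*` keys imply): functions whose bodies can return a value incompatible with their annotation get flagged, the same class of mismatch the `Union[...]` annotation added to `lambda_handler.py` below resolves.

```python
# Illustrative only: with "python.analysis.typeCheckingMode": "basic",
# Pylance/Pyright flags the string return below as incompatible with the
# declared List[int] return type -- the same kind of mismatch the
# Union[...] annotation in lambda_handler.py resolves.
from typing import List


def read(rows: List[int], spill: bool) -> List[int]:
    if spill:
        return "s3://bucket/spill/part-0"  # reported as a type error
    return rows
```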
7 changes: 4 additions & 3 deletions Dockerfile
@@ -4,7 +4,7 @@ FROM python:3.8-slim AS build
WORKDIR /app

# Get ready to build
RUN pip install poetry
RUN pip install --no-cache-dir poetry==1.8.3

# Now copy the app over and build a wheel
COPY src /app/src/
@@ -16,9 +16,10 @@ FROM amazon/aws-lambda-python:3.12.0 AS lambda

ENV TARGET_BUCKET=replace_me

COPY --from=build /app/dist/unoffical_athena_federation_sdk-*-py3-none-any.whl /
RUN pip install /unoffical_athena_federation_sdk-*-py3-none-any.whl
COPY --from=build /app/dist/athena_federation-*-py3-none-any.whl /
RUN pip install --no-cache-dir /athena_federation-*-py3-none-any.whl

WORKDIR /app
COPY example/ ./
RUN ls ./

6 changes: 0 additions & 6 deletions Makefile

This file was deleted.

26 changes: 11 additions & 15 deletions README.md
@@ -22,7 +22,7 @@ You can see an example implementation that [queries Google Sheets using Athena](

- Ensure you've got the `build` module and the SDK dependencies installed.

```
```shell
pip install build
pip install -r requirements.txt
```
@@ -33,12 +33,12 @@
python -m build
```

This will create a file in `dist/`: `dist/unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl`
This will create a file in `dist/`: `dist/athena_federation-0.1.0-py3-none-any.whl`

Copy that file to your example repo and you can include it in your `requirements.txt` like so:

```
unoffical-athena-federation-sdk @ file:///unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl
```shell
unoffical-athena-federation-sdk @ file:///athena_federation-0.1.0-py3-none-any.whl
```

## Validating your connector
@@ -76,7 +76,7 @@ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d

💁 _Please note these are manual instructions until a [serverless application](https://aws.amazon.com/serverless/serverlessrepo/) can be built._

0. First, let's define some variables we need throughout.
1. First, let's define some variables we need throughout.

```shell
export SPILL_BUCKET=<BUCKET_NAME>
@@ -91,21 +91,21 @@ export IMAGE_TAG=v0.0.1
aws s3 mb s3://${SPILL_BUCKET}
```

2. Create an ECR repository for this image
1. Create an ECR repository for this image

```shell
aws ecr create-repository --repository-name athena_example --image-scanning-configuration scanOnPush=true
```

3. Tag the image with the repo name and push it up
1. Tag the image with the repo name and push it up

```shell
docker tag local/athena-python-example ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
aws ecr get-login-password | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
```

4. Create an IAM role that will allow your Lambda function to execute
1. Create an IAM role that will allow your Lambda function to execute

_Note the `Arn` of the role that's returned_

@@ -118,7 +118,7 @@ aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```

5. Grant the IAM role access to your S3 bucket
1. Grant the IAM role access to your S3 bucket

```shell
aws iam create-policy --policy-name athena-example-s3-access --policy-document '{
@@ -145,7 +145,7 @@ aws iam attach-role-policy \
--policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/athena-example-s3-access
```

6. Now create your function pointing to the created repository image
1. Now create your function pointing to the created repository image

```shell
aws lambda create-function \
@@ -162,11 +162,7 @@ aws lambda create-function \

1. Choose "Data sources" on the top navigation bar in the Athena console and then click "Connect data source"

![](docs/athena_connect.png)

2. Choose the Lambda function you just created and click `Connect`!

![](docs/athena_connect_lambda.png)
1. Choose the Lambda function you just created and click `Connect`!

## Updating the Lambda function

3 changes: 2 additions & 1 deletion athena_federation/lambda_handler.py
@@ -1,3 +1,4 @@
from typing import Union
import athena_federation.models as models
from athena_federation.athena_data_source import AthenaDataSource
from athena_federation.batch_writer import BatchWriter
@@ -91,7 +92,7 @@ def GetSplitsRequest(self) -> models.GetSplitsResponse:

## END: Unimplemented placeholders

def ReadRecordsRequest(self) -> models.ReadRecordsResponse:
def ReadRecordsRequest(self) -> Union[models.ReadRecordsResponse, models.RemoteReadRecordsResponse]:
schema = AthenaSDKUtils.parse_encoded_schema(self.event["schema"]["schema"])
database_name = self.event.get("tableName").get("schemaName")
table_name = self.event.get("tableName").get("tableName")
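To see why the broadened return annotation is a `Union`, here is a minimal, hypothetical sketch (the stand-in classes and threshold are invented for illustration, not the SDK's actual models): an Athena connector either returns records inline or, when the result is too large, spills them to storage and returns a pointer-style response instead.

```python
# Hypothetical illustration of the Union return type (stand-in classes and
# threshold, not the SDK's real models): small result sets come back inline,
# large ones are "spilled" and only a pointer is returned.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class InlineRecords:  # stand-in for models.ReadRecordsResponse
    records: List[int]


@dataclass
class SpilledRecords:  # stand-in for models.RemoteReadRecordsResponse
    spill_location: str


SPILL_THRESHOLD = 5  # invented threshold, purely illustrative


def read_records(rows: List[int], bucket: str) -> Union[InlineRecords, SpilledRecords]:
    """Return records inline when small, otherwise point Athena at a spill file."""
    if len(rows) <= SPILL_THRESHOLD:
        return InlineRecords(records=rows)
    # A real connector would write the rows to S3 before returning the pointer.
    return SpilledRecords(spill_location=f"s3://{bucket}/spill/part-0")


print(read_records([1, 2, 3], "my-bucket"))
print(read_records(list(range(100)), "my-bucket"))
```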
2 changes: 2 additions & 0 deletions athena_federation/models.py
@@ -104,6 +104,8 @@ def encoded_partition_config(self):
"""
Encodes the schema and each record in the partition config.
"""
if not self.partitions:
return {}
partition_keys = self.partitions.keys()
data = [pa.array(self.partitions[key]) for key in partition_keys]
batch = pa.RecordBatch.from_arrays(data, list(partition_keys))
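A simplified, hypothetical sketch of the guard added above (the function name and return shape are illustrative; the real method also base64-encodes the schema and records): an unpartitioned table leaves `partitions` empty, so the method short-circuits before building a PyArrow batch.

```python
# Hypothetical sketch of the empty-partition guard (illustrative names and
# return shape; the SDK's method also base64-encodes the schema and records).
import pyarrow as pa


def encoded_partition_config(partitions: dict) -> dict:
    if not partitions:  # unpartitioned table: nothing to encode
        return {}
    keys = list(partitions.keys())
    arrays = [pa.array(partitions[key]) for key in keys]
    batch = pa.RecordBatch.from_arrays(arrays, keys)
    return {"columns": keys, "num_rows": batch.num_rows}


print(encoded_partition_config({}))                      # -> {}
print(encoded_partition_config({"year": [2023, 2024]}))  # -> 2 rows, ['year']
```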
8 changes: 7 additions & 1 deletion athena_federation/utils.py
@@ -5,6 +5,8 @@


class AthenaSDKUtils:

@staticmethod
def encode_pyarrow_object(pya_obj):
"""
Encodes either a PyArrow Schema or set of Records to Base64.
@@ -13,24 +15,28 @@ def encode_pyarrow_object(pya_obj):
"""
return base64.b64encode(pya_obj.serialize().slice(4)).decode("utf-8")

@staticmethod
def parse_encoded_schema(b64_schema):
return pa.ipc.open_stream(pa.BufferReader(base64.b64decode(b64_schema))).schema

@staticmethod
def encode_pyarrow_records(pya_schema, record_hash):
# This is basically the same as pa.record_batch(data, names=['c0', 'c1', 'c2'])
return pa.RecordBatch.from_arrays(
[pa.array(record_hash[name]) for name in pya_schema.names],
schema=pya_schema,
)

@staticmethod
def decode_pyarrow_records(b64_schema, b64_records):
"""
Decodes an encoded record set provided a similarly encoded schema.
Returns just the records as the schema will be included with that
"""
pa_schema = AthenaSDKUtils.parse_encoded_schema(b64_schema)
return pa.read_record_batch(base64.b64decode(b64_records), pa_schema)
return pa.ipc.read_record_batch(base64.b64decode(b64_records), pa_schema)

@staticmethod
def generate_spill_metadata(bucket_name: str, bucket_path: str) -> dict:
"""
Returns a unique spill location on S3 for a given bucket and path.
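Since the fix above swaps `pa.read_record_batch` for `pa.ipc.read_record_batch`, a self-contained PyArrow round trip may help as a reference; this sketch uses the IPC stream writer/reader rather than the SDK's base64 message helpers, so treat it as illustrative and not wire-compatible with Athena.

```python
# Illustrative PyArrow IPC round trip (not the SDK's exact wire format):
# write a RecordBatch to an in-memory IPC stream, then read it back.
import pyarrow as pa

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
    names=["id", "name"],
)

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()  # pa.Buffer containing the schema and batch messages

reader = pa.ipc.open_stream(pa.BufferReader(buf))
print(reader.schema)      # the schema that was written
print(reader.read_all())  # the records, back as a Table
```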
