megalinter fixes
drernie committed Aug 12, 2024
1 parent 0575843 commit 13507cd
Showing 8 changed files with 29 additions and 28 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/mega-linter.yml
@@ -7,7 +7,7 @@ on: # yamllint disable-line rule:truthy
permissions: read-all
env: # Comment env block if you do not want to apply fixes
APPLY_FIXES: all # When active, APPLY_FIXES must also be defined as environment variable (in github/workflows/mega-linter.yml or other CI tool)
DISABLE_LINTERS: SPELL_CSPELL,COPYPASTE_JSCPD,PYTHON_BANDIT,PYTHON_PYRIGHT,PYTHON_PYLINT,REPOSITORY_GRYPE,REPOSITORY_SECRETLINT,REPOSITORY_TRIVY,REPOSITORY_TRUFFLEHOG
DISABLE_LINTERS: SPELL_CSPELL,COPYPASTE_JSCPD,PYTHON_BANDIT,PYTHON_PYRIGHT,PYTHON_PYLINT,REPOSITORY_GRYPE,REPOSITORY_SECRETLINT,REPOSITORY_TRIVY,REPOSITORY_TRUFFLEHOG,REPOSITORY_CHECKOV
MARKDOWN_MARKDOWNLINT_FILTER_REGEX_EXCLUDE: "tests/example.*ME\\.md" # Exclude example markdown files from markdownlint
concurrency:
group: ${{ github.ref }}-${{ github.workflow }}
3 changes: 2 additions & 1 deletion .vscode/settings.json
@@ -3,5 +3,6 @@
"python.pythonPath": ".venv/bin/python",
"python.analysis.extraPaths": [
"src.*"
]
],
"python.analysis.typeCheckingMode": "basic"
}
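
For context, a hedged sketch of what the new `"python.analysis.typeCheckingMode": "basic"` setting buys (assuming Pylance/Pyright as the analysis engine, which the `python.analysis.*` keys imply): functions whose bodies can return a value incompatible with their annotation get flagged, the same class of mismatch the `Union[...]` annotation added to `lambda_handler.py` below resolves.

```python
# Illustrative only: with "python.analysis.typeCheckingMode": "basic",
# Pylance/Pyright flags the string return below as incompatible with the
# declared List[int] return type -- the same kind of mismatch the
# Union[...] annotation in lambda_handler.py resolves.
from typing import List


def read(rows: List[int], spill: bool) -> List[int]:
    if spill:
        return "s3://bucket/spill/part-0"  # reported as a type error
    return rows
```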
7 changes: 4 additions & 3 deletions Dockerfile
@@ -4,7 +4,7 @@ FROM python:3.8-slim AS build
WORKDIR /app

# Get ready to build
RUN pip install poetry
RUN pip install --no-cache-dir poetry==1.8.3

# Now copy the app over and build a wheel
COPY src /app/src/
@@ -16,9 +16,10 @@ FROM amazon/aws-lambda-python:3.12.0 AS lambda

ENV TARGET_BUCKET=replace_me

COPY --from=build /app/dist/unoffical_athena_federation_sdk-*-py3-none-any.whl /
RUN pip install /unoffical_athena_federation_sdk-*-py3-none-any.whl
COPY --from=build /app/dist/athena_federation-*-py3-none-any.whl /
RUN pip install --no-cache-dir /athena_federation-*-py3-none-any.whl

WORKDIR /app
COPY example/ ./
RUN ls ./

6 changes: 0 additions & 6 deletions Makefile

This file was deleted.

26 changes: 11 additions & 15 deletions README.md
@@ -22,7 +22,7 @@ You can see an example implementation that [queries Google Sheets using Athena](

- Ensure you've got the `build` module and the SDK dependencies installed.

```
```shell
pip install build
pip install -r requirements.txt
```
@@ -33,12 +33,12 @@
python -m build
```

This will create a file in `dist/`: `dist/unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl`
This will create a file in `dist/`: `dist/athena_federation-0.1.0-py3-none-any.whl`

Copy that file to your example repo and you can include it in your `requirements.txt` like so:

```
unoffical-athena-federation-sdk @ file:///unoffical_athena_federation_sdk-0.0.0-py3-none-any.whl
```shell
unoffical-athena-federation-sdk @ file:///athena_federation-0.1.0-py3-none-any.whl
```

## Validating your connector
@@ -76,7 +76,7 @@ curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d

💁 _Please note these are manual instructions until a [serverless application](https://aws.amazon.com/serverless/serverlessrepo/) can be built._

0. First, let's define some variables we need throughout.
1. First, let's define some variables we need throughout.

```shell
export SPILL_BUCKET=<BUCKET_NAME>
@@ -91,21 +91,21 @@ export IMAGE_TAG=v0.0.1
aws s3 mb s3://${SPILL_BUCKET}
```

2. Create an ECR repository for this image
1. Create an ECR repository for this image

```shell
aws ecr create-repository --repository-name athena_example --image-scanning-configuration scanOnPush=true
```

3. Tag the image with the repo name and push it up
1. Tag the image with the repo name and push it up

```shell
docker tag local/athena-python-example ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
aws ecr get-login-password | docker login --username AWS --password-stdin ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com
docker push ${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/athena_example:${IMAGE_TAG}
```

4. Create an IAM role that will allow your Lambda function to execute
1. Create an IAM role that will allow your Lambda function to execute

_Note the `Arn` of the role that's returned_

@@ -118,7 +118,7 @@ aws iam attach-role-policy \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```

5. Grant the IAM role access to your S3 bucket
1. Grant the IAM role access to your S3 bucket

```shell
aws iam create-policy --policy-name athena-example-s3-access --policy-document '{
@@ -145,7 +145,7 @@ aws iam attach-role-policy \
--policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/athena-example-s3-access
```

6. Now create your function pointing to the created repository image
1. Now create your function pointing to the created repository image

```shell
aws lambda create-function \
@@ -162,11 +162,7 @@ aws lambda create-function \

1. Choose "Data sources" on the top navigation bar in the Athena console and then click "Connect data source"

![](docs/athena_connect.png)

2. Choose the Lambda function you just created and click `Connect`!

![](docs/athena_connect_lambda.png)
1. Choose the Lambda function you just created and click `Connect`!

## Updating the Lambda function

3 changes: 2 additions & 1 deletion athena_federation/lambda_handler.py
@@ -1,3 +1,4 @@
from typing import Union
import athena_federation.models as models
from athena_federation.athena_data_source import AthenaDataSource
from athena_federation.batch_writer import BatchWriter
@@ -91,7 +92,7 @@ def GetSplitsRequest(self) -> models.GetSplitsResponse:

## END: Unimplemented placeholders

def ReadRecordsRequest(self) -> models.ReadRecordsResponse:
def ReadRecordsRequest(self) -> Union[models.ReadRecordsResponse, models.RemoteReadRecordsResponse]:
schema = AthenaSDKUtils.parse_encoded_schema(self.event["schema"]["schema"])
database_name = self.event.get("tableName").get("schemaName")
table_name = self.event.get("tableName").get("tableName")
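To see why the broadened return annotation is a `Union`, here is a minimal, hypothetical sketch (the stand-in classes and threshold are invented for illustration, not the SDK's actual models): an Athena connector either returns records inline or, when the result is too large, spills them to storage and returns a pointer-style response instead.

```python
# Hypothetical illustration of the Union return type (stand-in classes and
# threshold, not the SDK's real models): small result sets come back inline,
# large ones are "spilled" and only a pointer is returned.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class InlineRecords:  # stand-in for models.ReadRecordsResponse
    records: List[int]


@dataclass
class SpilledRecords:  # stand-in for models.RemoteReadRecordsResponse
    spill_location: str


SPILL_THRESHOLD = 5  # invented threshold, purely illustrative


def read_records(rows: List[int], bucket: str) -> Union[InlineRecords, SpilledRecords]:
    """Return records inline when small, otherwise point Athena at a spill file."""
    if len(rows) <= SPILL_THRESHOLD:
        return InlineRecords(records=rows)
    # A real connector would write the rows to S3 before returning the pointer.
    return SpilledRecords(spill_location=f"s3://{bucket}/spill/part-0")


print(read_records([1, 2, 3], "my-bucket"))
print(read_records(list(range(100)), "my-bucket"))
```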
2 changes: 2 additions & 0 deletions athena_federation/models.py
@@ -104,6 +104,8 @@ def encoded_partition_config(self):
"""
Encodes the schema and each record in the partition config.
"""
if not self.partitions:
return {}
partition_keys = self.partitions.keys()
data = [pa.array(self.partitions[key]) for key in partition_keys]
batch = pa.RecordBatch.from_arrays(data, list(partition_keys))
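A simplified, hypothetical sketch of the guard added above (the function name and return shape are illustrative; the real method also base64-encodes the schema and records): an unpartitioned table leaves `partitions` empty, so the method short-circuits before building a PyArrow batch.

```python
# Hypothetical sketch of the empty-partition guard (illustrative names and
# return shape; the SDK's method also base64-encodes the schema and records).
import pyarrow as pa


def encoded_partition_config(partitions: dict) -> dict:
    if not partitions:  # unpartitioned table: nothing to encode
        return {}
    keys = list(partitions.keys())
    arrays = [pa.array(partitions[key]) for key in keys]
    batch = pa.RecordBatch.from_arrays(arrays, keys)
    return {"columns": keys, "num_rows": batch.num_rows}


print(encoded_partition_config({}))                      # -> {}
print(encoded_partition_config({"year": [2023, 2024]}))  # -> 2 rows, ['year']
```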
8 changes: 7 additions & 1 deletion athena_federation/utils.py
@@ -5,6 +5,8 @@


class AthenaSDKUtils:

@staticmethod
def encode_pyarrow_object(pya_obj):
"""
Encodes either a PyArrow Schema or set of Records to Base64.
@@ -13,24 +15,28 @@ def encode_pyarrow_object(pya_obj):
"""
return base64.b64encode(pya_obj.serialize().slice(4)).decode("utf-8")

@staticmethod
def parse_encoded_schema(b64_schema):
return pa.ipc.open_stream(pa.BufferReader(base64.b64decode(b64_schema))).schema

@staticmethod
def encode_pyarrow_records(pya_schema, record_hash):
# This is basically the same as pa.record_batch(data, names=['c0', 'c1', 'c2'])
return pa.RecordBatch.from_arrays(
[pa.array(record_hash[name]) for name in pya_schema.names],
schema=pya_schema,
)

@staticmethod
def decode_pyarrow_records(b64_schema, b64_records):
"""
Decodes an encoded record set provided a similarly encoded schema.
Returns just the records as the schema will be included with that
"""
pa_schema = AthenaSDKUtils.parse_encoded_schema(b64_schema)
return pa.read_record_batch(base64.b64decode(b64_records), pa_schema)
return pa.ipc.read_record_batch(base64.b64decode(b64_records), pa_schema)

@staticmethod
def generate_spill_metadata(bucket_name: str, bucket_path: str) -> dict:
"""
Returns a unique spill location on S3 for a given bucket and path.
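Since the fix above swaps `pa.read_record_batch` for `pa.ipc.read_record_batch`, a self-contained PyArrow round trip may help as a reference; this sketch uses the IPC stream writer/reader rather than the SDK's base64 message helpers, so treat it as illustrative and not wire-compatible with Athena.

```python
# Illustrative PyArrow IPC round trip (not the SDK's exact wire format):
# write a RecordBatch to an in-memory IPC stream, then read it back.
import pyarrow as pa

batch = pa.RecordBatch.from_arrays(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
    names=["id", "name"],
)

sink = pa.BufferOutputStream()
with pa.ipc.new_stream(sink, batch.schema) as writer:
    writer.write_batch(batch)
buf = sink.getvalue()  # pa.Buffer containing the schema and batch messages

reader = pa.ipc.open_stream(pa.BufferReader(buf))
print(reader.schema)      # the schema that was written
print(reader.read_all())  # the records, back as a Table
```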
