#316: Specify GCS credentials via CONNECTION (#317)
Co-authored-by: Christoph Kuhnke <[email protected]>
kaklakariada and ckunki authored May 17, 2024
1 parent 45c099a commit d178c90
Showing 20 changed files with 1,222 additions and 867 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/broken_links_checker.yml


7 changes: 3 additions & 4 deletions .github/workflows/ci-build.yml


4 changes: 2 additions & 2 deletions .project-keeper.yml
```yaml
version:
  fromSource: pom.xml
linkReplacements:
excludes:
  # Project is written in Scala, no need to test with next Java version
  - "E-PK-CORE-18: Outdated content: '.github/workflows/ci-build-next-java.yml'"
  - "E-PK-CORE-17: Missing required file: 'release_config.yml'"
build:
  runnerOs: ubuntu-20.04
  freeDiskSpace: true
  exasolDbVersions:
    - "8.25.0"
    - "7.1.26"
workflows:
  - name: ci-build.yml
```
34 changes: 18 additions & 16 deletions .vscode/settings.json
```json
{
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
        "source.organizeImports": "explicit",
        "source.generate.finalModifiers": "explicit",
        "source.fixAll": "explicit"
    },
    "java.saveActions.organizeImports": true,
    "java.sources.organizeImports.starThreshold": 3,
    "java.sources.organizeImports.staticStarThreshold": 3,
    "java.test.config": {
        "vmArgs": [
            "-Djava.util.logging.config.file=src/test/resources/logging.properties",
            "-Dcom.exasol.dockerdb.image=8.26.0"
        ]
    },
    "files.watcherExclude": {
        "**/target": true
    }
}
```
341 changes: 173 additions & 168 deletions dependencies.md


1 change: 1 addition & 0 deletions doc/changes/changelog.md


74 changes: 74 additions & 0 deletions doc/changes/changes_2.8.0.md
@@ -0,0 +1,74 @@
# Cloud Storage Extension 2.8.0, released 2024-05-17

Code name: Simplify GCS Configuration

## Summary

This release allows configuring Google Cloud Storage (GCS) via a `CONNECTION` instead of uploading the credentials JSON file to BucketFS. This avoids exposing GCP credentials as a file in BucketFS and simplifies configuration. See the [user guide](../user_guide/user_guide.md#configure-gcp-credentials) for details.
Please note that for backwards compatibility you can still provide the GCS credentials as a file, although we recommend configuring GCS via a `CONNECTION`.

## Features

* #316: Allowed specifying GCS credentials via `CONNECTION`

## Dependency Updates

### Cloud Storage Extension

#### Compile Dependency Updates

* Added `com.github.mwiede:jsch:0.2.17`
* Updated `com.google.guava:guava:32.1.3-jre` to `33.2.0-jre`
* Updated `com.google.oauth-client:google-oauth-client:1.34.1` to `1.36.0`
* Updated `com.nimbusds:nimbus-jose-jwt:9.37.3` to `9.39.1`
* Updated `io.dropwizard.metrics:metrics-core:4.2.23` to `4.2.25`
* Updated `io.grpc:grpc-netty:1.60.0` to `1.63.0`
* Updated `io.netty:netty-codec-http2:4.1.108.Final` to `4.1.109.Final`
* Updated `org.apache.commons:commons-compress:1.26.0` to `1.26.1`
* Updated `org.apache.logging.log4j:log4j-1.2-api:2.22.0` to `2.23.1`
* Updated `org.apache.logging.log4j:log4j-api:2.22.0` to `2.23.1`
* Updated `org.apache.logging.log4j:log4j-core:2.22.0` to `2.23.1`
* Updated `org.jetbrains.kotlin:kotlin-stdlib:1.9.21` to `1.9.24`
* Updated `org.slf4j:jul-to-slf4j:2.0.9` to `2.0.13`

#### Runtime Dependency Updates

* Updated `ch.qos.logback:logback-classic:1.2.13` to `1.5.6`
* Updated `ch.qos.logback:logback-core:1.2.13` to `1.5.6`

#### Test Dependency Updates

* Updated `com.dimafeng:testcontainers-scala-scalatest_2.13:0.41.0` to `0.41.3`
* Updated `com.exasol:exasol-testcontainers:7.0.1` to `7.1.0`
* Updated `com.exasol:extension-manager-integration-test-java:0.5.7` to `0.5.11`
* Updated `nl.jqno.equalsverifier:equalsverifier:3.15.4` to `3.16.1`
* Updated `org.junit.jupiter:junit-jupiter-engine:5.10.1` to `5.10.2`
* Updated `org.mockito:mockito-core:5.8.0` to `5.12.0`
* Updated `org.testcontainers:localstack:1.19.3` to `1.19.8`

#### Plugin Dependency Updates

* Updated `com.diffplug.spotless:spotless-maven-plugin:2.41.0` to `2.43.0`
* Updated `com.exasol:error-code-crawler-maven-plugin:2.0.2` to `2.0.3`
* Updated `com.exasol:project-keeper-maven-plugin:4.3.0` to `4.3.1`
* Updated `net.alchim31.maven:scala-maven-plugin:4.8.1` to `4.9.1`
* Updated `org.apache.maven.plugins:maven-jar-plugin:3.3.0` to `3.4.1`
* Updated `org.apache.maven.plugins:maven-toolchains-plugin:3.1.0` to `3.2.0`
* Updated `org.codehaus.mojo:exec-maven-plugin:3.1.1` to `3.2.0`

### Extension

#### Compile Dependency Updates

* Updated `@exasol/extension-manager-interface:0.4.1` to `0.4.2`

#### Development Dependency Updates

* Updated `eslint:^8.55.0` to `^8.56.0`
* Updated `@types/node:^20.10.4` to `^20.12.12`
* Updated `@typescript-eslint/parser:^6.13.2` to `^7.9.0`
* Updated `ts-jest:^29.1.1` to `^29.1.2`
* Updated `typescript:^5.3.3` to `^5.4.5`
* Updated `@typescript-eslint/eslint-plugin:^6.13.2` to `^7.9.0`
* Updated `ts-node:^10.9.1` to `^10.9.2`
* Updated `esbuild:^0.19.8` to `^0.21.2`
6 changes: 6 additions & 0 deletions doc/developers_guide/developers_guide.md

This guide contains information for developers.

## Running Scala Linter

```sh
mvn compile test-compile scalastyle:check scalafix:scalafix spotless:check
```

## Working With the Managed Extension

This describes how to develop the extension for the [Extension Manager](https://github.com/exasol/extension-manager/).
87 changes: 47 additions & 40 deletions doc/user_guide/user_guide.md
Make sure that the checksum of the downloaded jar file is the same as the checksum provided in the releases.
To check the SHA256 result of the local jar, run the command:

```sh
sha256sum exasol-cloud-storage-extension-2.8.0.jar
```
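The manual comparison can also be scripted. Below is a minimal sketch under stated assumptions: the `verify_checksum` helper is illustrative, and the expected hash is a placeholder you copy from the GitHub release page.

```shell
# Sketch: compare the locally computed SHA256 of the jar with the checksum
# published on the release page. Returns non-zero on mismatch.
verify_checksum() {
  local file="$1" expected="$2"
  local actual
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" = "$expected" ]; then
    echo "Checksum OK: $file"
  else
    echo "Checksum MISMATCH: $file" >&2
    return 1
  fi
}

# Usage (placeholder hash -- copy the real one from the release page):
# verify_checksum exasol-cloud-storage-extension-2.8.0.jar "<published sha256>"
```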

### Building From Source
```sh
mvn clean package -DskipTests=true
```

The assembled jar file should be located at
`target/exasol-cloud-storage-extension-2.8.0.jar`.

### Create an Exasol Bucket

Upload the jar file using curl command:

```sh
curl -X PUT -T exasol-cloud-storage-extension-2.8.0.jar \
http://w:<WRITE_PASSWORD>@exasol.datanode.domain.com:2580/<BUCKET>/
```
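To confirm the upload succeeded, you can fetch the bucket listing over HTTP and check for the jar. A sketch, with host, port, bucket name, and read password as placeholders; the `bucket_contains` helper is illustrative:

```shell
# Sketch: check a bucket listing (one file name per line on stdin) for a
# given file. Pipe the output of a bucket listing request into it.
bucket_contains() {
  grep -qx "$1"   # -x: match the whole line exactly
}

# Usage (placeholder host/bucket/password; 'r' is the BucketFS read user):
# curl -s "http://r:<READ_PASSWORD>@exasol.datanode.domain.com:2580/<BUCKET>/" \
#   | bucket_contains exasol-cloud-storage-extension-2.8.0.jar \
#   && echo "upload OK"
```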

```sql
OPEN SCHEMA CLOUD_STORAGE_EXTENSION;

CREATE OR REPLACE JAVA SET SCRIPT IMPORT_PATH(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.FilesImportQueryGenerator;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/

CREATE OR REPLACE JAVA SCALAR SCRIPT IMPORT_METADATA(...) EMITS (
end_index DECIMAL(36, 0)
) AS
%scriptclass com.exasol.cloudetl.scriptclasses.FilesMetadataReader;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/

CREATE OR REPLACE JAVA SET SCRIPT IMPORT_FILES(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.FilesDataImporter;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/
```

```sql
OPEN SCHEMA CLOUD_STORAGE_EXTENSION;

CREATE OR REPLACE JAVA SET SCRIPT EXPORT_PATH(...) EMITS (...) AS
%scriptclass com.exasol.cloudetl.scriptclasses.TableExportQueryGenerator;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/

CREATE OR REPLACE JAVA SET SCRIPT EXPORT_TABLE(...) EMITS (ROWS_AFFECTED INT) AS
%scriptclass com.exasol.cloudetl.scriptclasses.TableDataExporter;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/
```

```sql
CREATE OR REPLACE JAVA SCALAR SCRIPT IMPORT_METADATA(...) EMITS (
) AS
%jvmoption -DHTTPS_PROXY=http://username:password@10.10.1.10:1180
%scriptclass com.exasol.cloudetl.scriptclasses.FilesMetadataReader;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/

CREATE OR REPLACE JAVA SET SCRIPT IMPORT_FILES(...) EMITS (...) AS
%jvmoption -DHTTPS_PROXY=http://username:password@10.10.1.10:1180
%scriptclass com.exasol.cloudetl.scriptclasses.FilesDataImporter;
%jar /buckets/bfsdefault/<BUCKET>/exasol-cloud-storage-extension-2.8.0.jar;
/
```

## Google Cloud Storage

Similar to Amazon S3, you need to have security credentials to access the Google
Cloud Storage (GCS).

### Service Accounts

A Google Cloud Platform (GCP) service account is an identity that an application can
use to authenticate and perform authorized tasks on Google cloud resources. It
is a special type of Google account intended to represent a non-human user that
needs to access Google APIs. Please check out the GCP [introduction to service
generating [service account private key][gcp-auth-keys] documentation pages.
Once the service account is generated, give enough permissions to it to access
the Google Cloud Storage objects and download its private key as a JSON file.

### Configure GCP Credentials

**Note:** Starting with version 2.8.0, cloud-storage-extension allows configuring GCP credentials via a `CONNECTION`. Previous versions expected the GCP service account private key as a file in BucketFS and the property `GCS_KEYFILE_PATH`. While this is still possible, we recommend using a `CONNECTION` because it does not expose GCP credentials in BucketFS and is easier to configure.

Create a named connection object containing the GCP service account private key as JSON:

```sql
CREATE OR REPLACE CONNECTION GCS_CONNECTION
TO ''
USER ''
IDENTIFIED BY 'GCS_KEYFILE_CONTENT={
"type": "service_account",
"project_id": "<PROJECT_ID>",
"private_key_id": "<PRIVATE_KEY_ID>",
"private_key": "-----BEGIN PRIVATE KEY-----\n<PRIVATE_KEY>\n-----END PRIVATE KEY-----\n",
"client_email": "<CLIENT_EMAIL>",
"client_id": "<CLIENT_ID>",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://oauth2.googleapis.com/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/<CERTIFICATE>",
"universe_domain": "googleapis.com"
}';
```
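Pasting a multi-line key file into the `IDENTIFIED BY` clause by hand is error-prone. Below is a sketch that generates the statement from a downloaded key file; the `make_gcs_connection_sql` helper and the file name are illustrative, and it assumes the key file contains no single quotes (which would otherwise need doubling for SQL).

```shell
# Sketch: emit a CREATE CONNECTION statement with the service-account key
# file content embedded after the GCS_KEYFILE_CONTENT= prefix.
make_gcs_connection_sql() {
  local name="$1" keyfile="$2"
  printf "CREATE OR REPLACE CONNECTION %s\nTO ''\nUSER ''\nIDENTIFIED BY 'GCS_KEYFILE_CONTENT=%s';\n" \
    "$name" "$(cat "$keyfile")"
}

# Usage (placeholder file name):
# make_gcs_connection_sql GCS_CONNECTION gcp-<PROJECT_ID>-service-keyfile.json > create_connection.sql
```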

To run the UDF you also need the **GCS_PROJECT_ID**, the Google Cloud Platform (GCP) project identifier. It is a unique string for your project, composed of the project name and a randomly assigned number. Please check out the GCP [creating and managing projects][gcp-projects] page for more information.

### Run Import Statement

```sql
IMPORT INTO <schema>.<table>
FROM SCRIPT CLOUD_STORAGE_EXTENSION.IMPORT_PATH WITH
BUCKET_PATH = 'gs://<GCS_STORAGE_PATH>/import/avro/data/*'
DATA_FORMAT = 'AVRO'
GCS_PROJECT_ID = '<GCS_PROJECT_ID>'
CONNECTION_NAME = 'GCS_CONNECTION';
```

### Run Export Statement

```sql
EXPORT <schema>.<table>
INTO SCRIPT CLOUD_STORAGE_EXTENSION.EXPORT_PATH WITH
BUCKET_PATH = 'gs://<GCS_STORAGE_PATH>/export/parquet/data/'
DATA_FORMAT = 'PARQUET'
GCS_PROJECT_ID = '<GCS_PROJECT_ID>'
CONNECTION_NAME = 'GCS_CONNECTION';
```

## Azure Blob Storage
2 changes: 1 addition & 1 deletion error_code_config.yml
```yaml
error-tags:
  CSE:
    packages:
      - com.exasol.cloudetl
    highest-index: 33
```
