Adds S3 GUCs #81

Draft: wants to merge 2 commits into `main`.

aykut-bozkurt (Collaborator):

To configure S3 access in a more Postgres-native way, we define three S3-related GUCs:

- `pg_parquet.aws_config_file`: an absolute path to the configuration file used by the S3 client. When set, the GUC overrides the `AWS_CONFIG_FILE` environment variable. Unset by default.
- `pg_parquet.aws_shared_credentials_file`: an absolute path to the shared credentials file used by the S3 client. When set, the GUC overrides the `AWS_SHARED_CREDENTIALS_FILE` environment variable. Unset by default.
- `pg_parquet.aws_profile`: the profile name used by the S3 client. When set, the GUC overrides the `AWS_PROFILE` environment variable. Unset by default.

These GUCs can only be set by a superuser. They can be set per session, so different sessions can use different configurations, and changing the config file, shared credentials file, or profile name does not require restarting the session.
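The override rule in the list above can be sketched as a small resolver. This is a minimal illustration under stated assumptions, not pg_parquet's code: `effective_setting` is a hypothetical helper, GUC values are modeled as plain `Option<&str>` (no pgrx types), and the environment is read through an injected lookup function so that it is only read, never modified.

```rust
/// Hypothetical helper, not pg_parquet's actual code: returns the effective
/// value of one S3 setting. A set GUC wins over the environment variable; if
/// the GUC is unset, the environment variable is used; `None` means neither is
/// set, leaving the S3 client to apply its own default.
fn effective_setting<F>(guc_value: Option<&str>, env_var: &str, lookup_env: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    guc_value.map(String::from).or_else(|| lookup_env(env_var))
}

fn main() {
    // Read-only view of the real process environment; nothing is set here.
    let from_env = |key: &str| std::env::var(key).ok();

    // GUC set: it overrides AWS_PROFILE whatever the environment says.
    let profile = effective_setting(Some("analytics"), "AWS_PROFILE", from_env);
    println!("profile = {profile:?}");

    // GUC unset: fall back to AWS_CONFIG_FILE (None if that is unset too).
    let config_file = effective_setting(None, "AWS_CONFIG_FILE", from_env);
    println!("config file = {config_file:?}");
}
```

Resolving on every call, rather than once at extension load, is one way a session-level change could take effect on the next `COPY` without a restart; whether pg_parquet is structured this way is an assumption of the sketch.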

Closes #70.

We add a `COPY FROM` option, `match_by_name`, which matches Parquet file fields to PostgreSQL table columns by name rather than by their position in the schema. The option defaults to `false`. It is useful when the field order differs between the Parquet file and the table but the field names match.

**!!IMPORTANT!!**: This is a breaking change. Before this PR, fields were always matched by name. That is rather strict and not the common way to match schemas (e.g. `COPY FROM` for CSV in Postgres and `COPY FROM` in DuckDB both match by field position by default). This is why we now match by position by default and provide the `COPY FROM` option `match_by_name`, which can be set to `true` to restore the old behaviour.
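The difference between the two matching modes can be sketched as a mapping from table columns to Parquet field indices. This is a hypothetical illustration, not the extension's implementation: `map_columns` is an invented name, schemas are reduced to lists of names, and it assumes that a column-count mismatch is an error in positional mode and that extra Parquet fields are ignored when matching by name.

```rust
/// Hypothetical sketch, not pg_parquet's implementation: for each table column,
/// returns the index of the Parquet field that feeds it.
fn map_columns(
    table_columns: &[&str],
    parquet_fields: &[&str],
    match_by_name: bool,
) -> Result<Vec<usize>, String> {
    if match_by_name {
        // match_by_name = true: look each table column up by name.
        // Parquet fields with no matching column are simply not referenced.
        table_columns
            .iter()
            .map(|col| {
                parquet_fields
                    .iter()
                    .position(|field| field == col)
                    .ok_or_else(|| format!("column \"{col}\" not found in Parquet file"))
            })
            .collect()
    } else {
        // Default: the i-th table column reads the i-th Parquet field.
        // Assumption: both schemas must have the same number of entries.
        if table_columns.len() != parquet_fields.len() {
            return Err(format!(
                "table has {} columns but Parquet file has {} fields",
                table_columns.len(),
                parquet_fields.len()
            ));
        }
        Ok((0..table_columns.len()).collect())
    }
}

fn main() {
    let table = ["id", "name"];
    // Same names as the table, but in a different order.
    let parquet = ["name", "id"];

    // By position, "id" would read the "name" field.
    println!("{:?}", map_columns(&table, &parquet, false)); // Ok([0, 1])
    // By name, each column finds its own field.
    println!("{:?}", map_columns(&table, &parquet, true)); // Ok([1, 0])
}
```

With the new default, a file whose fields are merely reordered is read by position, which is why `match_by_name` is needed to get the previous name-based behaviour back.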

Closes #39.
```rust
// when the environment variable is not set, the default value
// (~/.aws/credentials) is used by object_store.
if let Some(aws_shared_credentials_file) = AWS_SHARED_CREDENTIALS_FILE.get() {
    std::env::set_var(
```
aykut-bozkurt (Collaborator, Author):

Do not set environment variables here, as doing so could have unexpected effects on other extensions. There should be a way to configure object_store directly instead.

Base automatically changed from `aykut/match_by_position` to `main` on November 28, 2024, 14:37.

aykut-bozkurt marked this pull request as draft on November 28, 2024, 14:37.
aykut-bozkurt (Collaborator, Author):

Not sure we want to support GUCs, though. We would need to keep the GUCs in sync with the S3 environment variables, and that logic would be better placed in a middleware.
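One reading of the middleware idea, combined with the earlier review comment about not calling `std::env::set_var`, is to build a private, per-operation view of the S3 variables and hand it to the client, leaving the process environment untouched. The sketch assumes GUC values arrive as a map from GUC name to value (a hypothetical stand-in for reading the real GUCs); `s3_env_overlay` and `SHADOWED` are invented names, and how the resulting values would be passed to object_store is not shown.

```rust
use std::collections::HashMap;

/// GUC names paired with the environment variables they shadow.
const SHADOWED: [(&str, &str); 3] = [
    ("pg_parquet.aws_config_file", "AWS_CONFIG_FILE"),
    ("pg_parquet.aws_shared_credentials_file", "AWS_SHARED_CREDENTIALS_FILE"),
    ("pg_parquet.aws_profile", "AWS_PROFILE"),
];

/// Hypothetical middleware step: builds a private view of the S3 variables
/// for one operation. Starts from the given environment snapshot and lets any
/// set GUC override its variable. The process environment is never modified.
fn s3_env_overlay(
    gucs: &HashMap<&str, String>,
    process_env: &HashMap<String, String>,
) -> HashMap<&'static str, String> {
    let mut overlay = HashMap::new();
    for (guc_name, env_var) in SHADOWED {
        // A set GUC wins; otherwise keep the environment's value, if any.
        let value = gucs
            .get(guc_name)
            .or_else(|| process_env.get(env_var))
            .cloned();
        if let Some(v) = value {
            overlay.insert(env_var, v);
        }
    }
    overlay
}

fn main() {
    // Read only the three relevant variables from the real environment.
    let process_env: HashMap<String, String> = SHADOWED
        .iter()
        .filter_map(|&(_, var)| std::env::var(var).ok().map(|v| (var.to_string(), v)))
        .collect();

    // Pretend only pg_parquet.aws_profile is set in this session.
    let gucs = HashMap::from([("pg_parquet.aws_profile", "analytics".to_string())]);

    let overlay = s3_env_overlay(&gucs, &process_env);
    println!("{:?}", overlay.get("AWS_PROFILE"));
}
```

Keeping the overlay local to each operation avoids the cross-extension side effects of mutating the shared process environment, at the cost of needing a client configuration path that accepts these values explicitly.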

Successfully merging this pull request may close these issues.

Add settings for shared credentials file and config