Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix S3 in CLI: Do not normalize options values #13576

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

blaginin
Copy link
Contributor

@blaginin blaginin commented Nov 26, 2024

Which issue does this PR close?

Related to #13456

Rationale for this change

Surprisingly, even the simplest S3 example doesn't work 😱

> CREATE EXTERNAL TABLE test
STORED AS CSV
OPTIONS(
    'aws.access_key_id' '************',
    'aws.secret_access_key' '**********',
    'aws.region' 'eu-west-1'
)
LOCATION 's3://blaginin/playground/titanic.csv';


Object Store error: The operation lacked the necessary privileges to complete for path playground/titanic.csv: Client error with status 403 Forbidden: No Body

That's because s3 keys are case-sensitive, but now we always lowercase them.

I see you had the same issue for HuggingFace so I really think the default behaviour should be NOT normalizing.

What changes are included in this PR?

Change the default value

Are these changes tested?

I updated the tests

Are there any user-facing changes?

Yes, changed the default value

@github-actions github-actions bot added sql SQL Planner common Related to common crate labels Nov 26, 2024
@blaginin blaginin changed the title Fix S3 in CLI: Do not normalize option values Fix S3 in CLI: Do not normalize options values Nov 26, 2024
@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) labels Nov 26, 2024
@blaginin blaginin marked this pull request as ready for review November 28, 2024 12:53
@blaginin blaginin marked this pull request as draft November 28, 2024 12:57
@blaginin
Copy link
Contributor Author

blaginin commented Dec 1, 2024

Actually, I think it should work differently - you should specify whether a value is case-sensitive per item. For example, an AWS password would be case-sensitive, but has_header would be case-insensitive. Otherwise this doesn't work:

CREATE EXTERNAL TABLE test
STORED AS CSV
LOCATION 'data.csv'
OPTIONS ('has_header' 'TRUE');

Error parsing TRUE as bool
caused by
External error: provided string was not `true` or `false`

Will change

Copy link
Member

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels the right direction.
Can you add tests showing why this matters?
Perhaps some tests with parsing non-lowercase s3 paths?

@findepi
Copy link
Member

findepi commented Dec 2, 2024

you should specify whether a value is case-sensitive per item.

i agree

but it doesn't have to be part of the SQL parser at all.
It can simply be applied when pulling values from the option map (the moment when strings are interpreted into something meaningful)

alternatively, we could have some sort of option registry where options could have a type declared, so that parsing is consistent, but that would be a much bigger change

@github-actions github-actions bot added the core Core DataFusion crate label Dec 3, 2024
@@ -362,3 +362,15 @@ OPTIONS ('format.terminator' E'\r', 'format.has_header' 'true');

statement ok
drop table stored_table_with_cr_terminator;


# check that datetime_format is case sensitive
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't work, reported a separate issue #13633

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate core Core DataFusion crate documentation Improvements or additions to documentation sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants