-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] [New Query Planner] Add support for fsspec filesystems to new query planner. #1357
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
clarkzinzow
force-pushed
the
clark/fsspec-support-new-query-planner
branch
2 times, most recently
from
September 8, 2023 20:14
573263a
to
8980f15
Compare
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #1357 +/- ##
==========================================
+ Coverage 87.17% 87.25% +0.08%
==========================================
Files 61 61
Lines 6010 6041 +31
==========================================
+ Hits 5239 5271 +32
+ Misses 771 770 -1
|
Merging this PR without a review in order to unblock downstream work, @xcharleslin @samster25 if y'all end up reviewing this and have feedback, leave a post-merge review and I'll add a follow-up PR implementing the feedback! |
clarkzinzow
force-pushed
the
clark/fsspec-support-new-query-planner
branch
from
September 12, 2023 16:13
72f820a
to
34fd0e3
Compare
clarkzinzow
force-pushed
the
clark/fsspec-support-new-query-planner
branch
from
September 12, 2023 16:30
34fd0e3
to
2307a07
Compare
clarkzinzow
added a commit
that referenced
this pull request
Sep 12, 2023
… UX. (#1358) This PR makes misc. tweaks to user-facing error messages in order to improve UX. E.g., adding column name context to dtype validation errors, in order to make the messages more actionable. This PR is stacked on top of #1357, see the final commit for the actual diff. The only non-trivial is moving `DataType::get_exploded_dtype()` to `Field::to_exploded_field()`, in order to have the column name context for failure cases (e.g. the dtype doesn't support exploding). As a driveby, this PR also fixes some suboptimal reprs for the new logical plan, particular around `Vec<Expr>`.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR adds support for
fsspec
filesystems to new query planner, removing all test skipping based upon theuse_new_query_planner
feature flag, thereby completing functional parity with the old query planner.Specifically, this PR lifts the
use_native_downloader
andIOConfig
configuration of theExternalSource::FileFormatConfig::ParquetSourceConfig
up to be a format-agnosticExternalSource::StorageConfig
, which has two variants:Native(IOConfig)
andPython(Py<fsspec.AbstractFileSystem>)
. Most of the changes are to the Python execution and I/O layer, which now orients around theStorageConfig
object instead of separatefs
,use_native_downloader
, andio_config
args.The one place where an
fs
arg continues to sneak through is for file globbing, since support for native globbing has not yet been merged. Once that's merged, both file reading and globbing will only use thestorage_config
, and Python-levelfsspec
filesystem args will go away.