Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] [New Query Planner] Add support for fsspec filesystems to new query planner. #1357

Merged
merged 4 commits into from
Sep 12, 2023

Conversation

clarkzinzow
Copy link
Contributor

@clarkzinzow clarkzinzow commented Sep 8, 2023

This PR adds support for fsspec filesystems to new query planner, removing all test skipping based upon the use_new_query_planner feature flag, thereby completing functional parity with the old query planner.

Specifically, this PR lifts the use_native_downloader and IOConfig configuration of the ExternalSource::FileFormatConfig::ParquetSourceConfig up to be a format-agnostic ExternalSource::StorageConfig, which has two variants: Native(IOConfig) and Python(Py<fsspec.AbstractFileSystem>). Most of the changes are to the Python execution and I/O layer, which now orients around the StorageConfig object instead of separate fs, use_native_downloader, and io_config args.

The one place where an fs arg continues to sneak through is for file globbing, since support for native globbing has not yet been merged. Once that's merged, both file reading and globbing will only use the storage_config, and Python-level fsspec filesystem args will go away.

@github-actions github-actions bot added the enhancement New feature or request label Sep 8, 2023
@clarkzinzow clarkzinzow force-pushed the clark/fsspec-support-new-query-planner branch 2 times, most recently from 573263a to 8980f15 Compare September 8, 2023 20:14
@codecov
Copy link

codecov bot commented Sep 8, 2023

Codecov Report

Merging #1357 (2307a07) into main (6c4d64e) will increase coverage by 0.08%.
The diff coverage is 94.79%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1357      +/-   ##
==========================================
+ Coverage   87.17%   87.25%   +0.08%     
==========================================
  Files          61       61              
  Lines        6010     6041      +31     
==========================================
+ Hits         5239     5271      +32     
+ Misses        771      770       -1     
Files Changed Coverage Δ
daft/execution/physical_plan_factory.py 91.52% <ø> (ø)
daft/execution/rust_physical_plan_shim.py 98.21% <ø> (ø)
daft/logical/builder.py 78.26% <ø> (-0.32%) ⬇️
daft/logical/optimizer.py 97.92% <ø> (ø)
daft/runners/runner_io.py 86.66% <66.66%> (-3.96%) ⬇️
daft/io/common.py 90.90% <80.00%> (-4.10%) ⬇️
daft/runners/ray_runner.py 91.18% <88.88%> (-0.19%) ⬇️
daft/runners/pyrunner.py 95.80% <90.90%> (-0.47%) ⬇️
daft/table/schema_inference.py 96.00% <95.00%> (+4.10%) ⬆️
daft/execution/execution_step.py 93.84% <100.00%> (+0.22%) ⬆️
... and 8 more

@clarkzinzow
Copy link
Contributor Author

Merging this PR without a review in order to unblock downstream work, @xcharleslin @samster25 if y'all end up reviewing this and have feedback, leave a post-merge review and I'll add a follow-up PR implementing the feedback!

@clarkzinzow clarkzinzow force-pushed the clark/fsspec-support-new-query-planner branch from 72f820a to 34fd0e3 Compare September 12, 2023 16:13
@clarkzinzow clarkzinzow force-pushed the clark/fsspec-support-new-query-planner branch from 34fd0e3 to 2307a07 Compare September 12, 2023 16:30
@clarkzinzow clarkzinzow merged commit db53efb into main Sep 12, 2023
27 checks passed
@clarkzinzow clarkzinzow deleted the clark/fsspec-support-new-query-planner branch September 12, 2023 16:45
clarkzinzow added a commit that referenced this pull request Sep 12, 2023
… UX. (#1358)

This PR makes misc. tweaks to user-facing error messages in order to
improve UX. E.g., adding column name context to dtype validation errors,
in order to make the messages more actionable.

This PR is stacked on top of
#1357, see the final commit for
the actual diff.

The only non-trivial is moving `DataType::get_exploded_dtype()` to
`Field::to_exploded_field()`, in order to have the column name context
for failure cases (e.g. the dtype doesn't support exploding).

As a driveby, this PR also fixes some suboptimal reprs for the new
logical plan, particular around `Vec<Expr>`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant