Upgrade Daft to 0.1.17 for improved performance and resource usage #217

Merged
20 commits merged on Sep 12, 2023

Contributor

@jaychia jaychia commented Sep 12, 2023

  • Lowers peak memory utilization by 50%
  • Introduces eager memory cleanup in the jemalloc memory allocator, reducing resting memory utilization
  • Speeds up "all-column" reads by 48%
  • Adds max connection limit of 8 per file
  • Produces an Arrow chunked table directly to work around schema-related issues when interoperating with PyArrow
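
The per-file connection limit in the list above can be pictured as a semaphore around range requests. The following is a minimal Python sketch of that idea, not Daft's actual implementation (which this description does not show). The names fetch_ranges and fetch_range, and the byte-range tuples, are hypothetical and introduced only for illustration.

```python
import asyncio

MAX_CONNECTIONS_PER_FILE = 8  # the per-file limit named in this PR


async def fetch_ranges(fetch_range, byte_ranges, limit=MAX_CONNECTIONS_PER_FILE):
    """Fetch byte ranges of one file with at most `limit` requests in flight.

    `fetch_range` is a hypothetical async callable that takes one byte range
    and returns its contents.
    """
    semaphore = asyncio.Semaphore(limit)

    async def bounded(byte_range):
        # Each request must hold a semaphore slot while it is running.
        async with semaphore:
            return await fetch_range(byte_range)

    # gather returns results in the same order as byte_ranges.
    return await asyncio.gather(*(bounded(r) for r in byte_ranges))
```

Capping concurrency per file rather than globally keeps one large file from opening an unbounded number of connections while still allowing several files to be read at once.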

These changes come with an API change: Daft now provides a read_parquet_into_pyarrow function that returns a chunked PyArrow table directly. This PR also updates deltacat to use the new API.
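
As a rough sketch of how a caller might adopt the new function: the wrapper below is hypothetical. The import path daft.table and the path and columns keyword arguments are assumptions not stated in this description. Only the function name read_parquet_into_pyarrow and its PyArrow table return value come from the text above.

```python
from typing import Callable, List, Optional


def read_parquet_columns(
    path: str,
    columns: Optional[List[str]] = None,
    reader: Optional[Callable] = None,
):
    """Hypothetical wrapper: read a Parquet file into a PyArrow table via Daft."""
    if reader is None:
        # Assumption: Daft >= 0.1.17 exposes the function under daft.table.
        from daft.table import read_parquet_into_pyarrow as reader
    # Assumption: the reader accepts `path` and `columns` keyword arguments.
    return reader(path=path, columns=columns)
```

Passing reader explicitly keeps the sketch testable without Daft installed; in normal use it would be left as None.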

Daft PRs that added these improvements:

@jaychia jaychia changed the title from "Upgrade Daft to 0.1.17" to "Upgrade Daft to 0.1.17 for improved performance and resource usage" on Sep 12, 2023
Collaborator

@raghumdani raghumdani left a comment

Looks good mostly. Thanks for the PR.

deltacat/__init__.py (outdated, resolved)
deltacat/tests/utils/test_daft.py (outdated, resolved)
deltacat/utils/daft.py (outdated, resolved)
deltacat/tests/utils/test_daft.py (outdated, resolved)
@jaychia jaychia requested a review from raghumdani September 12, 2023 21:14
Contributor Author

jaychia commented Sep 12, 2023

Thanks for the comments @raghumdani!

Please take another look, especially at the test cases, to see whether they match the behavior you expect from the API.

Collaborator

@raghumdani raghumdani left a comment

LGTM

deltacat/utils/daft.py (resolved)
deltacat/utils/daft.py (outdated, resolved)
Collaborator

@raghumdani raghumdani left a comment

Looks good.

@raghumdani raghumdani merged commit 6949831 into ray-project:main Sep 12, 2023