[FEAT] Better ScanTask sizing estimations #3257

Closed
wants to merge 2 commits into from

Conversation

jaychia
Contributor

@jaychia jaychia commented Nov 11, 2024

Add capabilities for performing better sizing of our ScanTasks.

When reading data, we often have only a "size on disk" from which to estimate how much data will actually be materialized in memory. This PR adds a new DataSizeEstimator trait that can be implemented to give Daft a better estimate of a ScanTask's in-memory size.
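To make the idea concrete, here is a minimal sketch of what such a trait could look like. The PR's actual trait definition is not shown in this conversation, so everything below is an assumption made for illustration: the method name `estimate_in_memory_bytes`, the `FixedRatioEstimator` type, its `inflation_factor` field, and the helper `estimate_scan_task_bytes` are all hypothetical and are not Daft's API.

```rust
/// Hypothetical shape of the trait, for illustration only. The real
/// `DataSizeEstimator` in the PR may have a different signature.
pub trait DataSizeEstimator {
    /// Estimate the in-memory size, in bytes, of data that occupies
    /// `size_on_disk_bytes` on disk.
    fn estimate_in_memory_bytes(&self, size_on_disk_bytes: u64) -> u64;
}

/// Assumed strategy: scale the on-disk size by a fixed inflation factor,
/// e.g. to account for compression and encoding in the file format.
pub struct FixedRatioEstimator {
    pub inflation_factor: f64,
}

impl DataSizeEstimator for FixedRatioEstimator {
    fn estimate_in_memory_bytes(&self, size_on_disk_bytes: u64) -> u64 {
        // Round up so the estimate never undercounts by a fractional byte.
        (size_on_disk_bytes as f64 * self.inflation_factor).ceil() as u64
    }
}

/// Hypothetical helper: estimate a scan task's in-memory size by summing
/// the per-file estimates for every file the task reads.
pub fn estimate_scan_task_bytes(
    estimator: &dyn DataSizeEstimator,
    file_sizes_on_disk: &[u64],
) -> u64 {
    file_sizes_on_disk
        .iter()
        .map(|&size| estimator.estimate_in_memory_bytes(size))
        .sum()
}
```

Putting the estimate behind a trait means each file format could supply its own strategy (for example, one driven by file metadata rather than a fixed ratio) without changing the code that sizes ScanTasks.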

@github-actions github-actions bot added the enhancement New feature or request label Nov 11, 2024

codspeed-hq bot commented Nov 11, 2024

CodSpeed Performance Report

Merging #3257 will degrade performance by 35.89%

Comparing jay/better-scan-task-estimations (00dadad) with main (f290f40)

Summary

❌ 1 regressions
✅ 16 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | `main` | `jay/better-scan-task-estimations` | Change |
|---|---|---|---|
| `test_iter_rows_first_row[100 Small Files]` | 177.3 ms | 276.5 ms | -35.89% |

@jaychia
Contributor Author

jaychia commented Nov 15, 2024

Closing in favor of #3302

@jaychia jaychia closed this Nov 15, 2024