Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Bump up max_header_size #3068

Merged
merged 6 commits into from
Oct 23, 2024
Merged

[BUG] Bump up max_header_size #3068

merged 6 commits into from
Oct 23, 2024

Conversation

raunakab
Copy link
Contributor

Overview

Bump up the MAX_HEADER_SIZE constant that's being used to compare against the page sizes.

(Naming seems weird. We're using the max "header" size to guard against "page" sizes.)

@raunakab raunakab requested a review from colin-ho October 17, 2024 22:13
@github-actions github-actions bot added the bug Something isn't working label Oct 17, 2024
@raunakab raunakab self-assigned this Oct 17, 2024
@raunakab raunakab marked this pull request as ready for review October 17, 2024 22:14
Copy link

codspeed-hq bot commented Oct 17, 2024

CodSpeed Performance Report

Merging #3068 will not alter performance

Comparing bug/parquet-large-files-reader (d39da8e) with main (d1b06fb)

Summary

✅ 17 untouched benchmarks

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 66.66667% with 1 line in your changes missing coverage. Please review.

Project coverage is 77.20%. Comparing base (d1b06fb) to head (d39da8e).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-parquet/src/file.rs 66.66% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3068      +/-   ##
==========================================
- Coverage   78.65%   77.20%   -1.45%     
==========================================
  Files         618      618              
  Lines       73192    75100    +1908     
==========================================
+ Hits        57569    57983     +414     
- Misses      15623    17117    +1494     
Files with missing lines Coverage Δ
src/daft-parquet/src/file.rs 74.01% <66.66%> (ø)

... and 11 files with indirect coverage changes

const MAX_HEADER_SIZE: usize = 256 * 1024 * 1024;
const MAX_HEADER_SIZE: usize = 2 * 1024 * 1024 * 1024;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we update the comment above this to explain our rationale for using 2GiB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yessir! 🚀🚀🚀

Copy link
Contributor

@desmondcheongzx desmondcheongzx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Send it

@jaychia
Copy link
Contributor

jaychia commented Oct 23, 2024

Hey what's happening with this PR? Is this good to merge?

@raunakab
Copy link
Contributor Author

raunakab commented Oct 23, 2024

Hey what's happening with this PR? Is this good to merge?

@jaychia Ingress works, but currently debugging egress with PyArrow.

@raunakab raunakab enabled auto-merge (squash) October 23, 2024 07:38
@raunakab raunakab merged commit c69ee3f into main Oct 23, 2024
40 checks passed
@raunakab raunakab deleted the bug/parquet-large-files-reader branch October 23, 2024 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reads of extremely large pages from Parquet fail with "Operation would exceed memory use threshold"
3 participants