Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] add min_hash alternate hashers #3052

Merged
merged 37 commits into from
Oct 30, 2024
Merged

[FEATURE] add min_hash alternate hashers #3052

merged 37 commits into from
Oct 30, 2024

Conversation

andrewgazelka
Copy link
Member

@andrewgazelka andrewgazelka commented Oct 15, 2024

Likely also increases performance due to removing heap alloc in some places.

Copy link

codspeed-hq bot commented Oct 15, 2024

CodSpeed Performance Report

Merging #3052 will not alter performance

Comparing andrew/hash (f2d9ad4) with main (975c09e)

Summary

✅ 17 untouched benchmarks

@andrewgazelka andrewgazelka changed the title [REFACTOR] refactor hashing code [CHORE] refactor hashing code Oct 15, 2024
@github-actions github-actions bot added the chore label Oct 15, 2024
@andrewgazelka andrewgazelka changed the title [CHORE] refactor hashing code [CHORE] refactor min_hash code Oct 15, 2024
@andrewgazelka andrewgazelka changed the title [CHORE] refactor min_hash code [CHORE] clean up min_hash code Oct 15, 2024
Copy link

codecov bot commented Oct 15, 2024

Codecov Report

Attention: Patch coverage is 97.38407% with 22 lines in your changes missing coverage. Please review.

Project coverage is 78.88%. Comparing base (975c09e) to head (f2d9ad4).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-hash/src/lib.rs 75.00% 7 Missing ⚠️
src/daft-minhash/src/tests.rs 98.03% 7 Missing ⚠️
src/daft-minhash/src/lib.rs 97.67% 3 Missing ⚠️
src/daft-functions/src/minhash.rs 92.59% 2 Missing ⚠️
daft/series.py 75.00% 1 Missing ⚠️
src/daft-core/src/python/series.rs 95.00% 1 Missing ⚠️
src/daft-sql/src/modules/hashing.rs 96.00% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3052      +/-   ##
==========================================
+ Coverage   78.73%   78.88%   +0.15%     
==========================================
  Files         621      624       +3     
  Lines       75193    75847     +654     
==========================================
+ Hits        59205    59834     +629     
- Misses      15988    16013      +25     
Files with missing lines Coverage Δ
daft/expressions/expressions.py 93.60% <100.00%> (+0.01%) ⬆️
src/daft-core/src/array/ops/minhash.rs 94.93% <100.00%> (+3.44%) ⬆️
src/daft-core/src/series/ops/minhash.rs 76.47% <100.00%> (+12.83%) ⬆️
src/daft-minhash/src/windowed.rs 100.00% <100.00%> (ø)
daft/series.py 89.72% <75.00%> (-0.11%) ⬇️
src/daft-core/src/python/series.rs 94.67% <95.00%> (-0.05%) ⬇️
src/daft-sql/src/modules/hashing.rs 96.55% <96.00%> (-0.42%) ⬇️
src/daft-functions/src/minhash.rs 75.86% <92.59%> (+7.68%) ⬆️
src/daft-minhash/src/lib.rs 97.67% <97.67%> (ø)
src/daft-hash/src/lib.rs 75.00% <75.00%> (ø)
... and 1 more

... and 3 files with indirect coverage changes

@jaychia jaychia requested a review from Vince7778 October 16, 2024 01:34
Copy link
Contributor

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to make sure the sql function is also updated

src/daft-minhash/Cargo.toml Outdated Show resolved Hide resolved
src/daft-hash/Cargo.toml Outdated Show resolved Hide resolved
@universalmind303 universalmind303 self-requested a review October 18, 2024 21:06
@andrewgazelka andrewgazelka changed the title [CHORE] clean up min_hash code [FEATURE] add min_hash alternate hashers Oct 23, 2024
@andrewgazelka andrewgazelka force-pushed the andrew/hash branch 2 times, most recently from 8999d1d to 96ca843 Compare October 23, 2024 19:36
daft/expressions/expressions.py Outdated Show resolved Hide resolved
src/daft-core/src/python/series.rs Show resolved Hide resolved
src/daft-functions/src/minhash.rs Show resolved Hide resolved
src/daft-hash/src/lib.rs Outdated Show resolved Hide resolved
src/daft-hash/src/lib.rs Outdated Show resolved Hide resolved
src/daft-minhash/src/windowed.rs Show resolved Hide resolved
src/daft-minhash/src/windowed.rs Outdated Show resolved Hide resolved
src/daft-minhash/src/windowed.rs Outdated Show resolved Hide resolved
src/daft-minhash/src/lib.rs Outdated Show resolved Hide resolved
src/daft-minhash/src/lib.rs Show resolved Hide resolved
Copy link
Contributor

@universalmind303 universalmind303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@samster25 samster25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@andrewgazelka andrewgazelka enabled auto-merge (squash) October 30, 2024 23:15
@andrewgazelka andrewgazelka merged commit c803bc9 into main Oct 30, 2024
44 checks passed
@andrewgazelka andrewgazelka deleted the andrew/hash branch October 30, 2024 23:15
sagiahrac pushed a commit to sagiahrac/Daft that referenced this pull request Nov 4, 2024
Likely also increases performance due to removing heap alloc in some
places.

---------

Co-authored-by: Colin Ho <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants