[FEAT] Add basic list aggregations #2032

Merged
merged 11 commits into from
Apr 11, 2024

kevinzwang
Member

As requested in #1977

List aggregations:

  • sum
  • count
  • mean
  • min
  • max
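
To make the per-row semantics of these aggregations concrete, here is a minimal pure-Python model. It is a sketch, not Daft's implementation: the helper name `list_agg` is hypothetical, and the null handling (a null list yields null, null elements are skipped, and every aggregation except `count` yields null when no valid elements remain) is an assumption for illustration rather than behaviour confirmed in this PR.

```python
def list_agg(column, op):
    """Apply one aggregation to every list in `column`, producing one value per row.

    `column` is a Python list whose entries are lists (or None for a null row).
    Null handling here is an assumption made for this sketch.
    """
    ops = {
        "sum": sum,
        "mean": lambda xs: sum(xs) / len(xs),
        "min": min,
        "max": max,
    }
    out = []
    for row in column:
        if row is None:
            out.append(None)  # null list in, null out
            continue
        valid = [x for x in row if x is not None]  # skip null elements
        if op == "count":
            out.append(len(valid))
        elif not valid:
            out.append(None)  # nothing to aggregate
        else:
            out.append(ops[op](valid))
    return out


column = [[1, 2, 3], [], None, [4, None, 6]]
for name in ("sum", "count", "mean", "min", "max"):
    print(name, list_agg(column, name))
```

Each output has the same length as the input column, which is the property that distinguishes a list aggregation from a regular column aggregation.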

@github-actions github-actions bot added the enhancement New feature or request label Mar 22, 2024

codecov bot commented Mar 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.26%. Comparing base (c2db062) to head (005e434).
Report is 52 commits behind head on main.

❗ Current head 005e434 differs from pull request most recent head ce971a8. Consider uploading reports for the commit ce971a8 to get more accurate results

Additional details and impacted files


@@            Coverage Diff             @@
##             main    #2032      +/-   ##
==========================================
+ Coverage   84.70%   85.26%   +0.55%     
==========================================
  Files          62       68       +6     
  Lines        6768     7268     +500     
==========================================
+ Hits         5733     6197     +464     
- Misses       1035     1071      +36     
Files Coverage Δ
daft/expressions/expressions.py 92.04% <100.00%> (+0.52%) ⬆️
daft/series.py 92.88% <ø> (-0.17%) ⬇️

... and 16 files with indirect coverage changes

@kevinzwang kevinzwang marked this pull request as ready for review March 22, 2024 20:22
@kevinzwang kevinzwang requested a review from samster25 March 22, 2024 20:22
@kevinzwang
Member Author

Updated the code to just slice the series and use the Series aggregations on each list. The code is much cleaner and the logic is consolidated in one place, which I like, but I'm unsure whether this has performance issues. From what I can tell, nothing here should require significant additional runtime or copies.
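
The slice-then-aggregate approach described above can be sketched as follows. This is an illustrative model only: it assumes an Arrow-style layout (one flat child buffer plus an offsets array), the name `agg_by_slicing` is hypothetical, and a `memoryview` stands in for a zero-copy Series slice.

```python
from array import array


def agg_by_slicing(values, offsets, agg):
    """Aggregate each list by slicing the flat child buffer.

    values:  flat buffer holding the elements of all lists back to back
    offsets: list i occupies values[offsets[i]:offsets[i + 1]]
    agg:     any function that reduces one slice to a single value
    """
    view = memoryview(values)  # slicing a memoryview does not copy the data
    return [agg(view[start:end]) for start, end in zip(offsets, offsets[1:])]


# Three lists: [1, 2, 3], [], [4, 5, 6]
values = array("q", [1, 2, 3, 4, 5, 6])
offsets = [0, 3, 3, 6]


def max_or_none(xs):
    # An empty slice has no maximum; returning None is this sketch's choice.
    return max(xs) if len(xs) else None


sums = agg_by_slicing(values, offsets, sum)
maxes = agg_by_slicing(values, offsets, max_or_none)
```

Because the existing whole-buffer aggregation is reused on every slice, the per-list logic lives in one place, which is the consolidation the comment refers to.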

@kevinzwang kevinzwang requested a review from samster25 April 4, 2024 22:54
.iter()
.map(|s| s.unwrap_or(Series::empty("", self.child_data_type())))
.map(|s| op(&s))
.collect::<DaftResult<Vec<_>>>()?;
Member

We probably want a concat that takes an iterator of Series for this. Materializing all the Series and then concatenating them is going to be slow.

Member Author

Tabled until we create an ArrayBuilder, so that we don't have to fully materialize the Series objects before concatenating them.
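
The trade-off in this thread can be illustrated with a small stand-alone sketch. It is not Daft's API and not the ArrayBuilder mentioned above: `per_list_results`, `concat_materialized`, and `concat_streaming` are hypothetical names, and a Python `array` stands in for a Series. The first concat collects every intermediate chunk before joining them; the second appends each chunk to a single output buffer as it arrives from the iterator.

```python
from array import array


def per_list_results(lists, agg):
    """Yield one single-element chunk per input list (stand-in for a tiny Series)."""
    for xs in lists:
        yield array("d", [agg(xs)])


def concat_materialized(chunks):
    """Collect every chunk first, then concatenate (the pattern under review)."""
    parts = list(chunks)  # all intermediate chunks are alive at the same time
    out = array("d")
    for part in parts:
        out.extend(part)
    return out


def concat_streaming(chunks):
    """Append each chunk to one output buffer as the iterator produces it."""
    out = array("d")
    for chunk in chunks:  # each chunk can be released right after it is appended
        out.extend(chunk)
    return out


def mean(xs):
    return sum(xs) / len(xs)


lists = [[1.0, 2.0, 3.0], [4.0, 6.0]]
eager = concat_materialized(per_list_results(lists, mean))
lazy = concat_streaming(per_list_results(lists, mean))
```

Both functions return the same values; the difference is only in how many intermediate chunks must exist at once, which matters when there is one chunk per list in a large column.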

@kevinzwang kevinzwang requested a review from samster25 April 11, 2024 00:18
@kevinzwang kevinzwang merged commit 93fc6ca into main Apr 11, 2024
29 checks passed
@kevinzwang kevinzwang deleted the kevin/list-aggs branch April 11, 2024 00:35
kevinzwang pushed a commit that referenced this pull request Nov 27, 2024
Add
- list.count
- list.max
- list.mean
- list.min
- list.sum
expressions from #2032.

Additionally, sorted the expressions in this subsection.