This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Push-based Vayu execution engine with multiple-pipeline support #4

Merged: 28 commits, Mar 27, 2024


yashkothari42 (Collaborator) commented Mar 26, 2024

Operators supported so far, mostly using DataFusion code:

  - CSV Scan
  - Filter
  - Projection
  - HashJoin

A pipeline can store a RecordBatch or a HashMap as its final output.
Includes a two-pipeline hash join example driven by a SQL query.
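
To illustrate the two-pipeline hash join, here is a minimal sketch, not the code in this PR. It assumes a hypothetical `Operator` trait with a `push` method and uses plain `(key, value)` rows as a stand-in for Arrow's RecordBatch. The build pipeline ends by storing a HashMap; the probe pipeline consumes that map and ends by storing batches. All names here (`Operator`, `Filter`, `HashProbe`, `run_build_pipeline`, `run_probe_pipeline`) are invented for illustration.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an Arrow RecordBatch: rows of (join key, payload).
type Batch = Vec<(i64, String)>;

/// Hypothetical push-based operator: the driver pushes a batch in
/// and gets the transformed batch back.
trait Operator {
    fn push(&mut self, input: Batch) -> Batch;
}

/// Keeps rows whose key is at least `min_key`.
struct Filter {
    min_key: i64,
}

impl Operator for Filter {
    fn push(&mut self, input: Batch) -> Batch {
        input.into_iter().filter(|(k, _)| *k >= self.min_key).collect()
    }
}

/// Probe side of the join: looks each incoming row up in the
/// hash table produced by the build pipeline.
struct HashProbe {
    table: HashMap<i64, Vec<String>>,
}

impl Operator for HashProbe {
    fn push(&mut self, input: Batch) -> Batch {
        let mut out = Vec::new();
        for (k, v) in input {
            if let Some(matches) = self.table.get(&k) {
                for m in matches {
                    out.push((k, format!("{}|{}", m, v)));
                }
            }
        }
        out
    }
}

/// Pipeline 1: push every source batch through the operators,
/// then sink the rows into a HashMap keyed by join key.
fn run_build_pipeline(
    source: Vec<Batch>,
    ops: &mut [Box<dyn Operator>],
) -> HashMap<i64, Vec<String>> {
    let mut table: HashMap<i64, Vec<String>> = HashMap::new();
    for mut batch in source {
        for op in ops.iter_mut() {
            batch = op.push(batch);
        }
        for (k, v) in batch {
            table.entry(k).or_default().push(v);
        }
    }
    table
}

/// Pipeline 2: push every source batch through the operators
/// (including the probe) and store the resulting batches.
fn run_probe_pipeline(source: Vec<Batch>, ops: &mut [Box<dyn Operator>]) -> Vec<Batch> {
    let mut stored = Vec::new();
    for mut batch in source {
        for op in ops.iter_mut() {
            batch = op.push(batch);
        }
        stored.push(batch);
    }
    stored
}
```

In this sketch the dependency between the pipelines is explicit: the map returned by `run_build_pipeline` is moved into `HashProbe` before the probe pipeline runs.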

TODO:

  1. Reduce the dependency on DataFusion classes; the operators can be built from DataFusion utility functions instead.
  2. Add a round-robin scheduler that can be polled. For now, pipelines can be stored in an array in the scheduler.
  3. Hash aggregates: SUM.
  4. Run two independent pipelines in parallel on separate cores.
  5. Basic tests and a comparison with DataFusion.
  6. Change the return types to sink operators: PrintResult, StoreMap, StoreRecordBatch.
    Do we need a sink operator that returns the value to main? No: a sink operator can send the data directly to the client, so the scheduler doesn't need the data.
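
One way TODO items 2 and 6 could fit together is sketched below. This is an assumption about a possible design, not the planned implementation: the sink variant names come from the list above, but their payloads, the `Task` and `RoundRobinScheduler` types, and the `poll` method are hypothetical. Rows are plain `(key, value)` pairs rather than Arrow data.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for a RecordBatch: rows of (key, value).
type Rows = Vec<(i64, i64)>;
type Table = std::collections::HashMap<i64, Vec<i64>>;

/// Sink variants named after TODO 6; the payloads are assumptions.
enum Sink {
    PrintResult,
    StoreMap(Table),
    StoreRecordBatch(Vec<Rows>),
}

impl Sink {
    /// The sink consumes the batch itself, so nothing is returned
    /// to the scheduler.
    fn consume(&mut self, batch: Rows) {
        match self {
            Sink::PrintResult => println!("{:?}", batch),
            Sink::StoreMap(table) => {
                for (k, v) in batch {
                    table.entry(k).or_default().push(v);
                }
            }
            Sink::StoreRecordBatch(stored) => stored.push(batch),
        }
    }
}

/// Hypothetical pipeline task: a queue of source batches plus a sink.
struct Task {
    id: usize,
    source: VecDeque<Rows>,
    sink: Sink,
}

impl Task {
    /// Processes one batch; returns false once the source is exhausted.
    fn step(&mut self) -> bool {
        match self.source.pop_front() {
            Some(batch) => {
                self.sink.consume(batch);
                true
            }
            None => false,
        }
    }
}

/// Hypothetical scheduler that rotates through pipelines one batch at a time.
struct RoundRobinScheduler {
    queue: VecDeque<Task>,
    finished: Vec<Task>,
}

impl RoundRobinScheduler {
    fn new(tasks: Vec<Task>) -> Self {
        Self { queue: tasks.into(), finished: Vec::new() }
    }

    /// Runs one batch of the next runnable pipeline and returns its id,
    /// or None when every pipeline is exhausted.
    fn poll(&mut self) -> Option<usize> {
        while let Some(mut task) = self.queue.pop_front() {
            if task.step() {
                let id = task.id;
                self.queue.push_back(task);
                return Some(id);
            }
            // Exhausted pipelines are kept so their sinks can be inspected.
            self.finished.push(task);
        }
        None
    }
}
```

Because each `Sink` owns its output, `poll` only reports which pipeline made progress, which matches the point above that the scheduler does not need the data.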

Note: right now the probe phase may recompute a lot of work that was already done in the build phase. This will be removed when TODO #1 is done. We don't yet know what the final API will look like, so it is better to delay this.

I'm hoping to finish TODO #2-5 by student update 2; after that we can work on performance and add other operators for the final presentation.

yashkothari42 merged commit 71eaf97 into main on Mar 27, 2024
1 check failed
yashkothari42 deleted the vayu-duckdb branch on March 27, 2024 18:08