push based Vayu Execution Engine with multiple pipelines support #4

yashkothari42 · 2024-03-26T13:38:11Z

CSV Scan
Filter
Projection
HashJoin
Mostly using Datafusion code

Support for storing RecordBatch and HashMap at the end of pipelines
2-pipeline hash join example on a SQL query
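
A minimal sketch of how a hash join might split into two push-based pipelines, assuming a build pipeline whose sink stores a HashMap and a probe pipeline that reads it. This is not the Vayu implementation: `Row`, `Batch`, `build_pipeline`, and `probe_pipeline` are hypothetical names, and plain vectors stand in for Arrow RecordBatch.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for Arrow data: a row is (join key, payload),
// and a batch is a vector of rows.
type Row = (u32, String);
type Batch = Vec<Row>;

/// Pipeline 1 (build side): every batch from the source is pushed into
/// a sink that stores rows in a hash map keyed by the join key.
fn build_pipeline(source: Vec<Batch>) -> HashMap<u32, Vec<String>> {
    let mut table: HashMap<u32, Vec<String>> = HashMap::new();
    for batch in source {
        for (key, value) in batch {
            table.entry(key).or_default().push(value);
        }
    }
    table
}

/// Pipeline 2 (probe side): every batch is pushed through a filter,
/// probed against the stored hash map, and the joined rows are
/// collected by the sink.
fn probe_pipeline(
    source: Vec<Batch>,
    table: &HashMap<u32, Vec<String>>,
    keep: impl Fn(&Row) -> bool,
) -> Vec<(u32, String, String)> {
    let mut out = Vec::new();
    for batch in source {
        for row in batch.into_iter().filter(|r| keep(r)) {
            if let Some(matches) = table.get(&row.0) {
                for build_value in matches {
                    out.push((row.0, build_value.clone(), row.1.clone()));
                }
            }
        }
    }
    out
}
```

The build pipeline has to finish before the probe pipeline starts, which is why the stored HashMap is the hand-off point between the two.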

TODO:

reduce dependency on DataFusion classes; operators can be built on DataFusion utility functions instead
add a round-robin scheduler that can be polled; for now, pipelines can be stored in an array in the scheduler
hash aggregates: SUM
run 2 independent pipelines in parallel on separate cores
basic tests and comparison with DataFusion
change the return types of the sink operator: PrintResult, StoreMap, StoreRecordBatch
do we need a sink operator that returns the value to main? No: a sink operator can send the data directly to the client, so the scheduler doesn't need the data
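
A rough sketch of the pollable round-robin scheduler and the sink kinds listed above, assuming pipelines are kept in an array as the TODO suggests. The names `SinkKind`, `Pipeline`, and `RoundRobinScheduler` are hypothetical, not the final API.

```rust
/// Hypothetical sink kinds, mirroring the names in the TODO list.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SinkKind {
    PrintResult,
    StoreMap,
    StoreRecordBatch,
}

/// Hypothetical pipeline descriptor: an id plus the kind of sink it ends in.
#[derive(Debug, Clone, PartialEq)]
struct Pipeline {
    id: usize,
    sink: SinkKind,
}

/// Scheduler that hands out pipelines in rotation from a plain array.
struct RoundRobinScheduler {
    pipelines: Vec<Pipeline>,
    next: usize,
}

impl RoundRobinScheduler {
    fn new(pipelines: Vec<Pipeline>) -> Self {
        Self { pipelines, next: 0 }
    }

    /// Return the next pipeline in rotation, or None if none are registered.
    fn poll(&mut self) -> Option<&Pipeline> {
        if self.pipelines.is_empty() {
            return None;
        }
        let idx = self.next;
        self.next = (self.next + 1) % self.pipelines.len();
        self.pipelines.get(idx)
    }
}
```

Here `poll` only returns a pipeline descriptor, never data, which matches the point above that the scheduler does not need the results.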

Note: right now the probe phase may recompute a lot of data that was already computed in the build phase; this will be removed when TODO#1 is done. We don't know what the final API will be, so it is better to delay it.

I'm hoping to finish TODO#2-5 by student update 2, and then we can work on performance and adding other operators for the final presentation.

yashkothari42 added 28 commits February 23, 2024 00:28

main function for datafusion - run custom execution engine using data…

a331889

…fusion

scan and filter working

1b206f0

pipeline structs

31693cd

pipeline with scan and filter operators

5b79c8a

unused imports removed

533c7fb

added vayu crate

5aaaf75

partial projection

67af6b2

using dyn ExecutionPlan directly instead of converting to proto:Scan …

154505c

…working

trying to implement operators without modifying datafusion code.

fe31165

scan and filter working

e4773c0

removed arrow/datafusion

09b27e5

updated with yashkothari42/arrow-datafusion

340a6e9

updated datafusion

f02defe

removed unused imports

af0d7f4

projection working

c0b2766

change operator batch size

39d85af

revert join query

0fb44d0

add lighting

0ec5347

add SchedulerPipeline

2443f33

update vayu readme

28c43c3

add SchedulerSink and SchedulerSinkType

098854e

added store and integrated with pipeline

e1d82d3

missed some files

eb1ce2b

join working

e12b194

update arrow datafusion

a049089

simplified logic, removed useless stuff

8c29e3a

cleaned and comments

d16fc04

executor sink

10d01f9

yashkothari42 merged commit 71eaf97 into main Mar 27, 2024
1 check failed

yashkothari42 deleted the vayu-duckdb branch March 27, 2024 18:08
