Quark hijacks Spark's SQL plan and runs the query using a different query engine. It has three modules:
LLQL is a serialization of relational algebra using Google protobuf. While targeted at Spark plans, it has no dependency on Spark. In the future, we may use LLQL to serialize Hive or Pig plans, or even query plans for a classical relational database like PostgreSQL.
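As an illustrative sketch only (the real LLQL schema is defined in protobuf and is not reproduced here), a serialized relational-algebra plan can be modeled as a tree of operator nodes, which protobuf would encode as nested messages. The node types, field names, and the toy text serializer below are hypothetical:

```python
# Hypothetical sketch of a relational-algebra tree like the one LLQL
# serializes; node and field names are illustrative, not the real schema.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Scan:
    table: str
    columns: List[str]

@dataclass
class Filter:
    predicate: str
    child: "Plan"

@dataclass
class Project:
    exprs: List[str]
    child: "Plan"

Plan = Union[Scan, Filter, Project]

def serialize(node: Plan) -> str:
    """Toy text serialization standing in for the protobuf wire format."""
    if isinstance(node, Scan):
        return f"Scan({node.table},[{','.join(node.columns)}])"
    if isinstance(node, Filter):
        return f"Filter({node.predicate},{serialize(node.child)})"
    return f"Project([{','.join(node.exprs)}],{serialize(node.child)})"

plan = Project(["l_orderkey"],
               Filter("l_quantity < 10",
                      Scan("lineitem", ["l_orderkey", "l_quantity"])))
print(serialize(plan))
```

Because the tree carries only operator types, expressions, and column names, a consumer such as a different execution engine can rebuild and run the plan without linking against Spark.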
Quark is plumbing code that hijacks the Spark plan and talks to the Vitesse execution engine. It is an example of using LLQL.
TPCH implements the TPCH benchmark on top of Spark. Many of the original TPCH queries use subqueries, which are not supported by Spark, so we rewrite those queries by pulling the subquery out and converting it to a join. There are several smaller issues, as noted in the source, but overall it reflects the features that TPCH tests.
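The subquery-to-join rewrite can be sketched on in-memory data. The sample rows below are hypothetical, and the SQL in the comments is a Q17-style scalar subquery chosen for illustration, not the exact rewrite code in this repository:

```python
# Shape of the subquery-to-join rewrite applied to the TPCH queries.
# Original (scalar subquery, Q17-style):
#   SELECT ... FROM lineitem l
#   WHERE l.l_quantity < (SELECT 0.2 * avg(l2.l_quantity)
#                         FROM lineitem l2
#                         WHERE l2.l_partkey = l.l_partkey)
# Rewrite: compute the per-partkey aggregate once, then join it back and
# apply the former subquery comparison as a join predicate.
from collections import defaultdict

# Hypothetical sample rows: (l_partkey, l_quantity)
lineitem = [(1, 1.0), (1, 100.0), (2, 50.0), (2, 60.0)]

# Subquery pulled out as a grouped aggregate: partkey -> 0.2 * avg(quantity)
groups = defaultdict(list)
for partkey, quantity in lineitem:
    groups[partkey].append(quantity)
threshold = {pk: 0.2 * sum(qs) / len(qs) for pk, qs in groups.items()}

# Join back on partkey and keep rows passing the former subquery predicate.
result = [(pk, q) for pk, q in lineitem if q < threshold[pk]]
print(result)  # only rows below 20% of their part's average quantity
```

The same shape applies to the other rewritten queries: the correlated subquery becomes a grouped aggregate over the inner table, and the correlation condition becomes the join key.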
TPCH data can be generated with dbgen from the official TPC site.
To load TPCH data (into Parquet format) at scale 1: `./s-run.sh 1 sql TpchLoad`
To run a query (q1 to q22) at scale 1: `./s-run.sh 1 sql Tpch q1 | grep QQQQ`
Known issues:
- Date-related issues with Q6/Q7/Q9
- Q22
- Paths to the raw TPCH data and the loaded Parquet files are hard-coded.