quark

Quark hijacks Spark SQL and runs the query on a different query engine. It has three modules:

LLQL (Low Level Query Language)

LLQL is a serialization of relational algebra using Google protobuf. While it is targeted at Spark plans, it has no dependencies on Spark. In the future, we may use LLQL to serialize Hive or Pig plans, or even query plans for a classical relational database like PostgreSQL.

Quark

Quark is the plumbing code that hijacks the Spark plan and talks to the Vitesse execution engine. It is an example of using LLQL.

Tpch

Implements the TPCH benchmark on top of Spark. Many of the original TPCH queries use subqueries that are not supported by Spark, so we rewrite those queries by pulling each subquery out and converting it to a join. There are several smaller issues, noted in comments in the source, but overall the implementation reflects the features that TPCH tests.
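The subquery-to-join rewrite can be illustrated with a small sketch. This is not code from this repository: it uses Python's built-in sqlite3 module rather than Spark, a toy lineitem table with made-up rows, and a query only loosely modeled on the shape of TPCH Q17. It shows that a correlated subquery in the WHERE clause and a join against a pre-aggregated derived table return the same answer.

```python
import sqlite3

# Toy stand-in for the TPCH lineitem table (rows are invented for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem (l_partkey INTEGER, l_quantity REAL, l_extendedprice REAL)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?)",
    [
        (1, 1.0, 10.0), (1, 10.0, 100.0), (1, 19.0, 190.0),
        (2, 5.0, 50.0), (2, 5.0, 50.0),
        (3, 1.0, 7.0), (3, 99.0, 700.0),
    ],
)

# Original form: a correlated scalar subquery in the WHERE clause.
WITH_SUBQUERY = """
SELECT SUM(l.l_extendedprice)
FROM lineitem l
WHERE l.l_quantity < (
    SELECT 0.2 * AVG(i.l_quantity)
    FROM lineitem i
    WHERE i.l_partkey = l.l_partkey
)
"""

# Rewritten form: the subquery is pulled out into a per-part aggregate
# and joined back on the correlation key.
WITH_JOIN = """
SELECT SUM(l.l_extendedprice)
FROM lineitem l
JOIN (
    SELECT l_partkey AS partkey, 0.2 * AVG(l_quantity) AS threshold
    FROM lineitem
    GROUP BY l_partkey
) agg ON agg.partkey = l.l_partkey
WHERE l.l_quantity < agg.threshold
"""

subquery_result = conn.execute(WITH_SUBQUERY).fetchone()[0]
join_result = conn.execute(WITH_JOIN).fetchone()[0]
print(subquery_result, join_result)
```

The join form computes the aggregate once per part key instead of once per outer row, which is also why this kind of rewrite is a common manual optimization.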

TPCH data can be generated with dbgen, which is available from the official TPC site.

To load TPCH data into Parquet format at scale 1, run: ./s-run.sh 1 sql TpchLoad

To run a single query (q1 to q22) at scale 1, run: ./s-run.sh 1 sql Tpch q1 | grep QQQQ
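To run every query in one go, a small driver along the following lines may help. It is a hypothetical helper, not part of this repository: the function names are made up, and it assumes that s-run.sh is in the current directory and that result lines are the ones containing the QQQQ marker, as the grep above suggests.

```python
import subprocess


def build_command(scale, query):
    """Build the s-run.sh invocation for one query, e.g. q1 at scale 1."""
    return ["./s-run.sh", str(scale), "sql", "Tpch", query]


def result_lines(output):
    """Keep only lines containing the QQQQ marker (same filter as `grep QQQQ`)."""
    return [line for line in output.splitlines() if "QQQQ" in line]


def run_all(scale=1):
    """Run q1 through q22 in order and print the marked result lines."""
    for n in range(1, 23):
        query = f"q{n}"
        proc = subprocess.run(
            build_command(scale, query), capture_output=True, text=True
        )
        for line in result_lines(proc.stdout):
            print(query, line)
```

Calling run_all(1) from the directory that contains s-run.sh runs all 22 queries at scale 1. Queries listed under TODO may still fail or return incorrect results.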

TODO

  • Date-related issues with Q6/Q7/Q9
  • Q22
  • The paths to the raw TPCH data and the loaded Parquet files are all hard-coded.