Quark hijacks Spark's SQL plan and runs the query using a different query engine. It has three modules:
LLQL is a serialization of relational algebra using Google protobuf. While targeted at Spark plans, it has no dependency on Spark. In the future, we may use LLQL to serialize Hive or Pig plans, or even query plans for a classical relational database like PostgreSQL.
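As an illustrative sketch only (the real LLQL schema is defined in protobuf and is not reproduced here), a serialized relational-algebra plan can be modeled as a tree of operator nodes, which protobuf would encode as nested messages. The node types, field names, and the toy text serializer below are hypothetical:

```python
# Hypothetical sketch of a relational-algebra tree like the one LLQL
# serializes; node and field names are illustrative, not the real schema.
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Scan:
    table: str
    columns: List[str]

@dataclass
class Filter:
    predicate: str
    child: "Plan"

@dataclass
class Project:
    exprs: List[str]
    child: "Plan"

Plan = Union[Scan, Filter, Project]

def serialize(node: Plan) -> str:
    """Toy text serialization standing in for the protobuf wire format."""
    if isinstance(node, Scan):
        return f"Scan({node.table},[{','.join(node.columns)}])"
    if isinstance(node, Filter):
        return f"Filter({node.predicate},{serialize(node.child)})"
    return f"Project([{','.join(node.exprs)}],{serialize(node.child)})"

plan = Project(["l_orderkey"],
               Filter("l_quantity < 10",
                      Scan("lineitem", ["l_orderkey", "l_quantity"])))
print(serialize(plan))
```

Because the tree carries only operator types, expressions, and column names, a consumer such as a different execution engine can rebuild and run the plan without linking against Spark.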
Quark is plumbing code that hijacks the Spark plan and talks to the Vitesse execution engine. It is an example of using LLQL.
TPCH implements the TPCH benchmark on top of Spark. Many of the original TPCH queries use subqueries, which are not supported by Spark, so we rewrite those queries by pulling the subquery out and converting it to a join. There are several smaller issues, as noted in the source, but overall it reflects the features that TPCH tests.
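The subquery-to-join rewrite can be sketched on in-memory data. The sample rows below are hypothetical, and the SQL in the comments is a Q17-style scalar subquery chosen for illustration, not the exact rewrite code in this repository:

```python
# Shape of the subquery-to-join rewrite applied to the TPCH queries.
# Original (scalar subquery, Q17-style):
#   SELECT ... FROM lineitem l
#   WHERE l.l_quantity < (SELECT 0.2 * avg(l2.l_quantity)
#                         FROM lineitem l2
#                         WHERE l2.l_partkey = l.l_partkey)
# Rewrite: compute the per-partkey aggregate once, then join it back and
# apply the former subquery comparison as a join predicate.
from collections import defaultdict

# Hypothetical sample rows: (l_partkey, l_quantity)
lineitem = [(1, 1.0), (1, 100.0), (2, 50.0), (2, 60.0)]

# Subquery pulled out as a grouped aggregate: partkey -> 0.2 * avg(quantity)
groups = defaultdict(list)
for partkey, quantity in lineitem:
    groups[partkey].append(quantity)
threshold = {pk: 0.2 * sum(qs) / len(qs) for pk, qs in groups.items()}

# Join back on partkey and keep rows passing the former subquery predicate.
result = [(pk, q) for pk, q in lineitem if q < threshold[pk]]
print(result)  # only rows below 20% of their part's average quantity
```

The same shape applies to the other rewritten queries: the correlated subquery becomes a grouped aggregate over the inner table, and the correlation condition becomes the join key.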
TPCH data can be generated with dbgen from the official TPC site.
To load TPCH data (into Parquet format) at scale 1: `./s-run.sh 1 sql TpchLoad`
To run a query (q1 to q22) at scale 1: `./s-run.sh 1 sql Tpch q1 | grep QQQQ`
Known issues:
- Date-related issues with Q6/Q7/Q9
- Q22
- Paths to the raw TPCH data and the loaded Parquet files are hard-coded.