Paul Rogers edited this page Feb 25, 2017 · 3 revisions

Test Data

When developing Drill, it is handy to have a variety of test data available. Below is a partial list of such resources.

TPC-H

Drill includes the TPC-H data and queries. See the TPC-H specification for details, especially the ER diagram on page 11 (reproduced below).

(insert image)

Compared to the TPC-H schema, Drill adds a column prefix of the form "x_" to each column name, for example:

  • customer: c_
  • orders: o_
  • lineitem: l_
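
As a rough illustration of the naming convention, the sketch below maps each table to its prefix and builds a column name from it. The prefixes are read off the Parquet schemas listed on this page; the names `TPCH_PREFIXES` and `prefixed_column` are hypothetical helpers, not part of Drill.

```python
# Column prefix per table, taken from the Parquet schemas on this page.
# TPCH_PREFIXES and prefixed_column are illustrative names, not Drill APIs.
TPCH_PREFIXES = {
    "customer": "c_",
    "lineitem": "l_",
    "nation": "n_",
    "orders": "o_",
    "part": "p_",
    "partsupp": "ps_",
    "region": "r_",
    "supplier": "s_",
}


def prefixed_column(table, base_name):
    """Return the column name as stored in the Parquet file for `table`."""
    return TPCH_PREFIXES[table] + base_name


# The customer key appears as c_custkey in customer and o_custkey in orders.
print(prefixed_column("customer", "custkey"))
print(prefixed_column("orders", "custkey"))
```

One consequence of the prefixes is that join columns differ in name between tables, so a join between customer and orders compares c_custkey with o_custkey.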

Other notes:

  • TestTpchDistributedConcurrent tests a variety of TPC-H queries. Look at it for links to the queries and data.
  • Queries are in drill-java-exec/src/test/resources/queries/tpch.
  • Data is available to Drill as cp.`tpch/something.parquet`, where "something" is a table name such as customer.
  • Data is packaged in tpch-sample-data-x.y.z.jar.
  • Data is also available on the class path in the folder: contrib/data/tpch-sample-data/target/classes/tpch.
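
To make the class path naming concrete, here is a minimal sketch that builds the `cp` table reference and a simple query string for one of the TPC-H tables. It only produces SQL text and does not connect to Drill; the helper names `tpch_table` and `sample_query` are hypothetical, and the table list is taken from the schema files named on this page.

```python
# Table names as they appear in the schema listings on this page.
TPCH_TABLES = (
    "customer", "lineitem", "nation", "orders",
    "part", "partsupp", "region", "supplier",
)


def tpch_table(name):
    """Return the class path reference for a TPC-H sample table."""
    if name not in TPCH_TABLES:
        raise ValueError("unknown TPC-H table: " + name)
    return "cp.`tpch/{}.parquet`".format(name)


def sample_query(name, limit=5):
    """Build a small SELECT against one table (illustrative only)."""
    return "SELECT * FROM {} LIMIT {}".format(tpch_table(name), limit)


print(sample_query("nation"))
```

The resulting string can be pasted into whichever Drill client is at hand.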

Schema

As reported by parquet-tools schema:

customer.parquet

message root {
  required int32 c_custkey;
  required binary c_name (UTF8);
  required binary c_address (UTF8);
  required int32 c_nationkey;
  required binary c_phone (UTF8);
  required double c_acctbal;
  required binary c_mktsegment (UTF8);
  required binary c_comment (UTF8);
}

lineitem.parquet

message root {
  required int32 l_orderkey;
  required int32 l_partkey;
  required int32 l_suppkey;
  required int32 l_linenumber;
  required double l_quantity;
  required double l_extendedprice;
  required double l_discount;
  required double l_tax;
  required binary l_returnflag (UTF8);
  required binary l_linestatus (UTF8);
  required int32 l_shipdate (DATE);
  required int32 l_commitdate (DATE);
  required int32 l_receiptdate (DATE);
  required binary l_shipinstruct (UTF8);
  required binary l_shipmode (UTF8);
  required binary l_comment (UTF8);
}

nation.parquet

message root {
  required int32 n_nationkey;
  required binary n_name (UTF8);
  required int32 n_regionkey;
  required binary n_comment (UTF8);
}

orders.parquet

message root {
  required int32 o_orderkey;
  required int32 o_custkey;
  required binary o_orderstatus (UTF8);
  required double o_totalprice;
  required int32 o_orderdate (DATE);
  required binary o_orderpriority (UTF8);
  required binary o_clerk (UTF8);
  required int32 o_shippriority;
  required binary o_comment (UTF8);
}

part.parquet

message root {
  required int32 p_partkey;
  required binary p_name (UTF8);
  required binary p_mfgr (UTF8);
  required binary p_brand (UTF8);
  required binary p_type (UTF8);
  required int32 p_size;
  required binary p_container (UTF8);
  required double p_retailprice;
  required binary p_comment (UTF8);
}

partsupp.parquet

message root {
  required int32 ps_partkey;
  required int32 ps_suppkey;
  required int32 ps_availqty;
  required double ps_supplycost;
  required binary ps_comment (UTF8);
}

region.parquet

message root {
  required int32 r_regionkey;
  required binary r_name (UTF8);
  required binary r_comment (UTF8);
}

supplier.parquet

message root {
  required int32 s_suppkey;
  required binary s_name (UTF8);
  required binary s_address (UTF8);
  required int32 s_nationkey;
  required binary s_phone (UTF8);
  required double s_acctbal;
  required binary s_comment (UTF8);
}
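
When writing tests it can be convenient to have these schemas in a structured form. The sketch below parses text in the layout shown above into (repetition, type, name, annotation) tuples. It assumes the simple one-field-per-line layout of these listings and is not a general Parquet schema parser; `parse_schema` is a hypothetical helper.

```python
import re

# Matches lines such as "  required binary n_name (UTF8);".
# The annotation in parentheses is optional.
FIELD = re.compile(
    r"^\s*(required|optional|repeated)\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?;\s*$"
)


def parse_schema(text):
    """Return a list of (repetition, type, name, annotation) tuples."""
    fields = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields.append(match.groups())
    return fields


# Schema text copied from the nation.parquet listing above.
NATION = """message root {
  required int32 n_nationkey;
  required binary n_name (UTF8);
  required int32 n_regionkey;
  required binary n_comment (UTF8);
}"""

for field in parse_schema(NATION):
    print(field)
```

Fields without an annotation, such as n_nationkey, have None in the last position.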

FoodMart Analytic Data

Drill ships the FoodMart data set maintained by Julian Hyde, adapted from the original Microsoft version.
