- For Command Line Interface: please refer to Usage of Command Line Interface
- For Server mode and REST API: please refer to Usage of Server Mode
Please visit ${Kyligence-ZenML-Toolkit-{version}}/samples/sql/
to get the SSB dataset with data csv files and a sql file
# please replace ${Kyligence-ZenML-Toolkit-{version}} with absolute path
$ cd ${Kyligence-ZenML-Toolkit-{version}}
$ sh ./bin/zen.sh -i ./samples/sql/ssb.sql -o ./samples/sql
You will get a ssb.zen.yml
file in folder Kyligence-ZenML-Toolkit-{version}/samples/sql
- Login into Kyligence Zen
- Go to Data Tab, upload SSB csv files to create tables
- Create a view based on SSB tables
lineorder_l_dates_l_customer_l_supplier_l_part
as data model, the join key as list below - Go to Metrics, click Import, import
ssb.zen.yml
file - All metrics defined in sql file will be imported to Kyligence Zen
A SQL clause represents a query based on a data model, for example
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
left join dates on lo_orderdate = d_datekey
left join customer on lo_custkey = c_custkey
left join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_datekey >= '1992-01-01' and d_datekey <= '1997-01-01'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;
To extract the metrics from the SQL above, we need parse the SQL to SQLNode, and visit the SQL node to extract
- Measure as Metrics Expression
- The aggregation expression,
sum(lo_revenue)
- The aggregation expression,
- Measure Alias Name as Metrics Name
- The alias of aggregation expression,
lo_revenue
- If a aggregation expression has no alias, will use the expression as the alias name
- The alias of aggregation expression,
- Column as Dimensions
- Columns exists in select list and where clause (intersection set)
- Dimensions in select list:
c_city
,s_city
,d_year
- Dimensions in where clause:
c_nation
,d_datekey
- Dimensions in select list:
- Columns exists in select list and where clause (intersection set)
- Data Column as Time Dimensions
- Infer column data type in where clause to find
Date
columns as TimeDimension - The value of
d_datekey
is a date format1992-01-01
, so it can be set as a time dimension
- Infer column data type in where clause to find
- Join Relation as Data Model
- A join relation represents a data model
- If a SQL has no join relation, the
From Table
will be named as the data model name - If a SQL has join tables, will generate the data model name follows the pattern
{Fact_Table}_{JoinType}_{Dim_Table}...
lineorder left join dates left join customer left join supplier
->lineorder_l_dates_customer_supplier
If there are two SQLs
-- SQL 1. table a join table_b
select a, d, sum(b) from table_a
left join table_b on a.bid = b.id;
-- SQL 2. table_a join table_b join table_c
select a, e, sum(b), sum(c) from table_a
left join table_b on a.bid = b.id
left join table c on a.cid = c.id;
3 metrics will be extracted:
sum(b)
on data modeltable_a_l_table_b
with dimensiona, d
sum(b)
on data modeltable_a_l_table_b_l_table_c
with dimensiona, e
sum(c)
on data modeltable_a_l_table_b_l_table_c
with dimensiona, e
The data model of the second SQL table_a_l_table_b_l_table_c
"contains" the data model of first SQL table_a_l_table_b
.
So these 3 metrics can be merged on the same data model
sum(b)
on data modeltable_a_l_table_b_l_table_c
with dimensiona, d
sum(b)
on data modeltable_a_l_table_b_l_table_c
with dimensiona, e
sum(c)
on data modeltable_a_l_table_b_l_table_c
with dimensiona, e
Then loop the metrics, merge the metrics with same expression, union the dimension list, the final results will be
sum(b)
on data modeltable_a_l_table_b_l_table_c
, with dimensiona, d, e
sum(c)
on data modeltable_a_l_table_b_l_table_c
, with dimensiona, e
- Supports SparkSQL compatible expression
- Data model supports
left/inner
join type