This project contains examples of user defined functions for Apache Drill.
These are packaged as a separate project so that you can clone this project as a head start in creating your own extensions for Drill.
Clone this package and compile it.
git clone https://github.com/mapr-demos/simple-drill-functions.git
cd simple-drill-functions
mvn package
cd ..
Now download and unpack Apache Drill.
wget http://getdrill.org/drill/download/apache-drill-1.1.0.tar.gz
tar xvf apache-drill-1.1.0.tar.gz
Copy the jar files from your functions into the 3rdparty directory in the Drill distro
cp simple-drill-functions/target/*.jar apache-drill-1.1.0/jars/3rdparty
Edit the drill-override.conf
file to add a reference to the package these functions live in:
echo 'drill.logical.function.package+=[com.mapr.drill]' >> apache-drill-1.1.0/conf/drill-override.conf
Now run drill and test the results
$ cd apache-drill-1.1.0/
$ bin/drill-embedded
0: jdbc:drill:zk=local> select myaddints(position_id, 3) from cp.`employee.json` limit 3;
+---------+
| EXPR$0 |
+---------+
| 4.0 |
| 5.0 |
| 5.0 |
+---------+
Zip approximately replicates the python zip function, but only for pairs of lists containing numbers.
If you have data like this:
+------+--------------------+--------------------+--------------------+
| n | t | x | y |
+------+--------------------+--------------------+--------------------+
| 1.0 | [1.0,2.0,3.0] | [3.0,2.0,1.0] | [4.0,5.0,6.0] |
| 2.0 | [0.0,1.0,2.0,3.0] | [0.0,3.0,2.0,1.0] | [7.0,6.0,5.0,4.0] |
+------+--------------------+--------------------+--------------------+
Then a query like this
select n, data[0] as t, data[1] as v
from (
select n, flatten( zip(t,x)) as data
from dfs.root.`/Users/tdunning/tmp/data.json`);
Will produce these results
+------+------+------+
| n | t | v |
+------+------+------+
| 1.0 | 1.0 | 3.0 |
| 1.0 | 2.0 | 2.0 |
| 1.0 | 3.0 | 1.0 |
| 2.0 | 1.0 | 3.0 |
| 2.0 | 2.0 | 2.0 |
| 2.0 | 3.0 | 1.0 |
| 2.0 | 0.0 | 0.0 |
+------+------+------+
Two, three or four arguments can be used. Thus,
select n, data[0] as t, data[1] as x, data[2] as y
from (
select n, flatten( zip(t,x,y)) as data
from dfs.root.`/Users/tdunning/tmp/data.json`);
will produce
+------+------+------+------+
| n | t | x | y |
+------+------+------+------+
| 1.0 | 1.0 | 3.0 | 4.0 |
| 1.0 | 2.0 | 2.0 | 5.0 |
| 1.0 | 3.0 | 1.0 | 6.0 |
| 2.0 | 1.0 | 3.0 | 4.0 |
| 2.0 | 2.0 | 2.0 | 5.0 |
| 2.0 | 3.0 | 1.0 | 6.0 |
| 2.0 | 0.0 | 0.0 | 7.0 |
+------+------+------+------+
Aside from any utility this function itself might have, zip demonstrates how to construct complex output structures.
Mask will replace the n first characters by a masking character as you can see below:
SELECT MASK(first_name, '*', 3) first , MASK(last_name, '#', 10) last FROM cp.`employee.json` LIMIT 2;
+----------+----------+
| first | last |
+----------+----------+
| ***ri | ###### |
| ***rick | ####### |
+----------+----------+