Skip to content

Latest commit

 

History

History
85 lines (62 loc) · 1.78 KB

usddump-sql-tutorial.md

File metadata and controls

85 lines (62 loc) · 1.78 KB
description
In this tutorial, you will see how to use $dump-sql to export data for analytics

🎓 $dump-sql tutorial

{% embed url="https://www.youtube.com/watch?v=BtLxICcQNWw&feature=youtu.be" %}

Create Client & AccessPolicy for your API agent - in our case curl

PUT /
Accept: text/yaml
Content-Type: text/yaml

- resourceType: Client
  id: bulk-client
  secret: secret
  grant_types: ['basic']
- resourceType: AccessPolicy
  id: bulk-client
  engine: allow
  link:
  - {id: 'bulk-client', resourceType: 'Client'}

Generate some number of patients using SQL - in DB Console:

INSERT INTO patient (id,txid, status, resource) 
SELECT g.id, g.id, 'created', '{"name": [{"family": "John"}]}' 
FROM generate_series(1, 100000) AS g (id);
--
SELECT count(*) FROM Patient;

Now we can test bulk export using $dump operation with curl program:

curl -u bulk-client:secret /Patient/\$dump > /tmp/pt.ndjson

>  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
>                                 Dload  Upload   Total   Spent    Left  Speed
> 100 12.0M    0 12.0M    0     0  9527k      0 --:--:--  0:00:01 --:--:-- 9523k

less /tmp/pt.ndjson

We got 100K patients in less than a second!

Do not forget to clean up the database:

TRUNCATE Patient;

Load data into BigQuery

# load tsv data
curl -v -X POST -u bulk-client:secret -H 'content-type:application/json' \
   https://<YOURBOX>/\$dump-sql \
   -d '{"query": "select id, ts, resource#>>'"'"'{module}'"'"' from entity"}' \
   > data.tsv

# create dataset
bq mk test
bq ls

# load datast
bq load --source_format=CSV \
  --field_delimiter=tab \
  test.entities ./data.tsv res,ts,mod

# list ids
bq query 'select id from test.entities'

# remove dataset
bq rm -r test