dbt-spark-livy

The dbt-spark-livy adapter allows you to use dbt with Apache Livy and with Cloudera Data Platform deployments that provide a Livy server. The codebase builds on the dbt-spark project (https://github.com/dbt-labs/dbt-spark) and adds Livy connectivity on top of it.
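Livy exposes Spark through a REST API. A client opens an interactive session (POST /sessions), submits code to it as statements (POST /sessions/{id}/statements), and polls each statement for its result (GET /sessions/{id}/statements/{statementId}). As a rough illustration of that flow, and not of the adapter's actual internals, the sketch below only builds the corresponding urllib.request.Request objects. It assumes HTTP basic authentication with the profile's user/password in front of Livy, and the helper names (livy_request, create_sql_session, submit_sql, statement_status) are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Optional


def livy_request(host: str, path: str, user: str, password: str,
                 body: Optional[dict] = None) -> urllib.request.Request:
    """Build a Livy REST request (assumes HTTP basic auth in front of Livy)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "Content-Type": "application/json"}
    # Requests with a JSON body are POSTs; requests without one are GETs.
    data = json.dumps(body).encode() if body is not None else None
    method = "POST" if body is not None else "GET"
    return urllib.request.Request(host.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


def create_sql_session(host: str, user: str, password: str) -> urllib.request.Request:
    # POST /sessions starts an interactive session; kind "sql" makes Spark SQL its default.
    return livy_request(host, "/sessions", user, password, {"kind": "sql"})


def submit_sql(host: str, user: str, password: str,
               session_id: int, sql: str) -> urllib.request.Request:
    # POST /sessions/{id}/statements runs one piece of code inside an existing session.
    return livy_request(host, f"/sessions/{session_id}/statements", user, password,
                        {"code": sql, "kind": "sql"})


def statement_status(host: str, user: str, password: str,
                     session_id: int, statement_id: int) -> urllib.request.Request:
    # GET /sessions/{id}/statements/{statementId} reports the statement's state and output.
    return livy_request(host, f"/sessions/{session_id}/statements/{statement_id}",
                        user, password)
```

Sending one of these with urllib.request.urlopen(...) returns a JSON document; for POST /sessions it describes the new session, including the id that the later paths need.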

Getting started

Running locally

A docker-compose environment starts a Spark Thrift server and a Postgres database that serves as the Hive Metastore backend. Note: dbt-spark now supports Spark 3.1.1 (it previously targeted Spark 2.x).

Requirements

  • Python >= 3.8
  • dbt-core ~= 1.3.0
  • pyspark
  • sqlparams
  • requests_kerberos
  • requests-toolbelt
  • python-decouple

Installing dbt-spark-livy

pip install dbt-spark-livy

Profile Setup

Add a profile such as the following to your profiles.yml (by default ~/.dbt/profiles.yml):

demo_project:
  target: dev
  outputs:
    dev:
      type: spark_livy
      method: livy
      schema: my_db
      host: https://spark-livy-gateway.my.org.com/dbt-spark/cdp-proxy-api/livy_for_spark3/
      user: my_user
      password: my_pass

Caveats

  • When using Livy, if the Livy UI shows sessions changing state from starting to dead instead of idle, make sure the user has a proper mapping in the IDBroker mappings section.
  • The IDBroker mappings are under Actions > Manage Access > IDBroker Mappings. Reference
  • Also make sure the workload password is set, either through the UI or the CLI. Reference
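To catch the starting-to-dead transition described above without watching the Livy UI, you can poll a session's state over Livy's REST API, where GET /sessions/{id} returns a JSON object with a state field. The sketch below assumes HTTP basic authentication with the profile's user/password in front of Livy. The helper names (fetch_session_state, wait_for_idle) and the grouping of states in NON_RECOVERABLE_STATES are hypothetical choices for this example, not part of dbt-spark-livy.

```python
import base64
import json
import time
import urllib.request
from typing import Callable

# Livy session states from which a session will not go on to become idle.
NON_RECOVERABLE_STATES = {"shutting_down", "error", "dead", "killed", "success"}


def fetch_session_state(host: str, session_id: int, user: str, password: str) -> str:
    """Return the `state` field from GET {host}/sessions/{session_id}.

    Assumes HTTP basic authentication in front of Livy.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(
        host.rstrip("/") + f"/sessions/{session_id}",
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["state"]


def wait_for_idle(get_state: Callable[[], str], timeout: float = 300.0,
                  interval: float = 5.0) -> str:
    """Poll get_state() until the session is idle.

    Raises RuntimeError if the session reaches a state it will not leave for
    idle (for example starting -> dead), and TimeoutError if it is still not
    idle after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state == "idle":
            return state
        if state in NON_RECOVERABLE_STATES:
            raise RuntimeError(
                f"Livy session is {state!r} instead of idle; check the user's "
                "IDBroker mapping and workload password"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Livy session still {state!r} after {timeout} seconds")
        time.sleep(interval)
```

For example, wait_for_idle(lambda: fetch_session_state(host, session_id, "my_user", "my_pass")) polls a real session, where host and session_id are placeholders for your gateway URL and the session id shown in the Livy UI. Passing the state lookup in as a callable keeps the polling logic separate from the HTTP call, so it can be exercised without a running Livy server.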

Supported features

Please see the original adapter documentation: https://github.com/dbt-labs/dbt-spark and https://docs.getdbt.com/reference/warehouse-profiles/spark-profile