Merge branch 'main' into main
francescomucio authored Jun 13, 2022
2 parents 98ea5e3 + ca1b5b6 commit 6bb2f82
Showing 6 changed files with 123 additions and 5 deletions.
101 changes: 101 additions & 0 deletions CONTRIBUTING.MD
@@ -0,0 +1,101 @@
# Contributing to `dbt-spark`

1. [About this document](#about-this-document)
2. [Getting the code](#getting-the-code)
3. [Running `dbt-spark` in development](#running-dbt-spark-in-development)
4. [Testing](#testing)
5. [Updating Docs](#updating-docs)
6. [Submitting a Pull Request](#submitting-a-pull-request)

## About this document
This document is a guide intended for folks interested in contributing to `dbt-spark`. Below, we document the process by which members of the community should create issues and submit pull requests (PRs) in this repository. It is not intended as a guide for using `dbt-spark`, and it assumes a certain level of familiarity with Python concepts such as virtualenvs, `pip`, Python modules, and so on. This guide assumes you are using macOS or Linux and are comfortable with the command line.

For those wishing to contribute, we highly suggest reading dbt-core's [contribution guide](https://github.com/dbt-labs/dbt-core/blob/HEAD/CONTRIBUTING.md) if you haven't already. Almost all of the information there is applicable to contributing here, too!

### Signing the CLA

Please note that all contributors to `dbt-spark` must sign the [Contributor License Agreement](https://docs.getdbt.com/docs/contributor-license-agreements) to have their Pull Request merged into the `dbt-spark` codebase. If you are unable to sign the CLA, then the `dbt-spark` maintainers will unfortunately be unable to merge your Pull Request. You are, however, welcome to open issues and comment on existing ones.


## Getting the code

You will need `git` in order to download and modify the `dbt-spark` source code. You can find directions [here](https://github.com/git-guides/install-git) on how to install `git`.

### External contributors

If you are not a member of the `dbt-labs` GitHub organization, you can contribute to `dbt-spark` by forking the `dbt-spark` repository. For a detailed overview on forking, check out the [GitHub docs on forking](https://help.github.com/en/articles/fork-a-repo). In short, you will need to:

1. fork the `dbt-spark` repository
2. clone your fork locally
3. check out a new branch for your proposed changes
4. push changes to your fork
5. open a pull request against `dbt-labs/dbt-spark` from your forked repository

### dbt Labs contributors

If you are a member of the `dbt-labs` GitHub organization, you will have push access to the `dbt-spark` repo. Rather than forking `dbt-spark` to make your changes, just clone the repository, check out a new branch, and push directly to that branch.


## Running `dbt-spark` in development

### Installation

First, make sure that you set up your `virtualenv` as described in [Setting up an environment](https://github.com/dbt-labs/dbt-core/blob/HEAD/CONTRIBUTING.md#setting-up-an-environment). Ensure you have the latest version of pip installed with `pip install --upgrade pip`. Next, install the latest `dbt-spark` dependencies:

```sh
pip install -e . -r dev-requirements.txt
```

When `dbt-spark` is installed this way, any changes you make to the `dbt-spark` source code will be reflected immediately in your next `dbt-spark` run.

To confirm you have the correct version of `dbt-core` installed, please run `dbt --version` and `which dbt`.
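
The same check can be scripted. The following is a minimal sketch, not part of `dbt-spark`: the helper name `describe_dbt_install` is hypothetical, and it assumes Python 3.8+ (for `importlib.metadata`) and that the packages are published under the distribution names `dbt-core` and `dbt-spark`.

```python
import shutil
from importlib import metadata


def describe_dbt_install() -> dict:
    """Report which `dbt` executable is on PATH and which versions are installed.

    Hypothetical helper for illustration only; it is not part of dbt-spark.
    """
    # Same information as `which dbt`: None if no executable is found on PATH.
    info = {"executable": shutil.which("dbt")}
    for dist in ("dbt-core", "dbt-spark"):
        try:
            info[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the active environment.
            info[dist] = None
    return info


if __name__ == "__main__":
    for name, value in describe_dbt_install().items():
        print(f"{name}: {value}")
```

If `executable` points outside your virtualenv, or `dbt-spark` is reported as `None`, the editable install above probably did not land in the environment you have active.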


## Testing

### Initial Setup

`dbt-spark` uses test credentials specified in a `test.env` file in the root of the repository. This `test.env` file is git-ignored, but please be _extra_ careful to never check in credentials or other sensitive information when developing. To create your `test.env` file, copy the provided example file, then supply your relevant credentials.

```sh
cp test.env.example test.env
$EDITOR test.env
```
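
Before running integration tests, it can help to check that every variable from `test.env.example` has a value in your `test.env`. The following is a small sketch of such a check, assuming both files use plain `KEY=value` lines with `#` comments; the helper names `parse_env_text` and `missing_keys` are hypothetical and not part of this repository.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(example_text: str, env_text: str) -> list:
    """Return keys listed in the example file that are empty or absent in the env file."""
    expected = parse_env_text(example_text)
    actual = parse_env_text(env_text)
    return [key for key in expected if not actual.get(key)]


if __name__ == "__main__":
    example, env = Path("test.env.example"), Path("test.env")
    if example.exists() and env.exists():
        for key in missing_keys(example.read_text(), env.read_text()):
            print(f"{key} is not set in test.env")
```

Not every variable is needed for every profile, so treat the output as a reminder, not an error.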

### Test commands
There are a few methods for running tests locally.

#### `tox`
`tox` takes care of managing Python virtualenvs and installing dependencies in order to run tests. You can also run tests in parallel: for example, `tox -p` runs the unit tests for Python 3.7, Python 3.8, and Python 3.9 along with the `flake8` checks in parallel. To run unit tests for a specific Python version, use `tox -e py37`. The configuration for these tests is located in `tox.ini`.

#### `pytest`
Finally, you can also run a specific test or group of tests using `pytest` directly. With a Python virtualenv active and dev dependencies installed you can do things like:

```sh
# run specific spark integration tests
python -m pytest -m profile_spark tests/integration/get_columns_in_relation
# run specific functional tests
python -m pytest --profile databricks_sql_endpoint tests/functional/adapter/test_basic.py
# run all unit tests in a file
python -m pytest tests/unit/test_adapter.py
# run a specific unit test
python -m pytest tests/unit/test_adapter.py::TestSparkAdapter::test_profile_with_database
```
## Updating Docs

Many changes will require an update to the `dbt-spark` docs. Here are some useful resources.

- Docs are [here](https://docs.getdbt.com/).
- The docs repo for making changes is located [here](https://github.com/dbt-labs/docs.getdbt.com).
- The changes made are likely to impact one or both of [Spark Profile](https://docs.getdbt.com/reference/warehouse-profiles/spark-profile) and [Spark Configs](https://docs.getdbt.com/reference/resource-configs/spark-configs).
- We ask every community member who makes a user-facing change to open an issue or PR regarding doc changes.

## Submitting a Pull Request

dbt Labs provides a CI environment to test changes to the `dbt-spark` adapter, and periodic checks against the development version of `dbt-core` through GitHub Actions.

A `dbt-spark` maintainer will review your PR. They may suggest code revisions for style or clarity, or request that you add unit or integration test(s). These are good things! We believe that, with a little bit of help, anyone can contribute high-quality code.

Once all requests and questions have been addressed, the `dbt-spark` maintainer can trigger CI testing.

Once all tests are passing and your PR has been approved, a `dbt-spark` maintainer will merge your changes into the active development branch. And that's it! Happy developing :tada:
8 changes: 5 additions & 3 deletions dbt/adapters/spark/connections.py
@@ -1,3 +1,5 @@
import os

from contextlib import contextmanager

import dbt.exceptions
@@ -7,6 +9,7 @@
from dbt.events import AdapterLogger
from dbt.utils import DECIMALS
from dbt.adapters.spark import __version__
from dbt.tracking import DBT_INVOCATION_ENV

try:
from TCLIService.ttypes import TOperationState as ThriftState
@@ -409,9 +412,8 @@ def open(cls, connection):
cls.validate_creds(creds, required_fields)

dbt_spark_version = __version__.version
user_agent_entry = (
f"dbt-labs-dbt-spark/{dbt_spark_version} (Databricks)" # noqa
)
dbt_invocation_env = os.getenv(DBT_INVOCATION_ENV) or "manual"
user_agent_entry = f"dbt-labs-dbt-spark/{dbt_spark_version} (Databricks, {dbt_invocation_env})" # noqa

# http://simba.wpengine.com/products/Spark/doc/ODBC_InstallGuide/unix/content/odbc/hi/configuring/serverside.htm
ssp = {f"SSP_{k}": f"{{{v}}}" for k, v in creds.server_side_parameters.items()}
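
The `os.getenv(...) or "manual"` fallback in this hunk treats an empty variable the same as an unset one. A standalone sketch of that behavior follows. It assumes the environment variable is literally named `DBT_INVOCATION_ENV` (in the diff the name comes from `dbt.tracking.DBT_INVOCATION_ENV`), and `build_user_agent` is a hypothetical helper, not a function in `connections.py`.

```python
import os

# Assumption: the variable name matches the constant imported from dbt.tracking.
INVOCATION_ENV_VAR = "DBT_INVOCATION_ENV"


def build_user_agent(version: str, environ=None) -> str:
    """Build the user agent entry, falling back to "manual" when the variable is unset or empty."""
    environ = os.environ if environ is None else environ
    # `or` covers both a missing key (None) and an empty string.
    invocation_env = environ.get(INVOCATION_ENV_VAR) or "manual"
    return f"dbt-labs-dbt-spark/{version} (Databricks, {invocation_env})"


if __name__ == "__main__":
    print(build_user_agent("1.2.0", {}))
    print(build_user_agent("1.2.0", {INVOCATION_ENV_VAR: "airflow"}))
```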
3 changes: 2 additions & 1 deletion dbt/adapters/spark/impl.py
@@ -23,7 +23,8 @@

logger = AdapterLogger("Spark")

GET_COLUMNS_IN_RELATION_RAW_MACRO_NAME = "spark__get_columns_in_relation_raw"

GET_COLUMNS_IN_RELATION_RAW_MACRO_NAME = "get_columns_in_relation_raw"
LIST_SCHEMAS_MACRO_NAME = "list_schemas"
LIST_RELATIONS_MACRO_NAME = "list_relations_without_caching"
DROP_RELATION_MACRO_NAME = "drop_relation"
4 changes: 4 additions & 0 deletions dbt/include/spark/macros/adapters.sql
@@ -168,6 +168,10 @@
{%- endcall -%}
{% endmacro %}

{% macro get_columns_in_relation_raw(relation) -%}
{{ return(adapter.dispatch('get_columns_in_relation_raw', 'dbt')(relation)) }}
{%- endmacro -%}

{% macro spark__get_columns_in_relation_raw(relation) -%}
{% call statement('get_columns_in_relation_raw', fetch_result=True) %}
describe extended {{ relation.include(schema=(schema is not none)) }}
10 changes: 10 additions & 0 deletions test.env.example
@@ -0,0 +1,10 @@
# Cluster ID
DBT_DATABRICKS_CLUSTER_NAME=
# SQL Endpoint
DBT_DATABRICKS_ENDPOINT=
# Server Hostname value
DBT_DATABRICKS_HOST_NAME=
# personal token
DBT_DATABRICKS_TOKEN=
# file path to local ODBC driver
ODBC_DRIVER=
2 changes: 1 addition & 1 deletion tests/conftest.py
@@ -8,7 +8,7 @@ def pytest_addoption(parser):
parser.addoption("--profile", action="store", default="apache_spark", type=str)


# Using @pytest.mark.skip_adapter('apache_spark') uses the 'skip_by_adapter_type'
# Using @pytest.mark.skip_profile('apache_spark') uses the 'skip_by_profile_type'
# autouse fixture below
def pytest_configure(config):
config.addinivalue_line(