incorporating feedback from dbt to style and clean the page
wazi55 committed Oct 12, 2023
1 parent ab391be commit 2079654
Showing 4 changed files with 573 additions and 1,047 deletions.
22 changes: 5 additions & 17 deletions website/docs/docs/build/python-models.md
@@ -651,25 +651,13 @@ If not configured, `dbt-spark` will use the built-in defaults: the all-purpose c

**Submission methods:** The `dbt-bigquery` adapter uses [Dataproc](https://cloud.google.com/dataproc) to submit your Python models as PySpark jobs. Dataproc supports two submission methods: `cluster` and `serverless`.

> | | `Cluster` | `Serverless` |
> | -------------------------- | -------------------------- | -------------------------- |
> | Submission Method | Create or use an existing Dataproc cluster. [Submit](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) a Python model in `dbt_project.yml` or a `.yml` file within the `models/` directory. | Dataproc Serverless does not require a ready cluster, but jobs can be slower to start. [Submit](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) to a serverless cluster in the `.py` file. |
> | Additional Packages | Add third-party packages while creating the cluster with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). | Build your own [custom container image](https://cloud.google.com/dataproc-serverless/docs/guides/custom-containers#python_packages) with the packages you need. |
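
The linked examples cover the full configuration, but as a minimal sketch, choosing a submission method from inside a model's `.py` file goes through `dbt.config()`. The model and cluster names below are placeholders, and `dataproc_cluster_name` can equally be supplied in the profile or a `.yml` config:

```python
# models/my_python_model.py -- illustrative only; names are placeholders.
def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="cluster",         # or "serverless"
        dataproc_cluster_name="my-cluster",  # only needed for the cluster method
    )

    # Reference an upstream dbt model and return a DataFrame for dbt to materialize.
    upstream_df = dbt.ref("my_upstream_model")
    return upstream_df
```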

**Additional setup**: The user or role needs adequate [IAM permissions](/reference/resource-configs/bigquery-configs.md#submitting-a-python-model) to trigger a job through a Dataproc cluster or Dataproc Serverless.

**Docs:**
- [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview)
45 changes: 37 additions & 8 deletions website/docs/reference/resource-configs/bigquery-configs.md
@@ -726,17 +726,48 @@ Just like SQL models, there are three ways to configure Python models:
2. In a dedicated `.yml` file, within the `models/` directory
3. Within the model's `.py` file, using the `dbt.config()` method

Any user or service account that runs dbt Python models will need the following permissions (in addition to the required BigQuery permissions) ([docs](https://cloud.google.com/dataproc/docs/concepts/iam/iam)):
```
dataproc.batches.create
dataproc.clusters.use
dataproc.jobs.create
dataproc.jobs.get
dataproc.operations.get
dataproc.operations.list
storage.buckets.get
storage.objects.create
storage.objects.delete
```
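
If you want to sanity-check these grants before running dbt, something like the following works against the Cloud Resource Manager API. This is a rough sketch, not part of dbt or the official docs: it assumes the `google-cloud-resource-manager` package and application-default credentials, and permissions granted only on the bucket (rather than the project) will show up as missing in this project-level check.

```python
# Illustrative sketch only -- reports which of the listed permissions the current
# credentials hold on a project. Assumes `pip install google-cloud-resource-manager`
# and application-default credentials.
from google.cloud import resourcemanager_v3

REQUIRED_PERMISSIONS = [
    "dataproc.batches.create",
    "dataproc.clusters.use",
    "dataproc.jobs.create",
    "dataproc.jobs.get",
    "dataproc.operations.get",
    "dataproc.operations.list",
    "storage.buckets.get",
    "storage.objects.create",
    "storage.objects.delete",
]

def missing_permissions(project_id: str) -> list[str]:
    """Return the required permissions the caller does NOT hold on the project."""
    client = resourcemanager_v3.ProjectsClient()
    response = client.test_iam_permissions(
        request={
            "resource": f"projects/{project_id}",
            "permissions": REQUIRED_PERMISSIONS,
        }
    )
    granted = set(response.permissions)
    # Storage permissions granted only at the bucket level will be reported
    # as missing here, since this checks the project resource.
    return [p for p in REQUIRED_PERMISSIONS if p not in granted]

if __name__ == "__main__":
    print(missing_permissions("my-gcp-project"))  # placeholder project ID
```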

Set up the profile to include the parameters required for Python models, `gcs_bucket` and `dataproc_region`:

<File name='profiles.yml'>

```yml
jaffle_shop:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: <your_project>
      dataset: <your_dataset>
      gcs_bucket: <your_bucket> # required for python models
      dataproc_region: <your_region> # required for python models
      threads: 4
```
</File>

Then, based on the submission method, configure the model in `dbt_project.yml`, in a dedicated `.yml` file within the `models/` directory, or within the model's `.py` file.

<File name='models.yml'>

```yml
# models.yml with a Python model submitting jobs to a Dataproc cluster
models:
  - name: my_python_model
    config:
      submission_method: cluster
      dataproc_cluster_name: my-favorite-cluster # dataproc_cluster_name must be supplied in the profile or config to use the cluster submission method
```

</File>
@@ -747,9 +778,7 @@
def model(dbt, session):
    dbt.config(
        submission_method="serverless"
    )
    ...
