This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

Describe new facet disable mechanism #361

Merged
28 changes: 23 additions & 5 deletions docs/client/java/configuration.md
@@ -14,30 +14,48 @@ You can make this file available to the client in three ways (the list also pres
2. Place an `openlineage.yml` in the user's current working directory.
3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`).


## Environment Variables

The following environment variables are available:

| Name | Description | Since |
|----------------------|-----------------------------------------------------------------------------|-------|
| OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | |
| OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 |


## Facets Configuration

In the YAML configuration file, you can also disable individual facets to filter them out of the OpenLineage event.

*YAML Configuration*

```yaml
transport:
  type: console
facets:
  spark_unknown:
    disabled: true
  spark:
    logicalPlan:
      disabled: true
```

### Deprecated syntax

The following syntax is deprecated and will be removed soon:

```yaml
transport:
  type: console
facets:
  disabled:
    - spark_unknown
    - spark.logicalPlan
```

The list syntax is deprecated because some facets are disabled by default in some integrations. When a user supplied a
list that added an extra facet but did not repeat the defaults, the default-disabled facets were unintentionally enabled.
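
The problem can be illustrated with a minimal sketch. It is not the client's implementation: the `DEFAULT_DISABLED` set and both helper functions are hypothetical, and it assumes that per-facet flags are merged on top of an integration's defaults while a user-supplied list replaces them wholesale.

```python
# Hypothetical defaults: facets an integration disables out of the box.
DEFAULT_DISABLED = {"spark_unknown", "spark.logicalPlan"}


def disabled_with_list_syntax(user_list=None):
    """Deprecated list syntax: a user-supplied list replaces the defaults."""
    return set(DEFAULT_DISABLED) if user_list is None else set(user_list)


def disabled_with_flag_syntax(user_flags=None):
    """Per-facet flags: each entry is merged on top of the defaults."""
    flags = {name: True for name in DEFAULT_DISABLED}
    flags.update(user_flags or {})
    return {name for name, disabled in flags.items() if disabled}


# Disabling one extra facet with the list syntax silently drops the defaults.
list_result = disabled_with_list_syntax(["columnLineage"])

# The same request expressed as a per-facet flag keeps the defaults disabled.
flag_result = disabled_with_flag_syntax({"columnLineage": True})
```

With the list syntax, `list_result` contains only `columnLineage`, so the two default-disabled facets are emitted again. With flags, a default-disabled facet is re-enabled only when its flag is explicitly set to `false`.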

## Transports

import Transports from './partials/java_transport.md';
32 changes: 17 additions & 15 deletions docs/integrations/spark/configuration/spark_conf.md
@@ -6,18 +6,20 @@ title: Spark Config Parameters

The following parameters can be specified:

| Parameter | Definition | Example |
|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
| spark.openlineage.facets.disabled | **Deprecated: use the property `spark.openlineage.facets.<facet name>.disabled` instead.** List of facets to filter out from the events, enclosed in `[]` (required from 0.21.x) and separated by `;`. The default is `[]`. | \[columnLineage;\] |
| spark.openlineage.facets.&lt;facet name&gt;.disabled | If set to `true`, disables the specified facet. The default value is `false`. The facet name can be hierarchical, for example `spark.logicalPlan`. The facets `debug`, `spark.logicalPlan` and `spark_unknown` are disabled by default; set their flag to `false` to enable them. | true |
| spark.openlineage.facets.variables | List of environment variables (System.getenv() | \[columnLineage;\] |
| spark.openlineage.capturedProperties | Comma-separated list of properties to be captured in the Spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes the `?<remove>` named group from the dataset path. Can be used to remove the last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
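
As a rough illustration of the hierarchical facet names, the sketch below flattens a nested facets mapping (shaped like the `facets` section of `openlineage.yml`) into `spark.openlineage.facets.<facet name>.disabled` property names. The helper `facet_flags_to_spark_conf` is hypothetical and not part of OpenLineage, and the one-to-one correspondence between YAML nesting and dotted property names is an assumption inferred from the table above.

```python
def facet_flags_to_spark_conf(facets, prefix="spark.openlineage.facets"):
    """Flatten a nested facets mapping into Spark-style property names."""
    conf = {}
    for name, value in facets.items():
        key = f"{prefix}.{name}"
        if isinstance(value, dict):
            # Nested mapping: descend one level and extend the dotted key.
            conf.update(facet_flags_to_spark_conf(value, key))
        else:
            # Spark properties are strings, so booleans become "true"/"false".
            conf[key] = str(value).lower()
    return conf


# Keep spark_unknown disabled, but enable the default-disabled logical plan facet.
yaml_style = {
    "spark_unknown": {"disabled": True},
    "spark": {"logicalPlan": {"disabled": False}},
}
spark_conf = facet_flags_to_spark_conf(yaml_style)

# Print the properties in the form accepted by `spark-submit --conf`.
for key, value in sorted(spark_conf.items()):
    print(f"--conf {key}={value}")
```

Each resulting entry can be passed to Spark as an ordinary configuration property, for example through `--conf` on `spark-submit`.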