This repository has been archived by the owner on Sep 4, 2024. It is now read-only.

Commit

fix: Wrong default value of disabled facets property (#359)
Signed-off-by: Artur Owczarek <[email protected]>
arturowczarek authored Jul 26, 2024
1 parent 6a034f5 commit 5c26c35
Showing 2 changed files with 26 additions and 28 deletions.
24 changes: 11 additions & 13 deletions docs/integrations/flink.md
@@ -37,7 +37,7 @@ whether the job runs properly.

## Limitations

-Currently OpenLineage's Flink integration is limited to getting information from jobs running in Application Mode.
+Currently, OpenLineage's Flink integration is limited to getting information from jobs running in Application Mode.

OpenLineage integration extracts lineage only from following `Sources` and `Sinks`:

@@ -76,10 +76,10 @@ In your job, you need to set up `OpenLineageFlinkJobListener`.

For example:
```java
-JobListener listener = JobListener listener = OpenLineageFlinkJobListener.builder()
-.executionEnvironment(streamExecutionEnvironment)
-.build();
-streamExecutionEnvironment.registerJobListener(listener);
+JobListener listener = OpenLineageFlinkJobListener.builder()
+.executionEnvironment(streamExecutionEnvironment)
+.build();
+streamExecutionEnvironment.registerJobListener(listener);
```

Also, OpenLineage needs certain parameters to be set in `flink-conf.yaml`:
@@ -114,17 +114,15 @@ and allows all the configuration features present there to be used. The configur
* `openlineage.yml` file with a environment property `OPENLINEAGE_CONFIG` being set and pointing to configuration file. File structure and allowed options are described [here](https://github.com/OpenLineage/OpenLineage/tree/main/client/java#configuration).
* Standard Flink configuration with the parameters defined below.

-### Flink Configuration parameters
+### Flink Configuration parameters

The following parameters can be specified:

-| Parameter | Definition | Example |
-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------
-| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
-| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;` | \[some_facet1;some_facet1\] |
-| openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |
-
-
+| Parameter | Definition | Example |
+|------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------|
+| openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
+| openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[some_facet1;some_facet1\] |
+| openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | openlineage.job.owners.team="Some Team" |
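
Editor's note (not part of the commit): the value format described for `openlineage.facets.disabled` (a list enclosed in `[]`, with entries separated by `;`) can be illustrated with a small sketch. The class `DisabledFacetsSketch` and its `parse` method are hypothetical names introduced here for illustration only; they are not OpenLineage's actual parser, and the real integration may treat edge cases differently.

```java
import java.util.ArrayList;
import java.util.List;

public class DisabledFacetsSketch {

    // Hypothetical helper (an assumption, not an OpenLineage API):
    // turns "[a;b;]" into the list ["a", "b"].
    static List<String> parse(String value) {
        String trimmed = value.trim();
        if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("value must be enclosed in []: " + value);
        }
        // Drop the surrounding brackets, then split on the ';' separator.
        String inner = trimmed.substring(1, trimmed.length() - 1);
        List<String> facets = new ArrayList<>();
        for (String part : inner.split(";")) {
            String name = part.trim();
            // A trailing ';' or an empty list yields empty pieces; skip them.
            if (!name.isEmpty()) {
                facets.add(name);
            }
        }
        return facets;
    }

    public static void main(String[] args) {
        // The default value documented in the table above.
        System.out.println(parse("[spark_unknown;spark.logicalPlan;]"));
        // prints: [spark_unknown, spark.logicalPlan]
    }
}
```

The sketch accepts a trailing `;` as in the documented default and rejects values that are not wrapped in brackets.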

## Transports

30 changes: 15 additions & 15 deletions docs/integrations/spark/configuration/spark_conf.md
@@ -6,18 +6,18 @@ title: Spark Config Parameters

The following parameters can be specified:

-| Parameter | Definition | Example |
-----------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------
-| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
-| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
-| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
-| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
-| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
-| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
-| spark.openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;]` (currently must contain `;`) | \[spark_unknown;spark.logicalPlan\] |
-| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
-| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes `?<remove>` named group from dataset path. Can be used to last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
-| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
-| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
-| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
-| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
+| Parameter | Definition | Example |
+|----------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|
+| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
+| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
+| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
+| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
+| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
+| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
+| spark.openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[spark_unknown;spark.logicalPlan\] |
+| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
+| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes `?<remove>` named group from dataset path. Can be used to last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
+| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
+| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
+| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
+| spark.openlineage.job.owners.<ownership-type\> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
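
Editor's note (not part of the commit): the `spark.openlineage.job.owners.<ownership-type>` row says the config key name and value together supply the ownership type and name. A minimal sketch of that mapping follows, assuming the configuration is available as a plain `Map<String, String>`. `JobOwnersSketch` and `extractOwners` are hypothetical names used for illustration and do not reflect how the OpenLineage integration reads its configuration internally.

```java
import java.util.Map;
import java.util.TreeMap;

public class JobOwnersSketch {

    // Key prefix taken from the parameter table above.
    static final String PREFIX = "spark.openlineage.job.owners.";

    // Hypothetical helper (an assumption, not an OpenLineage API):
    // maps each ownership type (the key suffix) to the owner name (the value).
    static Map<String, String> extractOwners(Map<String, String> conf) {
        Map<String, String> owners = new TreeMap<>(); // sorted for stable output
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            // Only keys with a non-empty suffix after the prefix count as owners.
            if (key.startsWith(PREFIX) && key.length() > PREFIX.length()) {
                owners.put(key.substring(PREFIX.length()), entry.getValue());
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
                "spark.openlineage.job.owners.team", "Some Team",
                "spark.openlineage.job.owners.person", "Some Person",
                "spark.openlineage.transport.type", "http");
        System.out.println(extractOwners(conf));
        // prints: {person=Some Person, team=Some Team}
    }
}
```

Multiple entries with different ownership types end up as separate map entries, matching the table's statement that several types are allowed at once.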
