diff --git a/docs/client/java/configuration.md b/docs/client/java/configuration.md
index c7cd689a65..1b63eb0a6b 100644
--- a/docs/client/java/configuration.md
+++ b/docs/client/java/configuration.md
@@ -14,8 +14,8 @@ You can make this file available to the client in three ways (the list also pres
 2. Place an `openlineage.yml` in the user's current working directory.
 3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`).
-
 ## Environment Variables
+
 The following environment variables are available:
 
 | Name | Description | Since |
@@ -23,21 +23,39 @@ The following environment variables are available:
 | OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | |
 | OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 |
-
 ## Facets Configuration
-In YAML configuration file you can also specify a list of disabled facets that will not be included in OpenLineage event.
+In the YAML configuration file you can also disable facets to filter them out of the OpenLineage event.
 
 *YAML Configuration*
+
 ```yaml
 transport:
   type: console
 facets:
-  disabled:
+  spark_unknown:
+    disabled: true
+  spark:
+    logicalPlan:
+      disabled: true
+```
+
+### Deprecated syntax
+
+The following syntax is deprecated and will soon be removed:
+
+```yaml
+transport:
+  type: console
+facets:
+  disabled:
     - spark_unknown
-    - spark_logicalPlan
+    - spark.logicalPlan
 ```
 
+The rationale behind the deprecation is that some facets are disabled by default in some integrations. When users added
+extra facets to the disabled list without repeating those defaults, the default-disabled facets were unintentionally enabled.
+
 ## Transports
 
 import Transports from './partials/java_transport.md';
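For illustration of how this configuration is picked up, below is a minimal sketch of creating a client that reads the resolved YAML. It assumes the `Clients.newClient()` factory from the `openlineage-java` client; the facet filtering itself comes from the YAML shown above, not from code:

```java
import io.openlineage.client.Clients;
import io.openlineage.client.OpenLineageClient;

public class FacetsConfigExample {
  public static void main(String[] args) {
    // Builds a client from the resolved configuration
    // (e.g. OPENLINEAGE_CONFIG, ./openlineage.yml, or ~/.openlineage/openlineage.yml).
    OpenLineageClient client = Clients.newClient();

    // Facets marked `disabled: true` in the YAML (e.g. spark_unknown,
    // spark.logicalPlan) are not included in emitted OpenLineage events.
    System.out.println("OpenLineage client ready: " + client);
  }
}
```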
diff --git a/docs/integrations/spark/configuration/spark_conf.md b/docs/integrations/spark/configuration/spark_conf.md
index a00748b32e..1031deaa1a 100644
--- a/docs/integrations/spark/configuration/spark_conf.md
+++ b/docs/integrations/spark/configuration/spark_conf.md
@@ -6,18 +6,20 @@ title: Spark Config Parameters
 
 The following parameters can be specified:
 
-| Parameter | Definition | Example |
-|---|---|---|
-| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
-| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
-| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
-| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
-| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
-| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
-| spark.openlineage.facets.disabled | List of facets to disable, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[spark_unknown;spark.logicalPlan;]` (currently must contain `;`) | \[spark_unknown;spark.logicalPlan\] |
-| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
-| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes `?<remove>` named group from dataset path. Can be used to last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
-| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
-| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
-| spark.openlineage.debugFacet | Determines whether debug facet shall be generated and included within the event. Set `enabled` to turn it on. By default, facet is disabled. | enabled |
-| spark.openlineage.job.owners.<ownership type> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
+| Parameter | Definition | Example |
+|---|---|---|
+| spark.openlineage.transport.type | The transport type used for event emit, default type is `console` | http |
+| spark.openlineage.namespace | The default namespace to be applied for any jobs submitted | MyNamespace |
+| spark.openlineage.parentJobNamespace | The job namespace to be used for the parent job facet | ParentJobNamespace |
+| spark.openlineage.parentJobName | The job name to be used for the parent job facet | ParentJobName |
+| spark.openlineage.parentRunId | The RunId of the parent job that initiated this Spark job | xxxx-xxxx-xxxx-xxxx |
+| spark.openlineage.appName | Custom value overwriting Spark app name in events | AppName |
+| spark.openlineage.facets.disabled | **Deprecated: use the property `spark.openlineage.facets.<facet name>.disabled` instead**. List of facets to filter out from the events, enclosed in `[]` (required from 0.21.x) and separated by `;`, default is `[]` | \[columnLineage;\] |
+| spark.openlineage.facets.<facet name>.disabled | If set to `true`, disables the given facet. The default value is `false`. The facet name can be hierarchical. The facets disabled by default are `debug`, `spark.logicalPlan` and `spark_unknown`; switch the flag to `false` to enable them. | true |
+| spark.openlineage.facets.variables | List of environment variables (read via `System.getenv()`) to be captured in the event, enclosed in `[]` and separated by `;` | \[columnLineage;\] |
+| spark.openlineage.capturedProperties | comma separated list of properties to be captured in spark properties facet (default `spark.master`, `spark.app.name`) | "spark.example1,spark.example2" |
+| spark.openlineage.dataset.removePath.pattern | Java regular expression that removes the `?<remove>` named group from the dataset path. Can be used to trim the last path subdirectories from paths like `s3://my-whatever-path/year=2023/month=04` | `(.*)(?<remove>\/.*\/.*)` |
+| spark.openlineage.jobName.appendDatasetName | Decides whether output dataset name should be appended to job name. By default `true`. | false |
+| spark.openlineage.jobName.replaceDotWithUnderscore | Replaces dots in job name with underscore. Can be used to mimic legacy behaviour on Databricks platform. By default `false`. | false |
+| spark.openlineage.debugFacet | Determines whether the debug facet should be generated and included in the event. Set to `enabled` to turn it on. By default, the facet is disabled. | enabled |
+| spark.openlineage.job.owners.<ownership type> | Specifies ownership of the job. Multiple entries with different types are allowed. Config key name and value are used to create job ownership type and name (available since 1.13). | spark.openlineage.job.owners.team="Some Team" |
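To make the table above concrete, here is a minimal sketch of supplying these parameters through the `SparkSession` builder. The listener class name, the `local[*]` master, and the chosen values are illustrative assumptions, not part of this change:

```java
import org.apache.spark.sql.SparkSession;

public class OpenLineageSparkConfExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("openlineage-conf-example")
        .master("local[*]")
        // Listener registration; class name assumed from the Spark integration docs.
        .config("spark.extraListeners", "io.openlineage.spark.agent.OpenLineageSparkListener")
        // Parameters from the table above.
        .config("spark.openlineage.transport.type", "console")
        .config("spark.openlineage.namespace", "MyNamespace")
        // Per-facet flag: re-enable the logical plan facet, which is disabled by default.
        .config("spark.openlineage.facets.spark.logicalPlan.disabled", "false")
        // Ownership entry: the key suffix ("team") becomes the ownership type.
        .config("spark.openlineage.job.owners.team", "Some Team")
        .getOrCreate();

    spark.range(10).count(); // run a trivial job so the listener has something to emit
    spark.stop();
  }
}
```

The same keys can equally be passed on the command line, for example as `--conf` options to `spark-submit`.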