Signed-off-by: Pawel Leszczynski <[email protected]>
1 parent 34c4f4f, commit b8ab2a9
Showing 232 changed files with 12,547 additions and 0 deletions.
@@ -0,0 +1,4 @@ | ||
{ | ||
"label": "Client Libraries", | ||
"position": 4 | ||
} |
@@ -0,0 +1,4 @@ | ||
{ | ||
"label": "Java", | ||
"position": 1 | ||
} |
112 changes: 112 additions & 0 deletions
versioned_docs/version-1.22.0/client/java/configuration.md
@@ -0,0 +1,112 @@ | ||
--- | ||
sidebar_position: 2 | ||
title: Configuration | ||
--- | ||
|
||
We recommend configuring the client with an `openlineage.yml` file that contains all the | ||
details of how to connect to your OpenLineage backend. | ||
|
||
See [example configurations](#transports).
|
||
You can make this file available to the client in three ways, listed here in order of precedence:
|
||
1. Set an `OPENLINEAGE_CONFIG` environment variable to a file path: `OPENLINEAGE_CONFIG=path/to/openlineage.yml`. | ||
2. Place an `openlineage.yml` in the user's current working directory. | ||
3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`). | ||
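The precedence above can be sketched as a simple lookup. This is a minimal illustration only: `ConfigLocator` and its `locate` method are hypothetical names, not part of the OpenLineage client API, and falling through to the next location when `OPENLINEAGE_CONFIG` points to a missing file is an assumption, not documented behavior.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical helper; not part of the OpenLineage client API. */
final class ConfigLocator {

  /**
   * Returns the first configuration file found, checked in precedence order:
   * OPENLINEAGE_CONFIG, then the working directory, then ~/.openlineage/.
   * The isFile predicate is injected so the lookup can be exercised without touching disk.
   */
  static Optional<Path> locate(
      Map<String, String> env, Path workingDir, Path homeDir, Predicate<Path> isFile) {
    // 1. Explicit path from the environment (assumption: skipped if the file is missing).
    String fromEnv = env.get("OPENLINEAGE_CONFIG");
    if (fromEnv != null && !fromEnv.isEmpty() && isFile.test(Paths.get(fromEnv))) {
      return Optional.of(Paths.get(fromEnv));
    }
    // 2. openlineage.yml in the current working directory.
    Path inWorkingDir = workingDir.resolve("openlineage.yml");
    if (isFile.test(inWorkingDir)) {
      return Optional.of(inWorkingDir);
    }
    // 3. ~/.openlineage/openlineage.yml
    Path inHome = homeDir.resolve(".openlineage").resolve("openlineage.yml");
    if (isFile.test(inHome)) {
      return Optional.of(inHome);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Optional<Path> config = locate(
        System.getenv(),
        Paths.get(System.getProperty("user.dir")),
        Paths.get(System.getProperty("user.home")),
        Files::isRegularFile);
    System.out.println(config.map(Path::toString).orElse("no openlineage.yml found"));
  }
}
```

Passing the file check as a predicate keeps the precedence logic separate from the file system, which makes the ordering easy to verify in isolation.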
|
||
## Environment Variables | ||
|
||
The following environment variables are available: | ||
|
||
| Name | Description | Since | | ||
|----------------------|-----------------------------------------------------------------------------|-------| | ||
| OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | | | ||
| OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 | | ||
|
||
## Facets Configuration | ||
|
||
In the YAML configuration file, you can also disable facets to filter them out of the OpenLineage event.
|
||
*YAML Configuration* | ||
|
||
```yaml | ||
transport: | ||
type: console | ||
facets: | ||
spark_unknown: | ||
disabled: true | ||
spark: | ||
logicalPlan: | ||
disabled: true | ||
``` | ||
### Deprecated syntax | ||
The following syntax is deprecated and will soon be removed:
```yaml | ||
transport: | ||
type: console | ||
facets: | ||
disabled: | ||
- spark_unknown | ||
- spark.logicalPlan | ||
``` | ||
The rationale behind the deprecation is that some facets were disabled by default in some integrations. When a user
added an extra facet to the list without repeating the defaults, the default-disabled facets were unintentionally enabled.
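The difference between the two syntaxes can be modeled in a few lines. This is a hedged sketch, not the client's actual merge logic: the class and method names are hypothetical, the default-disabled facets are chosen for illustration, and `debug` is an assumed facet name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the two syntaxes; not the client's actual merge logic. */
final class FacetDisableDemo {

  /** Deprecated list syntax: a user-supplied list replaces the default list wholesale. */
  static Set<String> disabledWithListSyntax(List<String> defaults, List<String> userList) {
    return new TreeSet<>(userList != null ? userList : defaults);
  }

  /** Per-facet syntax: user flags are merged over the defaults key by key. */
  static Set<String> disabledWithMapSyntax(
      Map<String, Boolean> defaults, Map<String, Boolean> user) {
    Map<String, Boolean> merged = new LinkedHashMap<>(defaults);
    merged.putAll(user);
    Set<String> disabled = new TreeSet<>();
    merged.forEach((facet, isDisabled) -> {
      if (Boolean.TRUE.equals(isDisabled)) {
        disabled.add(facet);
      }
    });
    return disabled;
  }

  public static void main(String[] args) {
    // Illustrative defaults; the real defaults depend on the integration.
    List<String> defaultList = Arrays.asList("spark_unknown", "spark.logicalPlan");
    // The user only wanted to disable one more facet, but the defaults are lost.
    System.out.println(disabledWithListSyntax(defaultList, Arrays.asList("debug")));
    // prints [debug]

    Map<String, Boolean> defaultMap = new LinkedHashMap<>();
    defaultMap.put("spark_unknown", true);
    defaultMap.put("spark.logicalPlan", true);
    Map<String, Boolean> userMap = new LinkedHashMap<>();
    userMap.put("debug", true);
    // With per-facet flags the defaults survive the merge.
    System.out.println(disabledWithMapSyntax(defaultMap, userMap));
    // prints [debug, spark.logicalPlan, spark_unknown]
  }
}
```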
## Transports

import Transports from './partials/java_transport.md';

<Transports/>

### Error Handling via Transport | ||
```java | ||
// Connect to http://localhost:5000 | ||
OpenLineageClient client = OpenLineageClient.builder() | ||
.transport( | ||
HttpTransport.builder() | ||
.uri("http://localhost:5000") | ||
.apiKey("f38d2189-c603-4b46-bdea-e573a3b5a7d5") | ||
.build()) | ||
.registerErrorHandler(new EmitErrorHandler() { | ||
@Override | ||
public void handleError(Throwable throwable) { | ||
// Handle emit error here | ||
} | ||
}).build(); | ||
``` | ||
|
||
### Defining Your Own Transport | ||
|
||
```java | ||
OpenLineageClient client = OpenLineageClient.builder() | ||
.transport( | ||
new MyTransport() { | ||
@Override | ||
public void emit(OpenLineage.RunEvent runEvent) { | ||
// Add emit logic here | ||
} | ||
}).build(); | ||
``` | ||
|
||
## Circuit Breakers | ||
|
||
import CircuitBreakers from './partials/java_circuit_breaker.md'; | ||
|
||
<CircuitBreakers/> | ||
|
||
## Metrics | ||
|
||
import Metrics from './partials/java_metrics.md'; | ||
|
||
<Metrics/> | ||
|
||
## Dataset Namespace Resolver | ||
|
||
import DatasetNamespaceResolver from './partials/java_namespace_resolver.md'; | ||
|
||
<DatasetNamespaceResolver/> |
@@ -0,0 +1,39 @@ | ||
--- | ||
sidebar_position: 5 | ||
--- | ||
|
||
# Java | ||
|
||
## Overview | ||
|
||
The OpenLineage Java client is an SDK for the Java programming language that you can use to generate and emit OpenLineage events to OpenLineage backends.
The core data structures currently offered by the client are the `RunEvent`, `RunState`, `Run`, `Job`, `Dataset`, | ||
and `Transport` classes, along with various `Facets` that can come under run, job, and dataset. | ||
|
||
The library provides various [transport classes](#transports) that carry lineage events to target endpoints (e.g. HTTP).
|
||
You can also use the Java client to create your own custom integrations. | ||
|
||
## Installation | ||
|
||
The Java client is provided as a library that can be imported into your Java project using either Maven or Gradle.
|
||
Maven: | ||
|
||
```xml | ||
<dependency> | ||
<groupId>io.openlineage</groupId> | ||
<artifactId>openlineage-java</artifactId> | ||
<version>${OPENLINEAGE_VERSION}</version> | ||
</dependency> | ||
``` | ||
|
||
or Gradle: | ||
|
||
```groovy | ||
implementation("io.openlineage:openlineage-java:${OPENLINEAGE_VERSION}") | ||
``` | ||
|
||
For more information on the available versions of `openlineage-java`,
please refer to the [Maven repository](https://search.maven.org/artifact/io.openlineage/openlineage-java).
|
107 changes: 107 additions & 0 deletions
versioned_docs/version-1.22.0/client/java/partials/java_circuit_breaker.md
@@ -0,0 +1,107 @@ | ||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
:::info | ||
This feature is available in OpenLineage versions >= 1.9.0. | ||
::: | ||
|
||
To prevent over-instrumentation, the OpenLineage integration provides a circuit breaker mechanism
that stops OpenLineage from creating, serializing, and sending OpenLineage events.
|
||
### Simple Memory Circuit Breaker | ||
|
||
A simple circuit breaker that works based only on the free memory within the JVM. The configuration should
contain the free memory threshold (a percentage); the default value is `20%`. The circuit breaker
closes on the first call if free memory is low. The `circuitCheckIntervalInMillis` parameter
configures how often the circuit breaker is checked; it defaults to `1000ms` when the configuration has no entry.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
|
||
<Tabs groupId="integrations"> | ||
<TabItem value="yaml" label="Yaml Config"> | ||
|
||
```yaml | ||
circuitBreaker: | ||
type: simpleMemory | ||
memoryThreshold: 20 | ||
circuitCheckIntervalInMillis: 1000 | ||
timeoutInSeconds: 90 | ||
``` | ||
</TabItem> | ||
<TabItem value="spark" label="Spark Config"> | ||

| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem> | ||
<TabItem value="flink" label="Flink Config"> | ||

| Parameter | Definition | Example |
|---|---|---|
| openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |

</TabItem> | ||
</Tabs> | ||
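
A free-memory check of this kind can be sketched with `java.lang.Runtime`. This is a minimal illustration under stated assumptions: `SimpleMemoryCheck` is a hypothetical class, and counting heap not yet claimed from the operating system as free is an assumption about how the percentage is computed, not a statement about the client's implementation.

```java
/** Hypothetical sketch of a free-memory check; the client's real implementation may differ. */
final class SimpleMemoryCheck {

  /** Percentage (0-100) of the maximum heap that is still available. */
  static double freeMemoryPercent(long freeBytes, long totalBytes, long maxBytes) {
    // Assumption: heap the JVM has not yet claimed (max - total) also counts as available.
    long available = freeBytes + (maxBytes - totalBytes);
    return 100.0 * available / maxBytes;
  }

  /** True when event processing should be skipped because free memory is below the threshold. */
  static boolean shouldSkip(double freePercent, int memoryThreshold) {
    return freePercent < memoryThreshold;
  }

  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    double pct = freeMemoryPercent(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
    // 20 mirrors the documented default of memoryThreshold.
    System.out.println("free=" + pct + "% skip=" + shouldSkip(pct, 20));
  }
}
```
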
### Java Runtime Circuit Breaker | ||
A more complex version of the circuit breaker. The amount of free memory can be low as long as
the amount of time spent on garbage collection is acceptable. `JavaRuntimeCircuitBreaker` closes
when free memory drops below the threshold and the amount of time spent on garbage collection exceeds
the given threshold (`10%` by default). The circuit breaker is always open when checked for the first time,
because the GC threshold is computed over the period since the previous circuit breaker call.
The `circuitCheckIntervalInMillis` parameter configures how often the circuit breaker is checked;
it defaults to `1000ms` when the configuration has no entry.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
|
||
<Tabs groupId="integrations"> | ||
<TabItem value="yaml" label="Yaml Config"> | ||
|
||
```yaml | ||
circuitBreaker: | ||
type: javaRuntime | ||
memoryThreshold: 20 | ||
gcCpuThreshold: 10 | ||
circuitCheckIntervalInMillis: 1000 | ||
timeoutInSeconds: 90 | ||
``` | ||
</TabItem> | ||
<TabItem value="spark" label="Spark Config"> | ||
|
||
| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |
|
||
|
||
</TabItem> | ||
<TabItem value="flink" label="Flink Config"> | ||
|
||
| Parameter | Definition | Example |
|---|---|---|
| openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (since version 1.13) | 90 |
|
||
|
||
</TabItem> | ||
</Tabs> | ||
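
The combined memory and GC condition can be sketched with the standard `GarbageCollectorMXBean` API. This is an illustration under stated assumptions, not the code of `JavaRuntimeCircuitBreaker`: `GcAwareCheck` and its methods are hypothetical, and measuring GC share as collection time divided by wall-clock time between checks is an assumed interpretation of the GC threshold.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical sketch; not the implementation of JavaRuntimeCircuitBreaker. */
final class GcAwareCheck {
  private final int memoryThreshold;
  private final int gcCpuThreshold;
  private long lastCheckMillis = -1; // -1 means no baseline yet
  private long lastGcMillis = -1;

  GcAwareCheck(int memoryThreshold, int gcCpuThreshold) {
    this.memoryThreshold = memoryThreshold;
    this.gcCpuThreshold = gcCpuThreshold;
  }

  /** Total milliseconds spent in GC across all collectors since JVM start. */
  static long totalGcMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      total += Math.max(0, gc.getCollectionTime()); // -1 means the value is undefined
    }
    return total;
  }

  /** Returns true when events should be skipped: low free memory AND high GC share. */
  boolean shouldSkip(long nowMillis, long gcMillis, double freeMemoryPercent) {
    boolean firstCall = lastCheckMillis < 0;
    long elapsed = nowMillis - lastCheckMillis;
    long gcElapsed = gcMillis - lastGcMillis;
    lastCheckMillis = nowMillis;
    lastGcMillis = gcMillis;
    if (firstCall || elapsed <= 0) {
      return false; // no previous measurement to compare against
    }
    double gcPercent = 100.0 * gcElapsed / elapsed;
    return freeMemoryPercent < memoryThreshold && gcPercent > gcCpuThreshold;
  }

  public static void main(String[] args) {
    GcAwareCheck check = new GcAwareCheck(20, 10); // documented defaults
    Runtime rt = Runtime.getRuntime();
    double freePct =
        100.0 * (rt.freeMemory() + rt.maxMemory() - rt.totalMemory()) / rt.maxMemory();
    // The first check has no baseline, so this prints false.
    System.out.println(check.shouldSkip(System.currentTimeMillis(), totalGcMillis(), freePct));
  }
}
```

Taking the clock and GC readings as parameters keeps the decision logic deterministic, so the first-call behavior described above can be checked without depending on a live JVM's GC activity.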
|
||
### Custom Circuit Breaker | ||
|
||
The list of available circuit breakers can be extended with a custom one, loaded via `ServiceLoader`,
by providing your own implementation of `io.openlineage.client.circuitBreaker.CircuitBreakerBuilder`.
64 changes: 64 additions & 0 deletions
versioned_docs/version-1.22.0/client/java/partials/java_metrics.md
@@ -0,0 +1,64 @@ | ||
import Tabs from '@theme/Tabs'; | ||
import TabItem from '@theme/TabItem'; | ||
|
||
:::info | ||
This feature is available in OpenLineage 1.11 and above.
::: | ||
|
||
To ease the operational experience of using the OpenLineage integrations, this document details the metrics collected by the Java client and the configuration settings for various metric backends. | ||
|
||
### Metrics collected by Java Client | ||
|
||
The following table outlines the metrics collected by the OpenLineage Java client, which help in monitoring the integration's performance: | ||
|
||
| Metric | Definition | Type | | ||
|-------------------------------------|-------------------------------------------------------|--------| | ||
| `openlineage.emit.start` | Number of events the integration started to send | Counter| | ||
| `openlineage.emit.complete` | Number of events the integration completed sending | Counter| | ||
| `openlineage.emit.time` | Time spent on emitting events | Timer | | ||
| `openlineage.circuitbreaker.engaged`| Status of the Circuit Breaker (engaged or not) | Gauge | | ||
|
||
## Metric Backends | ||
|
||
OpenLineage uses [Micrometer](https://micrometer.io) for metrics collection, similar to how SLF4J operates for logging. Micrometer provides a facade over different metric backends, allowing metrics to be dispatched to various destinations. | ||
|
||
### Configuring Metric Backends | ||
|
||
Below are the available backends and potential configurations using Micrometer's facilities. | ||
|
||
### StatsD | ||
|
||
Full configuration options for StatsD can be found in [Micrometer's `StatsdConfig` implementation](https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-statsd/src/main/java/io/micrometer/statsd/StatsdConfig.java).
|
||
<Tabs groupId="integrations"> | ||
<TabItem value="yaml" label="Yaml Config"> | ||
|
||
```yaml | ||
metrics: | ||
type: statsd | ||
flavor: datadog | ||
host: localhost | ||
port: 8125 | ||
``` | ||
</TabItem> | ||
<TabItem value="spark" label="Spark Config"> | ||

| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.metrics.type | Metrics type selected | statsd |
| spark.openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| spark.openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| spark.openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem> | ||
<TabItem value="flink" label="Flink Config"> | ||

| Parameter | Definition | Example |
|---|---|---|
| openlineage.metrics.type | Metrics type selected | statsd |
| openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem> | ||
</Tabs> |