Merge pull request #1 from OpenLineage/fix-missing-1.22
manually create version 1.22
pawel-big-lebowski authored Oct 4, 2024
2 parents 34c4f4f + b8ab2a9 commit 14c1bd8
Showing 232 changed files with 12,547 additions and 0 deletions.
1 change: 1 addition & 0 deletions versioned_docs/version-1.22.0/before-ol.svg
4 changes: 4 additions & 0 deletions versioned_docs/version-1.22.0/client/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Client Libraries",
  "position": 4
}
4 changes: 4 additions & 0 deletions versioned_docs/version-1.22.0/client/java/_category_.json
@@ -0,0 +1,4 @@
{
  "label": "Java",
  "position": 1
}
112 changes: 112 additions & 0 deletions versioned_docs/version-1.22.0/client/java/configuration.md
@@ -0,0 +1,112 @@
---
sidebar_position: 2
title: Configuration
---

We recommend configuring the client with an `openlineage.yml` file that contains all the
details of how to connect to your OpenLineage backend.

See [example configurations](#transports).

You can make this file available to the client in three ways, listed here in order of precedence:

1. Set an `OPENLINEAGE_CONFIG` environment variable to a file path: `OPENLINEAGE_CONFIG=path/to/openlineage.yml`.
2. Place an `openlineage.yml` in the user's current working directory.
3. Place an `openlineage.yml` under `.openlineage/` in the user's home directory (`~/.openlineage/openlineage.yml`).
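
The lookup order above can be sketched in plain Java. This is only an illustration of the precedence, not the client's actual implementation: the `ConfigLocator` class, its method names, and the choice to return the first candidate that exists are all assumptions made for this example.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper, not part of the OpenLineage client API.
final class ConfigLocator {
    static final String FILE_NAME = "openlineage.yml";

    // Builds the candidate paths in the precedence order listed above.
    static List<Path> candidates(Map<String, String> env, Path workingDir, Path homeDir) {
        List<Path> paths = new ArrayList<>();
        String fromEnv = env.get("OPENLINEAGE_CONFIG");
        if (fromEnv != null && !fromEnv.isEmpty()) {
            paths.add(Paths.get(fromEnv));
        }
        paths.add(workingDir.resolve(FILE_NAME));
        paths.add(homeDir.resolve(".openlineage").resolve(FILE_NAME));
        return paths;
    }

    // Returns the first candidate accepted by the `exists` check (an assumption of this sketch).
    static Optional<Path> resolve(Map<String, String> env, Path workingDir, Path homeDir,
                                  Predicate<Path> exists) {
        return candidates(env, workingDir, homeDir).stream().filter(exists).findFirst();
    }

    public static void main(String[] args) {
        Path cwd = Paths.get(System.getProperty("user.dir"));
        Path home = Paths.get(System.getProperty("user.home"));
        Optional<Path> config = resolve(System.getenv(), cwd, home, Files::isRegularFile);
        System.out.println(config.map(Path::toString).orElse("no openlineage.yml found"));
    }
}
```

Passing the environment and the existence check as parameters keeps the ordering logic easy to exercise without touching the real file system.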

## Environment Variables

The following environment variables are available:

| Name | Description | Since |
|----------------------|-----------------------------------------------------------------------------|-------|
| OPENLINEAGE_CONFIG | The path to the YAML configuration file. Example: `path/to/openlineage.yml` | |
| OPENLINEAGE_DISABLED | When `true`, OpenLineage will not emit events. | 0.9.0 |

## Facets Configuration

In the YAML configuration file, you can also disable facets to filter them out of the OpenLineage event.

*YAML Configuration*

```yaml
transport:
  type: console
facets:
  spark_unknown:
    disabled: true
  spark:
    logicalPlan:
      disabled: true
```

### Deprecated syntax

The following syntax is deprecated and will soon be removed:

```yaml
transport:
  type: console
facets:
  disabled:
    - spark_unknown
    - spark.logicalPlan
```

The rationale behind the deprecation is that some facets were disabled by default in some integrations. With the list
syntax, disabling an additional facet without also repeating the defaults unintentionally enabled the default-disabled facets.
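
The difference between the two syntaxes can be illustrated with a small Java sketch. It is a simplified model and not the client's configuration code: the `FacetDisableDemo` class is hypothetical, and the facet names and default values used here are assumptions chosen for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: facet names and defaults are assumptions, not the client's internals.
final class FacetDisableDemo {
    // Deprecated list syntax: a user-supplied list replaces the default list entirely.
    static Set<String> disabledWithListSyntax(List<String> defaults, List<String> userList) {
        return new TreeSet<>(userList == null ? defaults : userList);
    }

    // Per-facet syntax: user entries are merged over the defaults key by key.
    static Set<String> disabledWithMapSyntax(Map<String, Boolean> defaults,
                                             Map<String, Boolean> user) {
        Map<String, Boolean> merged = new LinkedHashMap<>(defaults);
        merged.putAll(user);
        Set<String> disabled = new TreeSet<>();
        merged.forEach((facet, isDisabled) -> {
            if (isDisabled) {
                disabled.add(facet);
            }
        });
        return disabled;
    }

    public static void main(String[] args) {
        // The user only wants to disable one extra (hypothetical) facet named "debug".
        System.out.println(disabledWithListSyntax(
            List.of("spark_unknown", "spark.logicalPlan"), List.of("debug")));
        System.out.println(disabledWithMapSyntax(
            Map.of("spark_unknown", true, "spark.logicalPlan", true), Map.of("debug", true)));
    }
}
```

In the list variant the defaults are lost as soon as the user supplies any list, while the per-facet variant keeps them unless the user overrides a specific key.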

## Transports

import Transports from './partials/java_transport.md';

<Transports/>

### Error Handling via Transport

```java
// Connect to http://localhost:5000
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    HttpTransport.builder()
      .uri("http://localhost:5000")
      .apiKey("f38d2189-c603-4b46-bdea-e573a3b5a7d5")
      .build())
  .registerErrorHandler(new EmitErrorHandler() {
    @Override
    public void handleError(Throwable throwable) {
      // Handle emit error here
    }
  }).build();
```

### Defining Your Own Transport

```java
OpenLineageClient client = OpenLineageClient.builder()
  .transport(
    new MyTransport() {
      @Override
      public void emit(OpenLineage.RunEvent runEvent) {
        // Add emit logic here
      }
    }).build();
```

## Circuit Breakers

import CircuitBreakers from './partials/java_circuit_breaker.md';

<CircuitBreakers/>

## Metrics

import Metrics from './partials/java_metrics.md';

<Metrics/>

## Dataset Namespace Resolver

import DatasetNamespaceResolver from './partials/java_namespace_resolver.md';

<DatasetNamespaceResolver/>
39 changes: 39 additions & 0 deletions versioned_docs/version-1.22.0/client/java/java.md
@@ -0,0 +1,39 @@
---
sidebar_position: 5
---

# Java

## Overview

The OpenLineage Java client is an SDK for the Java programming language that you can use to generate and emit OpenLineage events to OpenLineage backends.
The core data structures currently offered by the client are the `RunEvent`, `RunState`, `Run`, `Job`, `Dataset`,
and `Transport` classes, along with various `Facets` that can be attached to runs, jobs, and datasets.

The library provides various [transport classes](#transports) that carry lineage events to target endpoints (e.g. HTTP).

You can also use the Java client to create your own custom integrations.

## Installation

The Java client is provided as a library that can be imported into your Java project using either Maven or Gradle.

Maven:

```xml
<dependency>
    <groupId>io.openlineage</groupId>
    <artifactId>openlineage-java</artifactId>
    <version>${OPENLINEAGE_VERSION}</version>
</dependency>
```

or Gradle:

```groovy
implementation("io.openlineage:openlineage-java:${OPENLINEAGE_VERSION}")
```

For more information on the available versions of `openlineage-java`,
please refer to the [Maven repository](https://search.maven.org/artifact/io.openlineage/openlineage-java).

@@ -0,0 +1,107 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::info
This feature is available in OpenLineage versions >= 1.9.0.
:::

To prevent over-instrumentation, the OpenLineage integration provides a circuit breaker mechanism
that stops OpenLineage from creating, serializing, and sending OpenLineage events.

### Simple Memory Circuit Breaker

A simple circuit breaker that relies only on the free memory available within the JVM. The configuration should
contain a free-memory threshold (as a percentage). The default value is `20%`. The circuit breaker
closes on the first call if free memory is low. The `circuitCheckIntervalInMillis` parameter
configures how often the circuit breaker is checked. The default value is `1000ms` when the config contains no entry.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
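
A free-memory check of this kind can be sketched with the standard `Runtime` API. This is a minimal sketch and not the OpenLineage implementation: the `FreeMemoryCheck` class is hypothetical, and the exact way free memory is computed (unused heap plus heap not yet allocated, relative to the maximum heap) is an assumption.

```java
// Hypothetical sketch; the real circuit breaker may compute free memory differently.
final class FreeMemoryCheck {
    // Percentage of the JVM's maximum heap that is still available:
    // unused part of the current heap plus the part not yet allocated.
    static double freeMemoryPercent(long freeBytes, long totalBytes, long maxBytes) {
        long available = freeBytes + (maxBytes - totalBytes);
        return 100.0 * available / maxBytes;
    }

    // True when free memory has dropped below the configured threshold (in percent).
    static boolean isLow(double freePercent, int memoryThreshold) {
        return freePercent < memoryThreshold;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        double free = freeMemoryPercent(rt.freeMemory(), rt.totalMemory(), rt.maxMemory());
        // 20 mirrors the documented default for memoryThreshold.
        System.out.printf("free memory: %.1f%%, low: %b%n", free, isLow(free, 20));
    }
}
```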

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: simpleMemory
  memoryThreshold: 20
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (Since version 1.13) | 90 |

</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|---|---|---|
| openlineage.circuitBreaker.type | Circuit breaker type selected | simpleMemory |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (Since version 1.13) | 90 |

</TabItem>
</Tabs>

### Java Runtime Circuit Breaker

A more complex version of the circuit breaker. The amount of free memory can be low as long as
the amount of time spent on garbage collection is acceptable. `JavaRuntimeCircuitBreaker` closes
when free memory drops below its threshold and the amount of time spent on garbage collection exceeds
the given threshold (`10%` by default). The circuit breaker is always open when checked for the first time,
because the GC time is measured since the previous circuit breaker call.
The `circuitCheckIntervalInMillis` parameter configures how often the circuit breaker is checked.
The default value is `1000ms` when the config contains no entry.
`timeoutInSeconds` is optional. If set, OpenLineage code execution is terminated when the timeout
is reached (added in version 1.13).
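
The combined condition can be sketched as follows. This is a rough sketch under stated assumptions, not the `JavaRuntimeCircuitBreaker` source: the `GcAwareCheck` class and its method names are hypothetical, and GC load is approximated here as GC time divided by wall-clock time between two checks.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch; names and the GC-share formula are assumptions.
final class GcAwareCheck {
    private boolean hasPrevious = false;
    private long lastGcMillis;
    private long lastCheckMillis;

    // Total time the JVM reports having spent in garbage collection so far.
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = bean.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                sum += t;
            }
        }
        return sum;
    }

    // Returns true only when free memory is below memoryThreshold AND the share of time
    // spent in GC since the previous call exceeds gcCpuThreshold (both in percent).
    boolean shouldClose(double freeMemoryPercent, int memoryThreshold, int gcCpuThreshold,
                        long gcMillisNow, long nowMillis) {
        boolean first = !hasPrevious;
        long gcDelta = gcMillisNow - lastGcMillis;
        long timeDelta = nowMillis - lastCheckMillis;
        hasPrevious = true;
        lastGcMillis = gcMillisNow;
        lastCheckMillis = nowMillis;
        if (first || timeDelta <= 0) {
            return false; // no previous measurement to compare against
        }
        double gcPercent = 100.0 * gcDelta / timeDelta;
        return freeMemoryPercent < memoryThreshold && gcPercent > gcCpuThreshold;
    }
}
```

A caller would pass `GcAwareCheck.totalGcMillis()` and `System.currentTimeMillis()` on each check. Taking them as parameters keeps the decision logic deterministic and easy to test.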

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
circuitBreaker:
  type: javaRuntime
  memoryThreshold: 20
  gcCpuThreshold: 10
  circuitCheckIntervalInMillis: 1000
  timeoutInSeconds: 90
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| spark.openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| spark.openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| spark.openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| spark.openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (Since version 1.13)| 90 |


</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|---|---|---|
| openlineage.circuitBreaker.type | Circuit breaker type selected | javaRuntime |
| openlineage.circuitBreaker.memoryThreshold | Memory threshold | 20 |
| openlineage.circuitBreaker.gcCpuThreshold | Garbage Collection CPU threshold | 10 |
| openlineage.circuitBreaker.circuitCheckIntervalInMillis | Frequency of checking circuit breaker | 1000 |
| openlineage.circuitBreaker.timeoutInSeconds | Optional timeout for OpenLineage execution (Since version 1.13) | 90 |


</TabItem>
</Tabs>

### Custom Circuit Breaker

The list of available circuit breakers can be extended with a custom one, loaded via `ServiceLoader`,
by providing your own implementation of `io.openlineage.client.circuitBreaker.CircuitBreakerBuilder`.
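
The general `ServiceLoader` discovery pattern looks roughly like the sketch below. The `BreakerBuilder` interface is a hypothetical stand-in for the real builder interface, whose methods are not described in this document, so its `type()` method is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Generic illustration of ServiceLoader-based discovery; not OpenLineage code.
final class BreakerDiscovery {
    // Hypothetical stand-in for the real builder interface; its method is an assumption.
    public interface BreakerBuilder {
        String type();
    }

    // Collects every implementation registered on the classpath for the given service.
    static <T> List<T> discover(Class<T> service) {
        List<T> found = new ArrayList<>();
        for (T impl : ServiceLoader.load(service)) {
            found.add(impl);
        }
        return found;
    }

    public static void main(String[] args) {
        // Nothing is registered for this hypothetical interface, so the list is empty
        // unless a provider file is added to the classpath.
        List<BreakerBuilder> builders = discover(BreakerBuilder.class);
        System.out.println("discovered builders: " + builders.size());
    }
}
```

With standard `ServiceLoader` conventions, an implementation is registered by listing its class name in a classpath resource named `META-INF/services/` followed by the fully qualified interface name, which in this case would be `META-INF/services/io.openlineage.client.circuitBreaker.CircuitBreakerBuilder`.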
64 changes: 64 additions & 0 deletions versioned_docs/version-1.22.0/client/java/partials/java_metrics.md
@@ -0,0 +1,64 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

:::info
This feature is available in OpenLineage 1.11 and above
:::

To ease the operational experience of using the OpenLineage integrations, this document details the metrics collected by the Java client and the configuration settings for various metric backends.

### Metrics collected by Java Client

The following table outlines the metrics collected by the OpenLineage Java client, which help in monitoring the integration's performance:

| Metric | Definition | Type |
|-------------------------------------|-------------------------------------------------------|--------|
| `openlineage.emit.start` | Number of events the integration started to send | Counter|
| `openlineage.emit.complete` | Number of events the integration completed sending | Counter|
| `openlineage.emit.time` | Time spent on emitting events | Timer |
| `openlineage.circuitbreaker.engaged`| Status of the Circuit Breaker (engaged or not) | Gauge |

## Metric Backends

OpenLineage uses [Micrometer](https://micrometer.io) for metrics collection, similar to how SLF4J operates for logging. Micrometer provides a facade over different metric backends, allowing metrics to be dispatched to various destinations.

### Configuring Metric Backends

Below are the available backends and potential configurations using Micrometer's facilities.

### StatsD

Full configuration options for StatsD can be found in [Micrometer's `StatsdConfig` implementation](https://github.com/micrometer-metrics/micrometer/blob/main/implementations/micrometer-registry-statsd/src/main/java/io/micrometer/statsd/StatsdConfig.java).

<Tabs groupId="integrations">
<TabItem value="yaml" label="Yaml Config">

```yaml
metrics:
  type: statsd
  flavor: datadog
  host: localhost
  port: 8125
```
</TabItem>
<TabItem value="spark" label="Spark Config">

| Parameter | Definition | Example |
|---|---|---|
| spark.openlineage.metrics.type | Metrics type selected | statsd |
| spark.openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| spark.openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| spark.openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem>
<TabItem value="flink" label="Flink Config">

| Parameter | Definition | Example |
|---|---|---|
| openlineage.metrics.type | Metrics type selected | statsd |
| openlineage.metrics.flavor | Flavor of StatsD configuration | datadog |
| openlineage.metrics.host | Host that receives StatsD metrics | localhost |
| openlineage.metrics.port | Port that receives StatsD metrics | 8125 |

</TabItem>
</Tabs>