Refreshing website content from main repo.
GitHub Action Website Snapshot committed Oct 30, 2024
1 parent 00ed52c commit 4489cd5
Showing 1 changed file with 0 additions and 43 deletions.
43 changes: 0 additions & 43 deletions docs/integrations/spark/spark_column_lineage.md
@@ -92,46 +92,3 @@ To unravel two dependencies implement following logic:

The inputs are also mapped for all dataset dependencies. The result is added to each output.
Finally, the list of outputs with all their inputs is mapped to a `ColumnLineageDatasetFacetFields` object.

## Writing custom extensions

Spark can be extended by custom libraries that read from or write to almost any data source. For such custom implementations, the column-level lineage mechanism can itself be extended to retrieve information from other input or output `LogicalPlan` nodes.

Creating such an extension requires implementing the following interface:

```
/** Interface for implementing custom collectors of column-level lineage. */
interface CustomColumnLineageVisitor {
  /**
   * Collect inputs for a given {@link LogicalPlan}. The column-level lineage mechanism traverses
   * the LogicalPlan node by node and calls this method for each traversed node. Input
   * information should be put into the builder.
   *
   * @param node the LogicalPlan node currently being traversed
   * @param builder the builder that accumulates column-level lineage information
   */
  void collectInputs(LogicalPlan node, ColumnLevelLineageBuilder builder);

  /**
   * Collect outputs for a given {@link LogicalPlan}. The column-level lineage mechanism traverses
   * the LogicalPlan node by node and calls this method for each traversed node. Output
   * information should be put into the builder.
   *
   * @param node the LogicalPlan node currently being traversed
   * @param builder the builder that accumulates column-level lineage information
   */
  void collectOutputs(LogicalPlan node, ColumnLevelLineageBuilder builder);

  /**
   * Collect expressions for a given {@link LogicalPlan}. The column-level lineage mechanism
   * traverses the LogicalPlan node by node and calls this method for each traversed node.
   * Expression dependency information should be put into the builder.
   *
   * @param node the LogicalPlan node currently being traversed
   * @param builder the builder that accumulates column-level lineage information
   */
  void collectExpressionDependencies(LogicalPlan node, ColumnLevelLineageBuilder builder);
}
```
and making it available to the Service Loader: the fully qualified name of the implementation class has to be put in the resource file `META-INF/services/io.openlineage.spark.agent.lifecycle.plan.column.CustomColumnLineageVisitor`.
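
The registration step can be illustrated with a minimal, self-contained sketch. It assumes that "Service Loader" refers to the JDK's `java.util.ServiceLoader`, and it deliberately uses stand-in types so that it runs without Spark or OpenLineage on the classpath: `ServiceLoaderSketch`, `Visitor`, and `MyVisitor` are hypothetical names, not part of OpenLineage. The sketch writes a provider-configuration file under `META-INF/services/` into a temporary directory and lets `ServiceLoader` discover the implementation from it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

class ServiceLoaderSketch {

  /** Hypothetical stand-in for an extension interface such as CustomColumnLineageVisitor. */
  public interface Visitor {
    String name();
  }

  /** Hypothetical implementation; ServiceLoader needs a public no-argument constructor. */
  public static class MyVisitor implements Visitor {
    public MyVisitor() {}

    @Override
    public String name() {
      return "my-visitor";
    }
  }

  /** Writes a provider-configuration file and returns the names of the discovered visitors. */
  static List<String> discover() {
    try {
      // In a real extension this file ships inside the jar's resources;
      // here it is created in a temporary directory for demonstration only.
      Path root = Files.createTempDirectory("services-demo");
      Path dir = Files.createDirectories(root.resolve("META-INF").resolve("services"));

      // File name = binary name of the interface; content = binary name of the implementation.
      Files.write(dir.resolve(Visitor.class.getName()), List.of(MyVisitor.class.getName()));

      URL[] urls = {root.toUri().toURL()};
      try (URLClassLoader loader =
          new URLClassLoader(urls, ServiceLoaderSketch.class.getClassLoader())) {
        List<String> names = new ArrayList<>();
        // ServiceLoader instantiates providers lazily, so iterate before the loader is closed.
        for (Visitor visitor : ServiceLoader.load(Visitor.class, loader)) {
          names.add(visitor.name());
        }
        return names;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(discover());
  }
}
```

For an actual extension, the resource file would be named after the `CustomColumnLineageVisitor` interface as given above and would contain the fully qualified name of the class implementing it, one name per line.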
