From b9ec334e9d604c595a28b94cdf2dcfadc9c80215 Mon Sep 17 00:00:00 2001
From: GitHub Action Website Snapshot <>
Date: Thu, 5 Sep 2024 21:00:20 +0000
Subject: [PATCH] Refreshing website content from main repo.

Source commit: https://github.com/OpenLineage/OpenLineage/commit/2b4b26b78cdffab5b69e91e365f3693a6e2964ce
---
 blog/sf-meetup-3/index.mdx | 19 +++++++++++----
 docs/releases/1_20_5.md    | 50 ++++++++++++++++++++++++++++++++++++++
 docs/releases/1_21_1.md    | 24 ++++++++++++++++++
 docs/releases/1_22_0.md    | 24 ++++++++++++++++++
 4 files changed, 112 insertions(+), 5 deletions(-)
 create mode 100644 docs/releases/1_20_5.md
 create mode 100644 docs/releases/1_21_1.md
 create mode 100644 docs/releases/1_22_0.md

diff --git a/blog/sf-meetup-3/index.mdx b/blog/sf-meetup-3/index.mdx
index c2da511..b39b906 100644
--- a/blog/sf-meetup-3/index.mdx
+++ b/blog/sf-meetup-3/index.mdx
@@ -1,24 +1,31 @@
 ---
-title: Meet up with us in San Francisco on September 12th
+title: Meet up with us in San Francisco & Zoom on September 12th
 date: 2024-08-30
 authors: [Robinson]
 description: Our third SF OpenLineage Meetup will take place on September 12th.
 ---
+**Note**: this event is now hybrid.
+
 Join us on Thursday, September 12th, 2024, from 6:00-9:00 pm PT at the Astronomer offices
-in San Francisco to learn more about the present and future of OpenLineage. Meet
+in San Francisco or on Zoom to learn more about the present and future of OpenLineage. Meet
 other members of the ecosystem, learn about the project’s goals and fundamental design, and participate in a discussion about the future of the project. Bring your ideas and vision for OpenLineage!
 ### Agenda:
-- **Unlocking Data Products with OpenLineage at Astronomer**: Julian LaNeve and Jason Ma, Astronomer
+- **Unlocking Data Products with OpenLineage at Astronomer**: Julian LaNeve (CTO, Astronomer) and Jason Ma (VP of Product, Astronomer)
 - **OpenLineage: From Operators to Hooks** by Maciej Obuchowski, Astronomer+GetInData/Xebia
 - **Activating Operational Metadata with Airflow, Atlan and Openlineage** by Kacper Muda, GetInData/Xebia
-- **Hamilton, a Scaffold for all Your Python Platform Concerns (and a New OpenLineage Producer)** by Stefan Krawczyk
+- **Hamilton, a Scaffold for all Your Python Platform Concerns (and a New OpenLineage Producer)** by Stefan Krawczyk, CEO of DAGWorks
 - **Lightning Talk on New Marquez Features and the Marquez Project Roadmap** by Willy Lulciuc, Marquez Lead, and Peter Hicks, Marquez Committer
+### Thank you to our sponsors:
+**Astronomer**
+**GetInData/Xebia**
+**LFAI & Data**
+
 Food will be provided, and the meetup is open to all. Don't miss this opportunity
@@ -27,13 +34,15 @@ there.
 **Please [sign up](https://www.meetup.com/meetup-group-bnfqymxe/events/302718127/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link) to let us know you're coming.**
+Choose online if attending on Zoom. The link will be provided to registrants via the meetup group.
 ### Time, Place & Format
 Date: September 12th, 2024
-Format: In-person
+Format: Hybrid
 Time: 6:00-9:00 pm PT
 Address: Astronomer, [8 California Street, 7th Floor, San Francisco, CA 94111](https://goo.gl/maps/7UxfePDNPkneyc8v5?coh=178571&entry=tt)
+Zoom link: TBA
 #### Getting There
 The Astronomer SF office is in the Financial District at the corner of California
diff --git a/docs/releases/1_20_5.md b/docs/releases/1_20_5.md
new file mode 100644
index 0000000..57bf53f
--- /dev/null
+++ b/docs/releases/1_20_5.md
@@ -0,0 +1,50 @@
+---
+title: 1.20.5
+sidebar_position: 9937
+---
+
+# 1.20.5 - 2024-08-23
+
+### Added
+* **Python: add `CompositeTransport`** [`#2925`](https://github.com/OpenLineage/OpenLineage/pull/2925) [@JDarDagran](https://github.com/JDarDagran)
+ *Adds a `CompositeTransport` that can accept other transport configs to instantiate transports and use them to emit events.*
+* **Spark: compile & test Spark integration on Java 17** [`#2828`](https://github.com/OpenLineage/OpenLineage/pull/2828) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *The Spark integration is always compiled with Java 17, while tests are running on both Java 8 and Java 17 according to the configuration.*
+* **Spark: support preview release of Spark 4.0** [`#2854`](https://github.com/OpenLineage/OpenLineage/pull/2854) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *Includes the Spark 4.0 preview release in the integration tests.*
+* **Spark: add handling for `Window`** [`#2901`](https://github.com/OpenLineage/OpenLineage/pull/2901) [@tnazarew](https://github.com/tnazarew)
+ *Adds handling for `Window`-type nodes of a logical plan.*
+* **Spark: extract and send events with raw SQL from Spark** [`#2913`](https://github.com/OpenLineage/OpenLineage/pull/2913) [@Imbruced](https://github.com/Imbruced)
+ *Adds a parser that traverses `QueryExecution` to get the SQL query used from the SQL field with a BFS algorithm.*
+* **Spark: support Mongostream source** [`#2887`](https://github.com/OpenLineage/OpenLineage/pull/2887) [@Imbruced](https://github.com/Imbruced)
+ *Adds a Mongo streaming visitor and tests.*
+* **Spark: new mechanism for disabling facets** [`#2912`](https://github.com/OpenLineage/OpenLineage/pull/2912) [@arturowczarek](https://github.com/arturowczarek)
+ *The mechanism makes `FacetConfig` accept the disabled flag for any facet instead of passing them as a list.*
+* **Spark: support Kinesis source** [`#2906`](https://github.com/OpenLineage/OpenLineage/pull/2906) [@Imbruced](https://github.com/Imbruced)
+ *Adds a Kinesis class handler in the streaming source builder.*
+* **Spark: extract `DatasetIdentifier` from extension `LineageNode`** [`#2900`](https://github.com/OpenLineage/OpenLineage/pull/2900) [@ddebowczyk92](https://github.com/ddebowczyk92)
+ *Adds support for cases in which `LogicalRelation` has a grandChild node that implements the `LineageRelation` interface.*
+* **Spark: extract Dataset from underlying `BaseRelation`** [`#2893`](https://github.com/OpenLineage/OpenLineage/pull/2893) [@ddebowczyk92](https://github.com/ddebowczyk92)
+ *`DatasetIdentifier` is now extracted from the underlying node of `LogicalRelation`.*
+* **Spark: add descriptions and Marquez UI to Docker Compose file** [`#2889`](https://github.com/OpenLineage/OpenLineage/pull/2889) [@jonathanlbt1](https://github.com/jonathanlbt1)
+ *Adds the `marquez-web` service to docker-compose.yml.*
+
+### Fixed
+* **Proxy: bug fixed on error messages descriptions** [`#2880`](https://github.com/OpenLineage/OpenLineage/pull/2880) [@jonathanlbt1](https://github.com/jonathanlbt1)
+ *Improves error logging.*
+* **Proxy: update Docker image for Fluentd 1.17** [`#2877`](https://github.com/OpenLineage/OpenLineage/pull/2877) [@jonathanlbt1](https://github.com/jonathanlbt1)
+ *Upgrades the Fluentd version.*
+* **Spark: fix issue with Kafka source when saving with `for each` batch method** [`#2868`](https://github.com/OpenLineage/OpenLineage/pull/2868) [@imbruced](https://github.com/Imbruced)
+ *Fixes an issue when Spark is in streaming mode and input for Kafka was not present in the event.*
+* **Spark: properly set ARN in namespace for Iceberg Glue symlinks** [`#2943`](https://github.com/OpenLineage/OpenLineage/pull/2943) [@arturowczarek](https://github.com/arturowczarek)
+ *Makes `IcebergHandler` support Glue catalog tables and create the symlink using the code from `PathUtils`.*
+* **Spark: accept any provider for AWS Glue storage format** [`#2917`](https://github.com/OpenLineage/OpenLineage/pull/2917) [@arturowczarek](https://github.com/arturowczarek)
+ *Makes the AWS Glue ARN generating method accept every format (including Parquet), not only Hive SerDe.*
+* **Spark: return valid JSON for failed logical plan serialization** [`#2892`](https://github.com/OpenLineage/OpenLineage/pull/2892) [@arturowczarek](https://github.com/arturowczarek)
+ *The `LogicalPlanSerializer` now returns `` for failed serialization instead of an empty string.*
+* **Spark: extract legacy column lineage visitors loader** [`#2883`](https://github.com/OpenLineage/OpenLineage/pull/2883) [@arturowczarek](https://github.com/arturowczarek)
+ *Refactors `CustomCollectorsUtils` for improved readability.*
+* **Spark: add Kafka input source when writing in `foreach` batch mode** [`#2868`](https://github.com/OpenLineage/OpenLineage/pull/2868) [@Imbruced](https://github.com/Imbruced)
+ *Fixes a bug keeping Kafka input sources from being produced.*
+* **Spark: extract `DatasetIdentifier` from `SaveIntoDataSourceCommandVisitor` options** [`#2934`](https://github.com/OpenLineage/OpenLineage/pull/2934) [@ddebowczyk92](https://github.com/ddebowczyk92)
+ *Extracts `DatasetIdentifier` from command's options instead of relying on `p.createRelation(sqlContext, command.options())`, which is a heavy operation for `JdbcRelationProvider`.*
diff --git a/docs/releases/1_21_1.md b/docs/releases/1_21_1.md
new file mode 100644
index 0000000..326d45f
--- /dev/null
+++ b/docs/releases/1_21_1.md
@@ -0,0 +1,24 @@
+---
+title: 1.21.1
+sidebar_position: 9936
+---
+
+# 1.21.1 - 2024-08-29
+
+### Added
+* **Spec: add GCP Dataproc facet** [`#2987`](https://github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://github.com/tnazarew)
+ *Registers the Google Cloud Platform Dataproc run facet.*
+
+### Fixed
+* **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://github.com/kacpermuda)
+ *Adjusts the SQL integration after our sqlparser-rs fork has been updated to the latest main.*
+* **Spark: fix AWS Glue jobs naming for SQL events** [`#3001`](https://github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://github.com/arturowczarek)
+ *SQL events now properly use the names of the jobs retrieved from AWS Glue.*
+* **Spark: fix issue with column lineage when using delta merge into command** [`#2986`](https://github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://github.com/Imbruced)
+ *A view instance of a node is now included when gathering data sources for input columns.*
+* **Spark: minor Spark filters refactor** [`#2990`](https://github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://github.com/arturowczarek)
+ *Fixes a number of minor issues.*
+* **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://github.com/arturowczarek)
+ *They should use slashes and the prefix `table/`.*
+* **Spark: lineage for Iceberg datasets that are present outside of Spark's catalog is now present** [`#2937`](https://github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://github.com/d-m-h)
+ *Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from being present in the `inputs` property of the `RunEvent`.*
diff --git a/docs/releases/1_22_0.md b/docs/releases/1_22_0.md
new file mode 100644
index 0000000..74bff79
--- /dev/null
+++ b/docs/releases/1_22_0.md
@@ -0,0 +1,24 @@
+---
+title: 1.22.0
+sidebar_position: 9935
+---
+
+# 1.22.0 - 2024-09-05
+
+### Added
+* **SQL: add support for `USE` statement with different syntaxes** [`#2944`](https://github.com/OpenLineage/OpenLineage/pull/2944) [@kacpermuda](https://github.com/kacpermuda)
+ *Adjusts our Context so that it can use the new support for this statement in the parser and pass it to a number of queries.*
+* **Spark: add script to build Spark dependencies** [`#3044`](https://github.com/OpenLineage/OpenLineage/pull/3044) [@arturowczarek](https://github.com/arturowczarek)
+ *Adds a script to rebuild dependencies automatically following releases.*
+* **Website: versionable docs** [`#3007`](https://github.com/OpenLineage/OpenLineage/pull/3007) [`#3023`](https://github.com/OpenLineage/OpenLineage/pull/3023) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
+ *Adds a GitHub action that creates a new Docusaurus version on a tag push, verifiable using the openlineage-site repo. Implements a monorepo approach in a new `website` directory.*
+
+### Fixed
+* **SQL: add support for `SingleQuotedString` in `Identifier()`** [`#3035`](https://github.com/OpenLineage/OpenLineage/pull/3035) [@kacpermuda](https://github.com/kacpermuda)
+ *Single quoted strings were being treated differently than strings with no quotes, double quotes, or backticks.*
+* **SQL: support `IDENTIFIER` function instead of treating it like table name** [`#2999`](https://github.com/OpenLineage/OpenLineage/pull/2999) [@kacpermuda](https://github.com/kacpermuda)
+ *Adds support for this identifier in SELECT, MERGE, UPDATE, and DELETE statements. For now, only static identifiers are supported. When a variable is used, this table is removed from lineage to avoid emitting incorrect lineage.*
+* **Spark: fix issue with only one table in inputs from SQL query while reading from JDBC** [`#2918`](https://github.com/OpenLineage/OpenLineage/pull/2918) [@Imbruced](https://github.com/Imbruced)
+ *Events created did not contain the correct input table when the query contained multiple tables.*
+* **Spark: fix AWS Glue jobs naming for RDD events** [`#3020`](https://github.com/OpenLineage/OpenLineage/pull/3020) [@arturowczarek](https://github.com/arturowczarek)
+ *The naming for RDD jobs now uses the same code as SQL and Application events.*