Refreshing website content from main repo.
GitHub Action Website Snapshot committed Sep 5, 2024
1 parent 84d044c commit b9ec334
Showing 4 changed files with 112 additions and 5 deletions.
19 changes: 14 additions & 5 deletions blog/sf-meetup-3/index.mdx
@@ -1,24 +1,31 @@
---
title: Meet up with us in San Francisco on September 12th
title: Meet up with us in San Francisco & Zoom on September 12th
date: 2024-08-30
authors: [Robinson]
description: Our third SF OpenLineage Meetup will take place on September 12th.
---

**Note**: this event is now hybrid.

Join us on Thursday, September 12th, 2024, from 6:00-9:00 pm PT at the Astronomer offices
in San Francisco to learn more about the present and future of OpenLineage. Meet
in San Francisco or on Zoom to learn more about the present and future of OpenLineage. Meet
other members of the ecosystem, learn about the project’s goals and fundamental
design, and participate in a discussion about the future of the project. Bring
your ideas and vision for OpenLineage!

### Agenda:

- **Unlocking Data Products with OpenLineage at Astronomer**: Julian LaNeve and Jason Ma, Astronomer
- **Unlocking Data Products with OpenLineage at Astronomer**: Julian LaNeve (CTO, Astronomer) and Jason Ma (VP of Product, Astronomer)
- **OpenLineage: From Operators to Hooks** by Maciej Obuchowski, Astronomer+GetInData/Xebia
- **Activating Operational Metadata with Airflow, Atlan and OpenLineage** by Kacper Muda, GetInData/Xebia
- **Hamilton, a Scaffold for all Your Python Platform Concerns (and a New OpenLineage Producer)** by Stefan Krawczyk
- **Hamilton, a Scaffold for all Your Python Platform Concerns (and a New OpenLineage Producer)** by Stefan Krawczyk, CEO of DAGWorks
- **Lightning Talk on New Marquez Features and the Marquez Project Roadmap** by Willy Lulciuc, Marquez Lead, and Peter Hicks, Marquez Committer

### Thank you to our sponsors:
**Astronomer**
**GetInData/Xebia**
**LFAI & Data**

<!--truncate-->

Food will be provided, and the meetup is open to all. Don't miss this opportunity
@@ -27,13 +34,15 @@ there.

**Please [sign up](https://www.meetup.com/meetup-group-bnfqymxe/events/302718127/?utm_medium=referral&utm_campaign=share-btn_savedevents_share_modal&utm_source=link)
to let us know you're coming.**
Choose online if attending on Zoom. The link will be provided to registrants via the meetup group.

### Time, Place & Format

Date: September 12th, 2024
Format: In-person
Format: Hybrid
Time: 6:00-9:00 pm PT
Address: Astronomer, [8 California Street, 7th Floor, San Francisco, CA 94111](https://goo.gl/maps/7UxfePDNPkneyc8v5?coh=178571&entry=tt)
Zoom link: TBA

#### Getting There
The Astronomer SF office is in the Financial District at the corner of California
50 changes: 50 additions & 0 deletions docs/releases/1_20_5.md
@@ -0,0 +1,50 @@
---
title: 1.20.5
sidebar_position: 9937
---

# 1.20.5 - 2024-08-23

### Added
* **Python: add `CompositeTransport`** [`#2925`](https://github.com/OpenLineage/OpenLineage/pull/2925) [@JDarDagran](https://github.com/JDarDagran)
*Adds a `CompositeTransport` that can accept other transport configs to instantiate transports and use them to emit events.*
* **Spark: compile & test Spark integration on Java 17** [`#2828`](https://github.com/OpenLineage/OpenLineage/pull/2828) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
  *The Spark integration is always compiled with Java 17, while tests run on both Java 8 and Java 17, depending on the configuration.*
* **Spark: support preview release of Spark 4.0** [`#2854`](https://github.com/OpenLineage/OpenLineage/pull/2854) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
*Includes the Spark 4.0 preview release in the integration tests.*
* **Spark: add handling for `Window`** [`#2901`](https://github.com/OpenLineage/OpenLineage/pull/2901) [@tnazarew](https://github.com/tnazarew)
*Adds handling for `Window`-type nodes of a logical plan.*
* **Spark: extract and send events with raw SQL from Spark** [`#2913`](https://github.com/OpenLineage/OpenLineage/pull/2913) [@Imbruced](https://github.com/Imbruced)
  *Adds a parser that traverses `QueryExecution` with a breadth-first search (BFS) to extract the SQL query from its SQL field.*
* **Spark: support Mongostream source** [`#2887`](https://github.com/OpenLineage/OpenLineage/pull/2887) [@Imbruced](https://github.com/Imbruced)
*Adds a Mongo streaming visitor and tests.*
* **Spark: new mechanism for disabling facets** [`#2912`](https://github.com/OpenLineage/OpenLineage/pull/2912) [@arturowczarek](https://github.com/arturowczarek)
*The mechanism makes `FacetConfig` accept the disabled flag for any facet instead of passing them as a list.*
* **Spark: support Kinesis source** [`#2906`](https://github.com/OpenLineage/OpenLineage/pull/2906) [@Imbruced](https://github.com/Imbruced)
*Adds a Kinesis class handler in the streaming source builder.*
* **Spark: extract `DatasetIdentifier` from extension `LineageNode`** [`#2900`](https://github.com/OpenLineage/OpenLineage/pull/2900) [@ddebowczyk92](https://github.com/ddebowczyk92)
*Adds support for cases in which `LogicalRelation` has a grandChild node that implements the `LineageRelation` interface.*
* **Spark: extract Dataset from underlying `BaseRelation`** [`#2893`](https://github.com/OpenLineage/OpenLineage/pull/2893) [@ddebowczyk92](https://github.com/ddebowczyk92)
*`DatasetIdentifier` is now extracted from the underlying node of `LogicalRelation`.*
* **Spark: add descriptions and Marquez UI to Docker Compose file** [`#2889`](https://github.com/OpenLineage/OpenLineage/pull/2889) [@jonathanlbt1](https://github.com/jonathanlbt1)
*Adds the `marquez-web` service to docker-compose.yml.*
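
The `CompositeTransport` entry above describes a transport that builds other transports from their configs and emits each event through all of them. As a rough illustration of that fan-out pattern only, here is a minimal Python sketch. Every name in it (`ListTransport`, `TRANSPORT_TYPES`, `CompositeSketch`, the `"list"` type) is a hypothetical stand-in, not the OpenLineage client API.

```python
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class ListTransport:
    """Hypothetical transport that stores emitted events in memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.events: List[Event] = []

    def emit(self, event: Event) -> None:
        self.events.append(event)


# Hypothetical registry: maps a config "type" value to a transport factory.
TRANSPORT_TYPES: Dict[str, Callable[[Dict[str, Any]], ListTransport]] = {
    "list": lambda cfg: ListTransport(cfg["name"]),
}


class CompositeSketch:
    """Instantiates one transport per config and forwards every event to all of them."""

    def __init__(self, configs: List[Dict[str, Any]]) -> None:
        self.transports = [TRANSPORT_TYPES[cfg["type"]](cfg) for cfg in configs]

    def emit(self, event: Event) -> None:
        for transport in self.transports:
            transport.emit(event)


composite = CompositeSketch(
    [{"type": "list", "name": "a"}, {"type": "list", "name": "b"}]
)
composite.emit({"eventType": "START"})
```

In this sketch a failure in one child transport would stop delivery to the rest; how the real transport handles that case is not stated in the entry above.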

### Fixed
* **Proxy: fix bug in error message descriptions** [`#2880`](https://github.com/OpenLineage/OpenLineage/pull/2880) [@jonathanlbt1](https://github.com/jonathanlbt1)
*Improves error logging.*
* **Proxy: update Docker image for Fluentd 1.17** [`#2877`](https://github.com/OpenLineage/OpenLineage/pull/2877) [@jonathanlbt1](https://github.com/jonathanlbt1)
*Upgrades the Fluentd version.*
* **Spark: fix issue with Kafka source when saving with `for each` batch method** [`#2868`](https://github.com/OpenLineage/OpenLineage/pull/2868) [@Imbruced](https://github.com/Imbruced)
  *Fixes an issue in which the Kafka input was missing from the event when Spark was running in streaming mode.*
* **Spark: properly set ARN in namespace for Iceberg Glue symlinks** [`#2943`](https://github.com/OpenLineage/OpenLineage/pull/2943) [@arturowczarek](https://github.com/arturowczarek)
*Makes `IcebergHandler` support Glue catalog tables and create the symlink using the code from `PathUtils`.*
* **Spark: accept any provider for AWS Glue storage format** [`#2917`](https://github.com/OpenLineage/OpenLineage/pull/2917) [@arturowczarek](https://github.com/arturowczarek)
*Makes the AWS Glue ARN generating method accept every format (including Parquet), not only Hive SerDe.*
* **Spark: return valid JSON for failed logical plan serialization** [`#2892`](https://github.com/OpenLineage/OpenLineage/pull/2892) [@arturowczarek](https://github.com/arturowczarek)
*The `LogicalPlanSerializer` now returns `<failed-to-serialize-logical-plan>` for failed serialization instead of an empty string.*
* **Spark: extract legacy column lineage visitors loader** [`#2883`](https://github.com/OpenLineage/OpenLineage/pull/2883) [@arturowczarek](https://github.com/arturowczarek)
*Refactors `CustomCollectorsUtils` for improved readability.*
* **Spark: add Kafka input source when writing in `foreach` batch mode** [`#2868`](https://github.com/OpenLineage/OpenLineage/pull/2868) [@Imbruced](https://github.com/Imbruced)
*Fixes a bug keeping Kafka input sources from being produced.*
* **Spark: extract `DatasetIdentifier` from `SaveIntoDataSourceCommandVisitor` options** [`#2934`](https://github.com/OpenLineage/OpenLineage/pull/2934) [@ddebowczyk92](https://github.com/ddebowczyk92)
*Extracts `DatasetIdentifier` from command's options instead of relying on `p.createRelation(sqlContext, command.options())`, which is a heavy operation for `JdbcRelationProvider`.*
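
The `LogicalPlanSerializer` fix above replaces an empty string, which is not valid JSON, with a placeholder. The actual serializer lives in the Java Spark integration; the Python sketch below only illustrates the fallback idea. The function name `serialize_plan` is hypothetical, and emitting the placeholder as a quoted JSON string is an assumption.

```python
import json
from typing import Any

# Placeholder text taken from the release note; quoting it as a JSON string is an assumption.
FAILED_PLACEHOLDER = "<failed-to-serialize-logical-plan>"


def serialize_plan(plan: Any) -> str:
    """Return the plan as JSON, or a valid-JSON placeholder if serialization fails."""
    try:
        return json.dumps(plan)
    except (TypeError, ValueError):
        # An empty string would not parse as JSON; a quoted placeholder does.
        return json.dumps(FAILED_PLACEHOLDER)


ok = serialize_plan({"class": "Project", "children": []})
bad = serialize_plan({"class": object()})  # object() is not JSON-serializable
```

Here both `ok` and `bad` can be passed to `json.loads` without raising, which is the property a consumer of the facet would rely on.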
24 changes: 24 additions & 0 deletions docs/releases/1_21_1.md
@@ -0,0 +1,24 @@
---
title: 1.21.1
sidebar_position: 9936
---

# 1.21.1 - 2024-08-29

### Added
* **Spec: add GCP Dataproc facet** [`#2987`](https://github.com/OpenLineage/OpenLineage/pull/2987) [@tnazarew](https://github.com/tnazarew)
*Registers the Google Cloud Platform Dataproc run facet.*

### Fixed
* **Airflow: update SQL integration code to work with latest sqlparser-rs main** [`#2983`](https://github.com/OpenLineage/OpenLineage/pull/2983) [@kacpermuda](https://github.com/kacpermuda)
*Adjusts the SQL integration after our sqlparser-rs fork has been updated to the latest main.*
* **Spark: fix AWS Glue jobs naming for SQL events** [`#3001`](https://github.com/OpenLineage/OpenLineage/pull/3001) [@arturowczarek](https://github.com/arturowczarek)
*SQL events now properly use the names of the jobs retrieved from AWS Glue.*
* **Spark: fix issue with column lineage when using delta merge into command** [`#2986`](https://github.com/OpenLineage/OpenLineage/pull/2986) [@Imbruced](https://github.com/Imbruced)
*A view instance of a node is now included when gathering data sources for input columns.*
* **Spark: minor Spark filters refactor** [`#2990`](https://github.com/OpenLineage/OpenLineage/pull/2990) [@arturowczarek](https://github.com/arturowczarek)
*Fixes a number of minor issues.*
* **Spark: Iceberg tables in AWS Glue have slashes instead of dots in symlinks** [`#2984`](https://github.com/OpenLineage/OpenLineage/pull/2984) [@arturowczarek](https://github.com/arturowczarek)
  *Symlink names for these tables should use slashes and the prefix `table/`.*
* **Spark: lineage for Iceberg datasets that are present outside of Spark's catalog is now present** [`#2937`](https://github.com/OpenLineage/OpenLineage/pull/2937) [@d-m-h](https://github.com/d-m-h)
*Previously, reading Iceberg datasets outside the configured Spark catalog prevented the datasets from being present in the `inputs` property of the `RunEvent`.*
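
The Glue symlink fix above says symlink names should use slashes and the `table/` prefix. A minimal Python sketch of that transformation follows, assuming the input is a dot-separated `database.table` identifier. The function name `glue_symlink_name` is hypothetical, and the real logic is in the Java Spark integration.

```python
def glue_symlink_name(table_identifier: str) -> str:
    """Turn a dot-separated identifier such as 'db.tbl' into 'table/db/tbl'."""
    parts = [part for part in table_identifier.split(".") if part]
    if not parts:
        raise ValueError("table identifier must not be empty")
    # Replace the dots with slashes and add the `table/` prefix.
    return "table/" + "/".join(parts)


name = glue_symlink_name("sales.orders")
```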
24 changes: 24 additions & 0 deletions docs/releases/1_22_0.md
@@ -0,0 +1,24 @@
---
title: 1.22.0
sidebar_position: 9935
---

# 1.22.0 - 2024-09-05

### Added
* **SQL: add support for `USE` statement with different syntaxes** [`#2944`](https://github.com/OpenLineage/OpenLineage/pull/2944) [@kacpermuda](https://github.com/kacpermuda)
*Adjusts our Context so that it can use the new support for this statement in the parser and pass it to a number of queries.*
* **Spark: add script to build Spark dependencies** [`#3044`](https://github.com/OpenLineage/OpenLineage/pull/3044) [@arturowczarek](https://github.com/arturowczarek)
*Adds a script to rebuild dependencies automatically following releases.*
* **Website: versionable docs** [`#3007`](https://github.com/OpenLineage/OpenLineage/pull/3007) [`#3023`](https://github.com/OpenLineage/OpenLineage/pull/3023) [@pawel-big-lebowski](https://github.com/pawel-big-lebowski)
*Adds a GitHub action that creates a new Docusaurus version on a tag push, verifiable using the openlineage-site repo. Implements a monorepo approach in a new `website` directory.*

### Fixed
* **SQL: add support for `SingleQuotedString` in `Identifier()`** [`#3035`](https://github.com/OpenLineage/OpenLineage/pull/3035) [@kacpermuda](https://github.com/kacpermuda)
  *Single-quoted strings were being treated differently from strings with no quotes, double quotes, or backticks.*
* **SQL: support `IDENTIFIER` function instead of treating it like table name** [`#2999`](https://github.com/OpenLineage/OpenLineage/pull/2999) [@kacpermuda](https://github.com/kacpermuda)
*Adds support for this identifier in SELECT, MERGE, UPDATE, and DELETE statements. For now, only static identifiers are supported. When a variable is used, this table is removed from lineage to avoid emitting incorrect lineage.*
* **Spark: fix issue with only one table in inputs from SQL query while reading from JDBC** [`#2918`](https://github.com/OpenLineage/OpenLineage/pull/2918) [@Imbruced](https://github.com/Imbruced)
*Events created did not contain the correct input table when the query contained multiple tables.*
* **Spark: fix AWS Glue jobs naming for RDD events** [`#3020`](https://github.com/OpenLineage/OpenLineage/pull/3020) [@arturowczarek](https://github.com/arturowczarek)
*The naming for RDD jobs now uses the same code as SQL and Application events.*
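
The `SingleQuotedString` fix above is about treating single-quoted identifiers the same way as unquoted, double-quoted, and backtick-quoted ones. The parser itself is written in Rust; the Python sketch below only illustrates the intended equivalence. The function name `normalize_identifier` is hypothetical.

```python
# Quote characters treated as equivalent wrappers around an identifier.
QUOTE_CHARS = ("'", '"', "`")


def normalize_identifier(raw: str) -> str:
    """Strip one pair of matching surrounding quotes, if present."""
    text = raw.strip()
    if len(text) >= 2 and text[0] in QUOTE_CHARS and text[-1] == text[0]:
        return text[1:-1]
    return text


variants = ["my_table", "'my_table'", '"my_table"', "`my_table`"]
normalized = {normalize_identifier(v) for v in variants}
```

All four spellings collapse to the same name here, so a lineage consumer would see a single table rather than four.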
