Commit
[docs] Fix typo in website/docs/streaming-lakehouse/overview.md (alib…
qining-mj authored Dec 25, 2024
1 parent 2435149 commit 07cae40
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions website/docs/streaming-lakehouse/overview.md
@@ -8,7 +8,7 @@ sidebar_position: 1

Lakehouse represents a new, open architecture that combines the best elements of data lakes and data warehouses.
It combines data lake scalability and cost-effectiveness with data warehouse reliability and performance.
-The well known data lake format such like [Apache Iceberg](https://iceberg.apache.org/), [Apache Paimon](https://paimon.apache.org/), [Apache Hudi](https://hudi.apache.org/) and [Delta Lake](https://delta.io/) play key roles in the Lakehouse architecture,
+The well-known data lake format such like [Apache Iceberg](https://iceberg.apache.org/), [Apache Paimon](https://paimon.apache.org/), [Apache Hudi](https://hudi.apache.org/) and [Delta Lake](https://delta.io/) play key roles in the Lakehouse architecture,
facilitating a harmonious balance between data storage, reliability, and analytical capabilities within a single, unified platform.

Lakehouse, as a modern architecture, is effective in addressing the complex needs of data management and analytics.
@@ -17,7 +17,7 @@ With these data lake formats, you will get into a contradictory situation:

1. If you require low latency, then you write and commit frequently, which means many small Parquet files. This becomes inefficient for
reads which must now deal with masses of small files.
-2. If you require read efficiency, then you accumulate data until you can write to large Parquet files, but this introduces
+2. If you require reading efficiency, then you accumulate data until you can write to large Parquet files, but this introduces
much higher latency.

Overall, these data lake formats typically achieve data freshness at best within minute-level granularity, even under optimal usage conditions.
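
To make the contradictory situation in the hunk above concrete, here is a minimal sketch of the two write strategies using pyarrow. It is purely illustrative and not taken from overview.md or the Fluss codebase; the `lake/` paths, row counts, and batch size are invented assumptions.

```python
# Sketch of the small-files vs. latency trade-off when writing Parquet.
# All paths and sizes below are hypothetical, chosen for illustration.
import os

import pyarrow as pa
import pyarrow.parquet as pq

os.makedirs("lake/small", exist_ok=True)
os.makedirs("lake/large", exist_ok=True)

rows = [{"id": i, "value": i * 10} for i in range(10_000)]

# Option 1: commit every micro-batch for low latency. Each flush produces
# a tiny Parquet file, so reading 10k rows eventually means opening 100 files.
for n, start in enumerate(range(0, len(rows), 100)):
    batch = pa.Table.from_pylist(rows[start:start + 100])
    pq.write_table(batch, f"lake/small/part-{n:05d}.parquet")

# Option 2: buffer everything and write one large file. Reads scan a single
# well-sized file, but no row is visible until the whole buffer is flushed.
pq.write_table(pa.Table.from_pylist(rows), "lake/large/part-00000.parquet")
```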
