This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Commit

address review feedback
harshini-rangaswamy committed Sep 22, 2023
1 parent 6b2eed1 commit 1b4b610
Showing 5 changed files with 17 additions and 8 deletions.
15 changes: 9 additions & 6 deletions docs/products/kafka/concepts/kafka-tiered-storage.rst
@@ -1,11 +1,12 @@
Tiered storage overview
==========================

Tiered storage in Aiven for Apache Kafka allows you to manage your data more efficiently by leveraging two distinct storage types—local disk and remote cloud storage options like AWS S3 and Google Cloud Storage. This feature offers a tailored approach to data storage, allowing you to allocate frequently accessed data to high-speed local disks while offloading less critical or infrequently accessed data to more cost-effective remote storage solutions. Tiered storage enables you to indefinitely store data on specific topics without running out of space. Once enabled, it is configured per topic, giving you granular control over data storage needs.
Tiered storage in Aiven for Apache Kafka® allows you to manage your data more efficiently by leveraging two distinct storage types—local disk and remote cloud storage options like AWS S3 and Google Cloud Storage. This feature offers a tailored approach to data storage, allowing you to allocate frequently accessed data to high-speed local disks while offloading less critical or infrequently accessed data to more cost-effective remote storage solutions. Tiered storage enables you to indefinitely store data on specific topics without running out of space. Once enabled, it is configured per topic, giving you granular control over data storage needs.


.. note::
    Azure blob storage is not yet supported for tiered storage in Aiven for Apache Kafka.
    - Tiered stoage for Aiven for Apache Kafka® is support from Apache Kafka version 3.6 or higher

Check failure on line 8 in docs/products/kafka/concepts/kafka-tiered-storage.rst


GitHub Actions / vale

[vale] docs/products/kafka/concepts/kafka-tiered-storage.rst#L8

[Aiven.aiven_spelling] 'stoage' does not seem to be a recognised word
Raw output
{"message": "[Aiven.aiven_spelling] 'stoage' does not seem to be a recognised word", "location": {"path": "docs/products/kafka/concepts/kafka-tiered-storage.rst", "range": {"start": {"line": 8, "column": 14}}}, "severity": "ERROR"}
    - Azure blob storage is not yet supported for tiered storage in Aiven for Apache Kafka.


Tiered storage offers multiple benefits, including:
@@ -25,6 +26,8 @@ Understanding when and why to use tiered storage in Aiven for Apache Kafka will

* **Long-term data retention**: Many organizations require large-scale data storage for extended periods, either for regulatory compliance or historical data analysis. Cloud services provide an almost limitless storage capacity, making it possible to keep data accessible for as long as required at a reasonable cost. This is where tiered storage becomes especially valuable.
* **High-speed data ingestion**: Tiered storage can offer a solution when dealing with unpredictable or sudden influxes of data. By supplementing the local disks with cloud storage, sudden increases in incoming data can be managed, ensuring optimum system performance.
* **Unlock unexplored opportunities:** Tiered storage in Aiven for Apache Kafka addresses existing storage challenges and opens the door to new and innovative use cases that were once unfeasible or cost-prohibitive. By eliminating traditional storage limitations, organizations gain the flexibility to support a wide range of applications and workflows, even those where Apache Kafka might have been considered impractical before. We encourage users to leverage this newfound flexibility to think creatively and redefine their experience with Apache Kafka.



Pricing
@@ -35,8 +38,8 @@ Tiered storage costs are determined by the amount of remote storage used, measur
Related reading
----------------

* :doc:`How tiered storage works in Aiven for Apache Kafka® <../tiered-storage-how-it-works.html>`
* :doc:`Guarantees <../tiered-storage-guarantees>`
* :doc:`Backups <../tiered-storage-backups>`
* :doc:`Limiations <../tiered-storage-limitations>`
* :doc:`How tiered storage works in Aiven for Apache Kafka® </docs/products/kafka/concepts/tiered-storage-how-it-works>`
* :doc:`Guarantees </docs/products/kafka/concepts/tiered-storage-guarantees>`
* :doc:`Backups </docs/products/kafka/concepts/tiered-storage-backups>`
* :doc:`Limiations </docs/products/kafka/concepts/tiered-storage-limitations>`

4 changes: 3 additions & 1 deletion docs/products/kafka/concepts/list-kafka-tiered-storage.rst
@@ -1,7 +1,9 @@
Tiered storage in Aiven for Apache Kafka®
===========================================

Discover the tiered storage capability in Aiven for Apache Kafka®. Learn how it works and explore its use cases. Check why you might need it and what benefits you get using it.
Discover how tiered storage works in Aiven for Apache Kafka®, explore its use cases, and learn why you might need it and what benefits it offers.



.. tableofcontents::

2 changes: 1 addition & 1 deletion docs/products/kafka/concepts/tiered-storage-guarantees.rst
@@ -10,7 +10,7 @@ With Aiven for Apache Kafka®'s tiered storage, there are two primary types of d
Example
--------

Let's say you have a topic with a total retention threshold of 1000 GB and a local retention threshold of 200 GB. This means that:
Let's say you have a topic with a **total retention threshold** of **1000 GB** and a **local retention threshold** of **200 GB**. This means that:

* All data for the topic will be retained, regardless of whether it is stored locally or remotely, as long as the total size of the data does not exceed 1000 GB.
* If tiered storage is enabled per topic, older segments will be uploaded immediately to remote storage, irrespective of whether the local retention threshold of 200 GB is exceeded. Data will be deleted from local storage only after it has been safely transferred to remote storage.
4 changes: 4 additions & 0 deletions docs/products/kafka/concepts/tiered-storage-how-it-works.rst
@@ -17,6 +17,9 @@ Local vs. remote data retention
---------------------------------
When tiered storage is enabled, data is initially stored on the local disk of the Kafka broker. Data is then asynchronously transferred to remote storage based on the pre-defined local retention threshold. During periods of high data ingestion or transient errors, such as network connectivity issues, the local storage might temporarily hold more data than specified by the local retention threshold.

.. image:: /images/products/kafka/tiered-storage/data-retention.png
    :alt: Diagram depicting the concept of local vs. remote data retention in a tiered storage system.

Segment management
-------------------
Data is organized into segments, which are uploaded to remote storage individually. The active (newest) segment remains in local storage, which means that the segment size can also influence local data retention. For instance, if the local retention threshold is 1 GB, but the segment size is 2 GB, the local storage will exceed the 1 GB limit until the active segment is rolled over and uploaded to remote storage.
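
The interaction between segment size and the local retention threshold described above can be illustrated with a minimal Python sketch. This is a deliberately simplified model, not Kafka's actual retention algorithm: the function name ``local_storage_gb`` and its parameters are hypothetical, and it assumes every closed segment has already been uploaded to remote storage.

```python
def local_storage_gb(closed_segments_gb, active_segment_gb, local_retention_gb):
    """Simplified model of how much data stays on a broker's local disk.

    Assumptions (illustrative only, not taken from Kafka's implementation):
    - closed_segments_gb lists closed segments oldest first, all already uploaded.
    - The active segment is never uploaded or removed from local disk.
    - The oldest uploaded segments are removed while local data exceeds the threshold.
    """
    kept = list(closed_segments_gb)
    while kept and sum(kept) + active_segment_gb > local_retention_gb:
        kept.pop(0)  # the oldest uploaded segment leaves the local disk
    return sum(kept) + active_segment_gb


# Threshold of 1 GB, active segment has grown to 2 GB: local use exceeds the limit.
print(local_storage_gb([], 2.0, 1.0))  # 2.0
# Threshold of 1 GB, two closed 0.5 GB segments plus a 0.25 GB active segment.
print(local_storage_gb([0.5, 0.5], 0.25, 1.0))  # 0.75
```

In the first call the active segment alone is larger than the threshold, so nothing can be removed and local usage stays at 2 GB until the segment rolls over. In the second call one closed segment is dropped locally to bring usage back under the threshold.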
@@ -25,6 +28,7 @@ Data is organized into segments, which are uploaded to remote storage individual
Asynchronous uploads and replication
--------------------------------------
Data is transferred to remote storage asynchronously and does not interfere with the producer activity. While the broker aims to move data as swiftly as possible, certain conditions, such as high-throughput or connectivity issues, may cause more data to be stored in the local storage than the specified local retention policy.

Any data exceeding the local retention threshold will not be purged by the log cleaner until it is successfully uploaded to remote storage.
The replication factor is not considered during the upload process, and only one copy of each segment is uploaded to the remote storage. Most remote storage options have their own measures, including data replication, to ensure data durability.

