diff --git a/docs/products/kafka/concepts/kafka-tiered-storage.rst b/docs/products/kafka/concepts/kafka-tiered-storage.rst
index 2c9fd2b148..bfe606205c 100644
--- a/docs/products/kafka/concepts/kafka-tiered-storage.rst
+++ b/docs/products/kafka/concepts/kafka-tiered-storage.rst
@@ -5,8 +5,8 @@ Discover the tiered storage capability in Aiven for Apache Kafka®. Learn how it

 Overview
 ---------
+Tiered storage in Aiven for Apache Kafka® lets you manage your data more efficiently by leveraging multiple storage types—local disk and remote cloud storage options like AWS S3 and Google Cloud Storage. This feature offers a tailored approach to data storage, allowing you to allocate frequently accessed data to high-speed local disks while offloading less critical or infrequently accessed data to more cost-effective remote storage solutions. Tiered storage enables you to indefinitely store data on specific topics without running out of space. Once enabled, it is configured per topic, giving you granular control over data storage.

-Tiered storage in Aiven for Apache Kafka® lets you manage your data more efficiently by leveraging multiple storage types—local disk and remote cloud storage options like AWS S3 and Google Cloud Storage. This feature offers a tailored approach to data storage, allowing you to allocate frequently accessed data to high-speed local disks while offloading less critical or infrequently accessed data to more cost-effective remote storage solutions. Tiered storage allows you to store data on specific topics indefinitely without running out of space. Once enabled, it is configured on a per-topic basis, giving you granular control over data storage.

 .. note:: Azure blob storage is not yet supported for tiered storage in Aiven for Apache Kafka.
diff --git a/docs/products/kafka/concepts/tiered-storage-how-it-works.rst b/docs/products/kafka/concepts/tiered-storage-how-it-works.rst
index d59fdc1c1a..26f11ff35c 100644
--- a/docs/products/kafka/concepts/tiered-storage-how-it-works.rst
+++ b/docs/products/kafka/concepts/tiered-storage-how-it-works.rst
@@ -6,16 +6,15 @@ Aiven for Apache Kafka® tiered storage is a feature that optimizes data managem

 * **Local tier**: Primarily consists of faster and typically more expensive storage solutions like solid-state drives (SSDs).
 * **Remote tier**: Relies on slower, cost-effective options like cloud object storage.

-In Aiven for Apache Kafka's tiered storage architecture, **remote storage** refers to storage options external to the Kafka broker's local disk. This typically includes cloud-based or self-hosted object storage solutions like AWS S3, Google Cloud, and Azure Blob Storage. Although network-attached block storage solutions like AWS EBS are technically external to the broker machine, Apache Kafka considers them local storage within its tiered storage architecture.
+In Aiven for Apache Kafka's tiered storage architecture, **remote storage** refers to storage options external to the Kafka broker's local disk. This typically includes cloud-based or self-hosted object storage solutions like AWS S3 and Google Cloud. Although network-attached block storage solutions like AWS EBS are technically external to the broker machine, Apache Kafka considers them local storage within its tiered storage architecture.

 Tiered storage operates in a way that is seamless for both Apache Kafka producers and consumers. This means that producers and consumers interact with Apache Kafka in the same way, regardless of whether tiered storage is enabled or not.

-Administrators can configure Tiered storage per topic by defining the retention period and retention bytes to specify how much data should be retained on the local disk as opposed to remote storage.
+Administrators can configure Tiered storage per topic by defining the retention period and retention bytes to specify how much data should be retained on the local disk instead of remote storage.

 Local vs. remote data retention
 ---------------------------------
-
 When tiered storage is enabled, data is initially stored on the local disk of the Kafka broker. Data is then asynchronously transferred to remote storage based on the pre-defined local retention threshold. During periods of high data ingestion or transient errors, such as network connectivity issues, the local storage might temporarily hold more data than specified by the local retention threshold.

 Segment management
@@ -29,10 +28,10 @@ Data is transferred to remote storage asynchronously and does not interfere with

 Any data exceeding the local retention threshold will not be purged by the log cleaner until it is successfully uploaded to remote storage. The replication factor is not considered during the upload process, and only one copy of each segment is uploaded to the remote storage. Most remote storage options have their own measures, including data replication, to ensure data durability.

+
 Data retrieval
 -----------------
-When consumers fetch records stored in remote storage, the broker downloads and caches these records locally. This allows for quicker access in subsequent retrieval operations.
-The retention time and the maximum size of the cache can be configured.
+When consumers fetch records stored in remote storage, the broker downloads and caches these records locally. This allows for quicker access in subsequent retrieval operations. You can configure the retention time and the maximum size of the cache.
diff --git a/docs/products/kafka/concepts/tiered-storage-limitations.rst b/docs/products/kafka/concepts/tiered-storage-limitations.rst
index a5ebb63ca7..7d4f005b99 100644
--- a/docs/products/kafka/concepts/tiered-storage-limitations.rst
+++ b/docs/products/kafka/concepts/tiered-storage-limitations.rst
@@ -1,7 +1,7 @@
 Trade-offs and limitations
 ============================

-The main trade-off of tiered storage in Aiven for Apache Kafka® is the higher latency while accessing and reading data from remote storage compared to local disk storage. While adding local caching can partially solve this problem, it cannot eliminate the latency.
+The main trade-off of tiered storage is the higher latency while accessing and reading data from remote storage compared to local disk storage. While adding local caching can partially solve this problem, it cannot eliminate the latency completely.

 Limitations
 -------------