This repository has been archived by the owner on Jan 29, 2024. It is now read-only.
Merge pull request #1914 from aiven/dorota-clickhouse-tiered-storage-…
…docs clickhouse: tiered storage docs
Showing 9 changed files with 454 additions and 0 deletions.
83 changes: 83 additions & 0 deletions
docs/products/clickhouse/concepts/clickhouse-tiered-storage.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
Tiered storage in Aiven for ClickHouse® | ||
======================================= | ||
|
||
.. important:: | ||
|
||
   Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_. | ||
|
||
Discover the tiered storage capability in Aiven for ClickHouse®. Learn how it works and explore its use cases. Check out why you might need it and what benefits you get using it. | ||
|
||
Overview | ||
-------- | ||
|
||
The tiered storage feature introduces a method of organizing and storing data in two tiers for improved efficiency and cost optimization. The data is automatically moved to an appropriate tier based on your database's local disk usage. On top of this default data allocation mechanism, you can control the tier your data is stored in using custom data retention periods. | ||
|
||
The tiered storage in Aiven for ClickHouse consists of the following two layers: | ||
|
||
SSD - the first tier | ||
   Fast storage device with limited capacity, better suited for fresh and frequently queried data, relatively costly to use | ||
|
||
Object storage - the second tier | ||
   Affordable storage device with unlimited capacity, better suited for historical and more rarely queried data, relatively slow | ||
|
||
Why use it | ||
---------- | ||
|
||
By :doc:`enabling </docs/products/clickhouse/howto/enable-tiered-storage>` and properly :doc:`configuring </docs/products/clickhouse/howto/configure-tiered-storage>` the tiered storage feature in Aiven for ClickHouse, you can use storage resources efficiently and, therefore, significantly reduce storage costs of your Aiven for ClickHouse instance. | ||
|
||
How it works | ||
------------ | ||
|
||
After you :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` the tiered storage feature, Aiven for ClickHouse by default stores data on SSD until the SSD reaches 80% of its capacity. Once this size-based threshold is exceeded, data is stored in object storage. | ||
|
||
Optionally, you can :doc:`configure the time-based threshold </docs/products/clickhouse/howto/configure-tiered-storage>` for your storage. Based on the time-based threshold, the data is moved from your SSD to object storage after a specified time period. | ||
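
The two thresholds can be pictured with a small Python model. This is only an illustrative sketch, not the logic Aiven for ClickHouse actually runs: the function name ``choose_tier``, its parameters, and the order in which the two rules are checked are assumptions made for the example. Only the 80% figure comes from the text.

```python
from datetime import timedelta
from typing import Optional

# Default size-based threshold stated in the text (80% of SSD capacity).
SSD_USAGE_THRESHOLD = 0.80


def choose_tier(ssd_used_bytes: int, ssd_capacity_bytes: int,
                data_age: timedelta, ttl: Optional[timedelta] = None) -> str:
    """Return "ssd" or "object_storage" for a piece of data (illustrative model only)."""
    # Time-based rule (optional): data older than the configured period
    # goes to object storage regardless of SSD usage.
    if ttl is not None and data_age > ttl:
        return "object_storage"
    # Size-based rule (default): once SSD usage exceeds the threshold,
    # data goes to object storage.
    if ssd_used_bytes / ssd_capacity_bytes > SSD_USAGE_THRESHOLD:
        return "object_storage"
    return "ssd"


print(choose_tier(50, 100, timedelta(days=1)))
print(choose_tier(81, 100, timedelta(days=1)))
print(choose_tier(10, 100, timedelta(days=8), ttl=timedelta(weeks=1)))
```

In this model, the first call stays on SSD, the second crosses the size-based threshold, and the third is moved by the time-based threshold even though the SSD is mostly empty.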
|
||
.. mermaid:: | ||
|
||
   sequenceDiagram | ||
       Application->>+SSD: writing data | ||
       SSD->>Object storage: moving data based <br> on storage policies | ||
       par Application to SSD | ||
           Application-->>SSD: querying data | ||
       and Application to Object storage | ||
           Application-->>Object storage: querying data | ||
       end | ||
       alt if stored in Object storage | ||
           Object storage->>Application: reading data | ||
       else if stored in SSD | ||
           SSD->>Application: reading data | ||
       end | ||
|
||
.. note:: | ||
|
||
   Backups are taken for data that resides both on SSD and in object storage. | ||
|
||
Typical use case | ||
---------------- | ||
|
||
Your Aiven for ClickHouse service holds a significant amount of data that has been there for a while and is hardly ever accessed. Because this data is stored on SSD, it's expensive to keep. To make your data storage more efficient and reduce the costs, you decide to use tiered storage: you contact the sales team at `[email protected] <mailto:[email protected]>`_ to have the feature enabled on your project, and then you :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` it on the tables you want to optimize. Finally, you :doc:`configure </docs/products/clickhouse/howto/configure-tiered-storage>` the time-based threshold to control how your data is distributed between the two layers. | ||
|
||
.. _tiered-storage-limitations: | ||
|
||
Limitations | ||
----------- | ||
|
||
* When :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`, the tiered storage feature cannot be deactivated. | ||
|
||
  .. tip:: | ||
|
||
     As a workaround, you can create a new table (without enabling tiered storage on it) and copy the data from the original table (with the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`) to the new table. As soon as the data is copied to the new table, you can remove the original table. | ||
|
||
* With the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`, it's not possible to connect to an external existing object storage or cloud storage bucket. | ||
|
||
What's next | ||
----------- | ||
|
||
* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>` | ||
* :doc:`Configure data retention thresholds for tiered storage </docs/products/clickhouse/howto/configure-tiered-storage>` | ||
|
||
Related reading | ||
--------------- | ||
|
||
* :doc:`Check data volume distribution between different disks </docs/products/clickhouse/howto/check-data-tiered-storage>` | ||
* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>` |
80 changes: 80 additions & 0 deletions
docs/products/clickhouse/howto/check-data-tiered-storage.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
Check data distribution between storage devices in Aiven for ClickHouse®'s tiered storage | ||
========================================================================================= | ||
|
||
.. important:: | ||
|
||
   Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_. | ||
|
||
Monitor how your data is distributed between the two layers of your tiered storage: SSD and object storage. | ||
|
||
About checking data distribution | ||
-------------------------------- | ||
|
||
If you have the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on your project, your data in Aiven for ClickHouse is distributed between two storage devices (tiers). You can check which storage devices your databases and tables are stored on. You can also view their total sizes, part counts, and minimum, median, and maximum part sizes. | ||
|
||
Prerequisites | ||
------------- | ||
|
||
* Access to `Aiven Console <https://console.aiven.io/>`_ | ||
* Tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` | ||
* Command line tool (:doc:`ClickHouse client </docs/products/clickhouse/howto/connect-with-clickhouse-cli>`) | ||
|
||
Check data distribution in Aiven Console | ||
---------------------------------------- | ||
|
||
You can use `Aiven Console <https://console.aiven.io/>`_ to check if tiered storage is enabled on your service and, if it is, how much storage is used on each tier (local SSD and remote object storage) for particular tables. | ||
|
||
To access tiered storage's status information, go to `Aiven Console <https://console.aiven.io/>`_ > your Aiven for ClickHouse service > the **Databases and tables** page > your database > your table > **View details** > **Storage details**. | ||
|
||
Run a data distribution check with the ClickHouse client (CLI) | ||
-------------------------------------------------------------- | ||
|
||
1. :doc:`Connect to your Aiven for ClickHouse service </docs/products/clickhouse/howto/list-connect-to-service>` using, for example, the ClickHouse client (CLI). | ||
2. Run the following query: | ||
|
||
   .. code-block:: sql | ||
|
||
      SELECT | ||
          database, | ||
          table, | ||
          disk_name, | ||
          formatReadableSize(sum(data_compressed_bytes)) AS total_size, | ||
          count(*) AS parts_count, | ||
          formatReadableSize(min(data_compressed_bytes)) AS min_part_size, | ||
          formatReadableSize(median(data_compressed_bytes)) AS median_part_size, | ||
          formatReadableSize(max(data_compressed_bytes)) AS max_part_size | ||
      FROM system.parts | ||
      GROUP BY | ||
          database, | ||
          table, | ||
          disk_name | ||
      ORDER BY | ||
          database ASC, | ||
          table ASC, | ||
          disk_name ASC | ||
|
||
   You can expect to receive the following output: | ||
|
||
   .. code-block:: text | ||
|
||
      ┌─database─┬─table─────┬─disk_name─┬─total_size─┬─parts_count─┬─min_part_size─┬─median_part_size─┬─max_part_size─┐ | ||
      │ datasets │ hits_v1 │ default │ 1.20 GiB │ 6 │ 33.65 MiB │ 238.69 MiB │ 253.18 MiB │ | ||
      │ datasets │ visits_v1 │ S3 │ 536.69 MiB │ 5 │ 44.61 MiB │ 57.90 MiB │ 317.19 MiB │ | ||
      │ system │ query_log │ default │ 75.85 MiB │ 102 │ 7.51 KiB │ 12.36 KiB │ 1.55 MiB │ | ||
      └──────────┴───────────┴───────────┴────────────┴─────────────┴───────────────┴──────────────────┴───────────────┘ | ||
|
||
.. topic:: Result | ||
|
||
   The query returns a table with data distribution details for all databases and tables that belong to your service: the storage device they use, their total sizes as well as parts counts and sizing. | ||
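
To make the aggregation in the query above easier to follow, here is a minimal Python sketch of the same grouping over a few made-up rows. The sample data and the ``summarize`` helper are hypothetical. Sizes are left as raw byte counts instead of being passed through ``formatReadableSize``. Note also that ``statistics.median`` is exact, whereas ClickHouse's ``median`` is an approximate quantile, so values may differ slightly on real data.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample rows: (database, table, disk_name, data_compressed_bytes)
parts = [
    ("datasets", "hits_v1", "default", 300),
    ("datasets", "hits_v1", "default", 100),
    ("datasets", "hits_v1", "default", 200),
    ("datasets", "visits_v1", "S3", 50),
]


def summarize(rows):
    """Group part sizes by (database, table, disk_name), like the query's GROUP BY."""
    groups = defaultdict(list)
    for database, table, disk_name, size in rows:
        groups[(database, table, disk_name)].append(size)
    # Sorting by key mirrors ORDER BY database, table, disk_name.
    return [
        {
            "key": key,
            "total_size": sum(sizes),
            "parts_count": len(sizes),
            "min_part_size": min(sizes),
            "median_part_size": median(sizes),
            "max_part_size": max(sizes),
        }
        for key, sizes in sorted(groups.items())
    ]


summary = summarize(parts)
for row in summary:
    print(row)
```

Each dictionary corresponds to one row of the query result: one entry per combination of database, table, and disk.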
|
||
What's next | ||
----------- | ||
|
||
* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>` | ||
* :doc:`Configure data retention thresholds for tiered storage </docs/products/clickhouse/howto/configure-tiered-storage>` | ||
|
||
Related reading | ||
--------------- | ||
|
||
* :doc:`About tiered storage in Aiven for ClickHouse </docs/products/clickhouse/concepts/clickhouse-tiered-storage>` | ||
* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>` |
94 changes: 94 additions & 0 deletions
docs/products/clickhouse/howto/configure-tiered-storage.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
Configure data retention thresholds in Aiven for ClickHouse®'s tiered storage | ||
============================================================================= | ||
|
||
.. important:: | ||
|
||
   Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_. | ||
|
||
Learn to control how your data is distributed between storage devices in the tiered storage of an Aiven for ClickHouse service. Check out how to configure tables so that your data is automatically written either to SSD or object storage as needed. | ||
|
||
About data retention control | ||
---------------------------- | ||
|
||
If you have the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on your Aiven for ClickHouse service, your data is distributed between two storage devices (tiers). The data is stored either on SSD or in object storage, depending on whether and how you configure this behavior. By default, data is moved from SSD to object storage when SSD reaches 80% of its capacity (default size-based data retention policy). | ||
|
||
You may want to change this default data distribution behavior by :ref:`configuring your table's schema by adding a TTL (time-to-live) clause <time-based-retention-config>`. With this configuration, data is moved from SSD to object storage based on how long it has been on your SSD, regardless of the SSD-capacity threshold. | ||
|
||
To enable this time-based data distribution mechanism, you can set up a retention policy (threshold) on a table level by using the TTL clause. For data retention control purposes, the TTL clause uses the following: | ||
|
||
* Data item of the ``Date`` or ``DateTime`` type as a reference point in time | ||
* INTERVAL clause as a time period to elapse between the reference point and the data transfer to object storage | ||
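
The arithmetic behind these two ingredients can be sketched in Python. The function names are hypothetical, and the sketch only models the date calculation for a single row. Treating the boundary date itself as due is an assumption, and in ClickHouse the actual move is a background operation on whole data parts, so it does not necessarily happen at that exact moment.

```python
from datetime import date, timedelta


def move_eligible_on(reference: date, interval: timedelta) -> date:
    """Date from which a row may be moved: reference point + interval."""
    return reference + interval


def is_due(reference: date, interval: timedelta, today: date) -> bool:
    """True once the interval has elapsed since the reference point.

    Counting the boundary date itself as due is an assumption of this sketch.
    """
    return today >= move_eligible_on(reference, interval)


# A row dated 2024-01-01 with a one-week interval:
print(move_eligible_on(date(2024, 1, 1), timedelta(weeks=1)))
print(is_due(date(2024, 1, 1), timedelta(weeks=1), today=date(2024, 1, 7)))
```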
|
||
Prerequisites | ||
------------- | ||
|
||
* Aiven organization | ||
* Tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on the project level and on the table level | ||
* Command line tool (:doc:`ClickHouse client </docs/products/clickhouse/howto/connect-with-clickhouse-cli>`) | ||
|
||
.. _time-based-retention-config: | ||
|
||
Configure time-based data retention | ||
----------------------------------- | ||
|
||
1. :doc:`Connect to your Aiven for ClickHouse service </docs/products/clickhouse/howto/list-connect-to-service>` using, for example, the ClickHouse client (CLI). | ||
2. Select a database for operations you intend to perform. | ||
|
||
   .. code-block:: sql | ||
|
||
      USE database_name; | ||
|
||
Add TTL to a new table | ||
'''''''''''''''''''''' | ||
|
||
Create a new table with the ``storage_policy`` setting set to ``tiered`` (to :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` the feature) and TTL (time-to-live) configured to add a time-based data retention threshold on the table. | ||
|
||
.. code-block:: sql | ||
|
||
   CREATE TABLE example_table ( | ||
       SearchDate Date, | ||
       SearchID UInt64, | ||
       SearchPhrase String | ||
   ) | ||
   ENGINE = MergeTree | ||
   ORDER BY (SearchDate, SearchID) | ||
   PARTITION BY toYYYYMM(SearchDate) | ||
   TTL SearchDate + INTERVAL 1 WEEK TO VOLUME 'tiered' | ||
   SETTINGS storage_policy = 'tiered'; | ||
|
||
Add TTL to an existing table | ||
'''''''''''''''''''''''''''' | ||
|
||
Use the MODIFY TTL clause: | ||
|
||
.. code-block:: sql | ||
|
||
   ALTER TABLE database_name.table_name MODIFY TTL ttl_expression; | ||
|
||
Update TTL in an existing table | ||
''''''''''''''''''''''''''''''' | ||
|
||
Change an already configured TTL in an existing table by using the ALTER TABLE MODIFY TTL clause: | ||
|
||
.. code-block:: sql | ||
|
||
   ALTER TABLE database_name.table_name MODIFY TTL ttl_expression; | ||
|
||
.. topic:: Result | ||
|
||
   You have your time-based data retention policy set up. From now on, when data has been on your SSD longer than the specified time period, it's moved to object storage, regardless of how much SSD capacity is still available. | ||
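
If you generate these statements from a script, the ``ttl_expression`` placeholder can be filled in the same shape as the TTL clause of the ``CREATE TABLE`` example. The helper below is a hypothetical sketch that only builds the statement text. It does not connect to any service and does no escaping or validation, so it is not suitable for untrusted input. The two-week interval is an arbitrary example value.

```python
def build_modify_ttl(database: str, table: str, date_column: str,
                     interval: str, volume: str) -> str:
    """Compose an ALTER TABLE ... MODIFY TTL statement as a string."""
    # Same shape as the TTL clause in the CREATE TABLE example:
    # <Date column> + INTERVAL <period> TO VOLUME '<volume>'
    ttl_expression = f"{date_column} + INTERVAL {interval} TO VOLUME '{volume}'"
    return f"ALTER TABLE {database}.{table} MODIFY TTL {ttl_expression};"


statement = build_modify_ttl(
    "database_name", "example_table", "SearchDate", "2 WEEK", "tiered"
)
print(statement)
```

You can then run the printed statement in the ClickHouse client.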
|
||
What's next | ||
----------- | ||
|
||
* :doc:`Check data volume distribution between different disks </docs/products/clickhouse/howto/check-data-tiered-storage>` | ||
|
||
Related reading | ||
--------------- | ||
|
||
* :doc:`About tiered storage in Aiven for ClickHouse </docs/products/clickhouse/concepts/clickhouse-tiered-storage>` | ||
* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>` | ||
* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>` | ||
* `Manage Data with TTL (Time-to-live) <https://clickhouse.com/docs/en/guides/developer/ttl>`_ | ||
* `Create table statement, TTL documentation <https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#mergetree-table-ttl>`_ | ||
* `MergeTree - column TTL <https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#mergetree-column-ttl>`_ |