This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Merge pull request #1914 from aiven/dorota-clickhouse-tiered-storage-…
…docs

clickhouse: tiered storage docs
aris-aiven authored Sep 19, 2023
2 parents 068cf5c + 75a8cc3 commit 888193d
Showing 9 changed files with 454 additions and 0 deletions.
13 changes: 13 additions & 0 deletions _toc.yml
@@ -728,6 +728,8 @@ entries:
title: Strings
- file: docs/products/clickhouse/concepts/federated-queries
title: Federated queries
- file: docs/products/clickhouse/concepts/clickhouse-tiered-storage
title: Tiered storage
- file: docs/products/clickhouse/howto
title: HowTo
entries:
@@ -791,6 +793,17 @@ entries:
title: Connect services via integration databases
- file: docs/products/clickhouse/howto/connect-with-jdbc
title: Connect to external DBs with JDBC
- file: docs/products/clickhouse/howto/list-tiered-storage
title: Tiered storage
entries:
- file: docs/products/clickhouse/howto/enable-tiered-storage
title: Enable tiered storage
- file: docs/products/clickhouse/howto/configure-tiered-storage
title: Configure tiered storage
- file: docs/products/clickhouse/howto/check-data-tiered-storage
title: Check tiered storage status
- file: docs/products/clickhouse/howto/transfer-data-tiered-storage
title: Transfer data in tiered storage
- file: docs/products/clickhouse/reference
title: Reference
entries:
4 changes: 4 additions & 0 deletions docs/products/clickhouse/concepts.rst
@@ -22,6 +22,10 @@ Aiven service
:shadow: md
:margin: 2 2 0 0

.. grid-item-card:: :doc:`Tiered storage in Aiven for ClickHouse® </docs/products/clickhouse/concepts/clickhouse-tiered-storage>`
:shadow: md
:margin: 2 2 0 0

General
-------

83 changes: 83 additions & 0 deletions docs/products/clickhouse/concepts/clickhouse-tiered-storage.rst
@@ -0,0 +1,83 @@
Tiered storage in Aiven for ClickHouse®
=======================================

.. important::

Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_.

Learn what tiered storage in Aiven for ClickHouse® is, how it works, when to use it, and what benefits it brings.

Overview
--------

The tiered storage feature introduces a method of organizing and storing data in two tiers for improved efficiency and cost optimization. The data is automatically moved to an appropriate tier based on your database's local disk usage. On top of this default data allocation mechanism, you can control the tier your data is stored in using custom data retention periods.

The tiered storage in Aiven for ClickHouse consists of the following two layers:

SSD - the first tier
  Fast storage device with limited capacity, better suited for fresh and frequently queried data, relatively costly to use

Object storage - the second tier
  Affordable storage with unlimited capacity, better suited for historical and rarely queried data, relatively slow

Why use it
----------

By :doc:`enabling </docs/products/clickhouse/howto/enable-tiered-storage>` and properly :doc:`configuring </docs/products/clickhouse/howto/configure-tiered-storage>` the tiered storage feature in Aiven for ClickHouse, you can use storage resources efficiently and, therefore, significantly reduce storage costs of your Aiven for ClickHouse instance.

How it works
------------

After you :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` the tiered storage feature, Aiven for ClickHouse by default stores data on SSD until the SSD reaches 80% of its capacity. Once this size-based threshold is exceeded, data is moved to object storage.

Optionally, you can :doc:`configure the time-based threshold </docs/products/clickhouse/howto/configure-tiered-storage>` for your storage. Based on the time-based threshold, the data is moved from your SSD to object storage after a specified time period.
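
To see how close the SSD is to the size-based threshold, you can compare used and total space per disk. The following is a minimal sketch that assumes your service exposes its disks in the standard ClickHouse ``system.disks`` table. The disk names and types it returns depend on how the service is set up.

.. code-block:: sql

   -- Sketch: show the used space on each disk as a percentage of its capacity.
   SELECT
       name,
       type,
       formatReadableSize(total_space) AS total,
       formatReadableSize(free_space) AS free,
       round(100 * (total_space - free_space) / total_space, 1) AS used_percent
   FROM system.disks;

A ``used_percent`` value approaching 80 for the local disk suggests that data is about to be moved to object storage. The total space reported for object storage may be a placeholder value, so the percentage is likely to be meaningful for the SSD only.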

.. mermaid::

sequenceDiagram
Application->>+SSD: writing data
SSD->>Object storage: moving data based <br> on storage policies
par Application to SSD
Application-->>SSD: querying data
and Application to Object storage
Application-->>Object storage: querying data
end
alt if stored in Object storage
Object storage->>Application: reading data
else if stored in SSD
SSD->>Application: reading data
end

.. note::

Backups are taken for data that resides both on SSD and in object storage.

Typical use case
----------------

Your Aiven for ClickHouse service holds a significant amount of data that has been there for a while and is hardly ever accessed. Because it's stored on SSD, it's expensive to keep. You decide to :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` tiered storage for your service to store your data more efficiently and reduce costs. To do that, you contact the sales team at `[email protected] <mailto:[email protected]>`_ to have it enabled on your project, and you :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` the feature on the tables you want to optimize. You :doc:`configure </docs/products/clickhouse/howto/configure-tiered-storage>` the time-based threshold to control how your data is distributed between the two layers.

.. _tiered-storage-limitations:

Limitations
-----------

* When :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`, the tiered storage feature cannot be deactivated.

.. tip::

As a workaround, you can create a new table (without enabling tiered storage on it) and copy the data from the original table (with the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`) to the new table. As soon as the data is copied to the new table, you can remove the original table.

* With the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`, it's not possible to connect to an external existing object storage or cloud storage bucket.
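
The workaround for deactivating tiered storage on a table can be sketched as follows. This is an illustration under assumptions, not an official procedure: ``example_table`` stands for a table with tiered storage enabled, ``example_table_ssd`` is a hypothetical name for the new table, and the column list needs to match your own schema.

.. code-block:: sql

   -- 1. Create the new table without the tiered storage policy
   --    (no storage_policy setting and no TTL ... TO VOLUME clause).
   CREATE TABLE example_table_ssd (
       SearchDate Date,
       SearchID UInt64,
       SearchPhrase String
   )
   ENGINE = MergeTree
   PARTITION BY toYYYYMM(SearchDate)
   ORDER BY (SearchDate, SearchID);

   -- 2. Copy the data from the original table.
   INSERT INTO example_table_ssd SELECT * FROM example_table;

   -- 3. Compare row counts before removing anything.
   SELECT count() FROM example_table;
   SELECT count() FROM example_table_ssd;

   -- 4. Remove the original table only after the counts match.
   DROP TABLE example_table;

Before copying, make sure that the SSD has enough free space to hold the data that currently resides in object storage.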

What's next
-----------

* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>`
* :doc:`Configure data retention thresholds for tiered storage </docs/products/clickhouse/howto/configure-tiered-storage>`

Related reading
---------------

* :doc:`Check data volume distribution between different disks </docs/products/clickhouse/howto/check-data-tiered-storage>`
* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>`
7 changes: 7 additions & 0 deletions docs/products/clickhouse/howto.rst
@@ -41,3 +41,10 @@ Aiven for ClickHouse® how-tos
- :doc:`Connect to external database via JDBC </docs/products/clickhouse/howto/connect-with-jdbc>`
- :doc:`Manage Aiven for ClickHouse® data service integrations </docs/products/clickhouse/howto/data-service-integration>`
- :doc:`Manage Aiven for ClickHouse® integration databases </docs/products/clickhouse/howto/integration-databases>`

.. dropdown:: Tiered storage

- :doc:`Enable tiered storage in Aiven for ClickHouse® </docs/products/clickhouse/howto/enable-tiered-storage>`
- :doc:`Configure tiered storage in Aiven for ClickHouse® </docs/products/clickhouse/howto/configure-tiered-storage>`
- :doc:`Check data distribution in tiered storage for Aiven for ClickHouse® </docs/products/clickhouse/howto/check-data-tiered-storage>`
- :doc:`Transfer data between storage devices in Aiven for ClickHouse® </docs/products/clickhouse/howto/transfer-data-tiered-storage>`
80 changes: 80 additions & 0 deletions docs/products/clickhouse/howto/check-data-tiered-storage.rst
@@ -0,0 +1,80 @@
Check data distribution between storage devices in Aiven for ClickHouse®'s tiered storage
=========================================================================================

.. important::

Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_.

Monitor how your data is distributed between the two layers of your tiered storage: SSD and object storage.

About checking data distribution
--------------------------------

If you have the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on your project, your data in Aiven for ClickHouse is distributed between two storage devices (tiers). You can check which storage devices your databases and tables are stored on. You can also preview their total sizes as well as part counts, minimum part sizes, median part sizes, and maximum part sizes.

Prerequisites
-------------

* Access to `Aiven Console <https://console.aiven.io/>`_
* Tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>`
* Command line tool (:doc:`ClickHouse client </docs/products/clickhouse/howto/connect-with-clickhouse-cli>`)

Check data distribution in Aiven Console
----------------------------------------

You can use `Aiven Console <https://console.aiven.io/>`_ to check if tiered storage is enabled on your service and, if it is, how much storage is used on each tier (local SSD and remote object storage) for particular tables.

To access tiered storage's status information, go to `Aiven Console <https://console.aiven.io/>`_ > your Aiven for ClickHouse service > the **Databases and tables** page > your database > your table > **View details** > **Storage details**.

Run a data distribution check with the ClickHouse client (CLI)
--------------------------------------------------------------

1. :doc:`Connect to your Aiven for ClickHouse service </docs/products/clickhouse/howto/list-connect-to-service>` using, for example, the ClickHouse client (CLI).
2. Run the following query:

   .. code-block:: sql

      SELECT
          database,
          table,
          disk_name,
          formatReadableSize(sum(data_compressed_bytes)) AS total_size,
          count(*) AS parts_count,
          formatReadableSize(min(data_compressed_bytes)) AS min_part_size,
          formatReadableSize(median(data_compressed_bytes)) AS median_part_size,
          formatReadableSize(max(data_compressed_bytes)) AS max_part_size
      FROM system.parts
      GROUP BY
          database,
          table,
          disk_name
      ORDER BY
          database ASC,
          table ASC,
          disk_name ASC

   You can expect to receive the following output:

   .. code-block:: text

      ┌─database─┬─table─────┬─disk_name─┬─total_size─┬─parts_count─┬─min_part_size─┬─median_part_size─┬─max_part_size─┐
      │ datasets │ hits_v1   │ default   │ 1.20 GiB   │           6 │ 33.65 MiB     │ 238.69 MiB       │ 253.18 MiB    │
      │ datasets │ visits_v1 │ S3        │ 536.69 MiB │           5 │ 44.61 MiB     │ 57.90 MiB        │ 317.19 MiB    │
      │ system   │ query_log │ default   │ 75.85 MiB  │         102 │ 7.51 KiB      │ 12.36 KiB        │ 1.55 MiB      │
      └──────────┴───────────┴───────────┴────────────┴─────────────┴───────────────┴──────────────────┴───────────────┘

.. topic:: Result

The query returns a table with data distribution details for all databases and tables that belong to your service: the storage device they use, their total sizes, and their part counts and part sizes.
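
To focus on a single table, you can narrow the check down. The following is a sketch that assumes the ``datasets.hits_v1`` table from the sample output. Replace the database and table names with your own. It also filters on ``active`` so that parts already replaced by merges are not counted.

.. code-block:: sql

   -- Sketch: per-disk size of the active parts of one table.
   SELECT
       disk_name,
       formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
       count() AS active_parts
   FROM system.parts
   WHERE active
     AND database = 'datasets'
     AND table = 'hits_v1'
   GROUP BY disk_name
   ORDER BY disk_name ASC;

If the table appears under more than one ``disk_name``, its data is split between the tiers.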

What's next
-----------

* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>`
* :doc:`Configure data retention thresholds for tiered storage </docs/products/clickhouse/howto/configure-tiered-storage>`

Related reading
---------------

* :doc:`About tiered storage in Aiven for ClickHouse </docs/products/clickhouse/concepts/clickhouse-tiered-storage>`
* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>`
94 changes: 94 additions & 0 deletions docs/products/clickhouse/howto/configure-tiered-storage.rst
@@ -0,0 +1,94 @@
Configure data retention thresholds in Aiven for ClickHouse®'s tiered storage
=============================================================================

.. important::

Aiven for ClickHouse® tiered storage is a :doc:`limited availability feature </docs/platform/concepts/beta_services>`. If you're interested in trying out this feature, contact the sales team at `[email protected] <mailto:[email protected]>`_.

Learn to control how your data is distributed between storage devices in the tiered storage of an Aiven for ClickHouse service. Check out how to configure tables so that your data is automatically written either to SSD or object storage as needed.

About data retention control
----------------------------

If you have the tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on your Aiven for ClickHouse service, your data is distributed between two storage devices (tiers). The data is stored either on SSD or in object storage, depending on whether and how you configure this behavior. By default, data is moved from SSD to object storage when SSD reaches 80% of its capacity (default size-based data retention policy).

You may want to change this default data distribution behavior by :ref:`adding a TTL (time-to-live) clause to your table's schema <time-based-retention-config>`. With such a configuration, data is moved from SSD to object storage based on how long it has been on the SSD, regardless of the SSD-capacity threshold.

To enable this time-based data distribution mechanism, you can set up a retention policy (threshold) on a table level by using the TTL clause. For data retention control purposes, the TTL clause uses the following:

* Data item of the ``Date`` or ``DateTime`` type as a reference point in time
* ``INTERVAL`` clause as a time period to elapse between the reference point and the data transfer to object storage

Prerequisites
-------------

* Aiven organization
* Tiered storage feature :doc:`enabled </docs/products/clickhouse/howto/enable-tiered-storage>` on the project level and on the table level
* Command line tool (:doc:`ClickHouse client </docs/products/clickhouse/howto/connect-with-clickhouse-cli>`)

.. _time-based-retention-config:

Configure time-based data retention
-----------------------------------

1. :doc:`Connect to your Aiven for ClickHouse service </docs/products/clickhouse/howto/list-connect-to-service>` using, for example, the ClickHouse client (CLI).
2. Select a database for operations you intend to perform.

   .. code-block:: sql

      USE database_name;

Add TTL to a new table
''''''''''''''''''''''

Create a new table with the ``storage_policy`` setting set to ``tiered`` (to :doc:`enable </docs/products/clickhouse/howto/enable-tiered-storage>` the feature) and TTL (time-to-live) configured to add a time-based data retention threshold on the table.

.. code-block:: sql

   CREATE TABLE example_table (
       SearchDate Date,
       SearchID UInt64,
       SearchPhrase String
   )
   ENGINE = MergeTree
   ORDER BY (SearchDate, SearchID)
   PARTITION BY toYYYYMM(SearchDate)
   TTL SearchDate + INTERVAL 1 WEEK TO VOLUME 'tiered'
   SETTINGS storage_policy = 'tiered';

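
To observe the time-based threshold in action, you can insert one row older than the TTL interval and one fresh row, then check which disk each part ends up on. This is a sketch that assumes the ``example_table`` definition from this section. The inserted values are made up for illustration, and parts are moved in the background, so the old part might not show up in object storage immediately.

.. code-block:: sql

   -- One row well past the 1-week threshold (60 days old, so it always
   -- falls into a different monthly partition) and one fresh row.
   INSERT INTO example_table VALUES
       (today() - 60, 1, 'old search'),
       (today(), 2, 'fresh search');

   -- List the active parts of the table and the disk each one resides on.
   SELECT
       partition,
       name,
       disk_name,
       rows
   FROM system.parts
   WHERE database = currentDatabase()
     AND table = 'example_table'
     AND active
   ORDER BY partition ASC;
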
Add TTL to an existing table
''''''''''''''''''''''''''''

Use the MODIFY TTL clause:

.. code-block:: sql

   ALTER TABLE database_name.table_name MODIFY TTL ttl_expression;

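
As an illustration, ``ttl_expression`` can take the same form as the ``TTL`` clause of a ``CREATE TABLE`` statement. The sketch below assumes a table named ``example_table`` in a database named ``default`` (both hypothetical) with a ``SearchDate`` column of the ``Date`` type, and it reuses the volume name from the ``CREATE TABLE`` example in this article.

.. code-block:: sql

   -- Move data to object storage one week after the date in SearchDate.
   ALTER TABLE default.example_table
       MODIFY TTL SearchDate + INTERVAL 1 WEEK TO VOLUME 'tiered';
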
Update TTL in an existing table
'''''''''''''''''''''''''''''''

Change an already configured TTL in an existing table by using the ALTER TABLE MODIFY TTL clause:

.. code-block:: sql

   ALTER TABLE database_name.table_name MODIFY TTL ttl_expression;

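
For example, to extend the retention period on SSD from one week to one month and then confirm the change, you might run statements like the following. The sketch assumes a hypothetical ``default.example_table`` table with a ``SearchDate`` column of the ``Date`` type and the same volume name as in the ``CREATE TABLE`` example in this article.

.. code-block:: sql

   -- Replace the previously configured TTL with a longer interval.
   ALTER TABLE default.example_table
       MODIFY TTL SearchDate + INTERVAL 1 MONTH TO VOLUME 'tiered';

   -- Inspect the table definition to confirm that the new TTL is in place.
   SHOW CREATE TABLE default.example_table;
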
.. topic:: Result

You have your time-based data retention policy set up. From now on, when data is on your SSD longer than a specified time period, it's moved to object storage, regardless of how much of SSD capacity is still available.

What's next
-----------

* :doc:`Check data volume distribution between different disks </docs/products/clickhouse/howto/check-data-tiered-storage>`

Related reading
---------------

* :doc:`About tiered storage in Aiven for ClickHouse </docs/products/clickhouse/concepts/clickhouse-tiered-storage>`
* :doc:`Enable tiered storage in Aiven for ClickHouse </docs/products/clickhouse/howto/enable-tiered-storage>`
* :doc:`Transfer data between SSD and object storage </docs/products/clickhouse/howto/transfer-data-tiered-storage>`
* `Manage Data with TTL (Time-to-live) <https://clickhouse.com/docs/en/guides/developer/ttl>`_
* `Create table statement, TTL documentation <https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#mergetree-table-ttl>`_
* `MergeTree - column TTL <https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/mergetree#mergetree-column-ttl>`_