description |
---|
Step-by-step guide to set up ClickHouse as a destination in RudderStack |
ClickHouse is an open-source, column-oriented database management system mainly used for online analytical processing (OLAP). It is fast, and allows for real-time analysis of your data. ClickHouse was originally developed by Yandex, the Russian IT company, for the Yandex Metrica web analytics service. ClickHouse is highly scalable, and built for high performance.
You can now integrate RudderStack with ClickHouse seamlessly and store your customer event data into it, from the source of your choice.
{% hint style="info" %} Please check our Warehouse Schemas guide to know how events are mapped to the tables in ClickHouse. {% endhint %}
{% hint style="success" %} Find the open-source transformer code for this destination in our GitHub repo. {% endhint %}
You to need to set date_time_input_format
to best_effort
for ClickHouse to parse all the ISO 8601 date and time formats. For example, '2018-06-08T01:02:03.000Z'
.
set date_time_input_format = 'best_effort';
{% hint style="info" %}
We use UInt8
datatype to set Boolean values and map UInt8
to Boolean internally. So when we fetch schema from ClickHouse, we treat UInt8
as Boolean.
{% endhint %}
{% hint style="info" %}
Make sure your ClickHouse User has read access to system.columns
table for fetching table schema's
{% endhint %}
{% hint style="warning" %}
If you are creating tables in the same database where RudderStack loads, we highly recommend not using UInt8
as a datatype except for boolean values (0,1).
{% endhint %}
You will need to whitelist the RudderStack IPs to enable network access to it.
{% hint style="info" %}
The IPs to be whitelisted are : 3.216.35.97
, 34.198.90.241
, 54.147.40.62
, 23.20.96.9
, and 18.214.35.254
.
{% endhint %}
In order to enable dumping data to ClickHouse, you will first need to add it as a destination to the source from which you are sending the event data. Once the destination is enabled, events from RudderStack will automatically start to flow to ClickHouse.
To configure ClickHouse as a destination, please follow these steps:
- Choose a source to which you would like to add ClickHouse as a destination. You can also simply create a destination and connect it to a source later.
{% hint style="info" %} Please follow our Adding a Source and Destination guide to know how to add a source in RudderStack. {% endhint %}
- After choosing a source, select ClickHouse from the list of destinations.
- Give your destination a name and then click on Next. You should then see the following screen:
- Add the required credentials in the Connection Settings as mentioned below:
- Host - ****The host name of your ClickHouse Database.
- Port - The TCP port of your ClickHouse Host. ex: 9000. if you want the connection to be secure use TCP secure port 9440. reference
- Database - The database name in your ClickHouse instance where the data gets loaded.
- Cluster - The name of your ClickHouse cluster Please note: Leave this blank if you are running a single host ClickHouse Cluster
- User - The username which has the required read/write access to the above database.
- Password - The password for the above user.
- Secure - To establish secure connection.
- Skip verify - To skip certificate verification. if you are using self-signed certificates, set this field
true
- CA certificate - Certificate which needs to be verified while establishing a secure connection. Add the certificate contents to the text input
- Bucket Provider - Intermediate storage for storing staging files. Currently, we support S3, MinIO, Google Cloud Storage, and Azure Blob Storage.
We create tables with engine ReplacingMergeTree order by (received_at, id)
and column dataType
as Nullable(dataType)
.ReplacingMergeTree
replaces the latest event which has the same received_at, id
while merging. Nullable
is not applicable for sortKeys
.
For the user's table, we create a table with an engine AggregatingMergeTree
order by id
and column dataType
as SimpleAggregateFunction(anyLast, Nullable(dataType))
. Merging the columns with the sameid
picks the last value which is not null. Nullable
is not applicable for sortKeys
.
RudderStack converts the event keys into the lower case before exporting the data into ClickHouse. This is so that it does not create two tables in case the event name has two different cases.
If you come across any issues while configuring ClickHouse with RudderStack, please feel free to contact us or start a conversation on our Slack channel. We will be happy to help you.