export and import #3

Open
Neamar opened this issue Apr 16, 2021 · 19 comments
Labels
enhancement New feature or request

Comments

@Neamar

Neamar commented Apr 16, 2021

Description of problem

Most dokku plugins offer :export and :import commands that can be used to back up a service and restore it on a new server if required.

This plugin does not offer anything similar. It would be a really nice improvement.

It seems we have access to data-dir, but it's probably unsafe to rsync its data while other processes are writing to it.

@josegonzalez
Member

Does clickhouse have a way to do import/export? That's what is blocking me from implementing this.

@Neamar
Author

Neamar commented Apr 16, 2021

It's a good question. I read https://clickhouse.tech/docs/en/operations/backup/

The best, officially supported option seems to be https://clickhouse.tech/docs/en/operations/utilities/clickhouse-copier/

It's not perfect (not atomic) but should be a good start :)

@josegonzalez
Member

We don't customize the image. Is that process available in the official clickhouse image?

@Neamar
Author

Neamar commented Apr 19, 2021

The clickhouse-copier binary is available in the official image. However, I ended up using this repo: https://github.com/AlexAkulov/clickhouse-backup, which is mentioned in the official docs: https://clickhouse.tech/docs/en/operations/backup/

It was fairly simple:

dokku clickhouse:enter {{NAME}}
# The remaining commands run in the shell opened inside the container.
wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v0.6.4/clickhouse-backup.tar.gz
tar -zxvf clickhouse-backup.tar.gz
cd clickhouse-backup
CLICKHOUSE_DATA_PATH=/var/lib/clickhouse CLICKHOUSE_USERNAME={{username}} CLICKHOUSE_PASSWORD={{password}} ./clickhouse-backup create test

After that, it's simply a matter of zipping up /var/lib/clickhouse/backup (or /var/lib/dokku/services/clickhouse/{{NAME}}/data/backup on the main host).
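
The archiving step described above could be sketched roughly as follows, run on the Dokku host. The function names `backup_dir_for` and `archive_backup` are hypothetical helpers made up for illustration; the data path is the one quoted in the comment, and `test` is the backup name used in the `create test` command.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the plugin.
# Assumes backups live under <data-dir>/backup, as described above.

# Print the host path of the backup directory for a given service name.
backup_dir_for() {
  local service="$1"
  printf '/var/lib/dokku/services/clickhouse/%s/data/backup\n' "$service"
}

# Create a gzip-compressed tarball of one named backup.
# Arguments: <backup-root> <backup-name> <output-file>
archive_backup() {
  local root="$1" name="$2" out="$3"
  if [ ! -d "$root/$name" ]; then
    echo "backup not found: $root/$name" >&2
    return 1
  fi
  # -C keeps the paths inside the archive relative to the backup root.
  tar -czf "$out" -C "$root" "$name"
}
```

For example, `archive_backup "$(backup_dir_for my-service)" test /tmp/clickhouse-test.tar.gz` would pack the `test` backup of a service called `my-service` (a placeholder name) into a single file that can be copied to another server.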

I dunno whether this can be integrated into this repo. I agree it is less straightforward than a simple pg_dump.

@josegonzalez
Member

If the binary is indeed available, then it should be possible to do backups in the same way we do them for everything else.

Is there a way to import the backup?

@Neamar
Author

Neamar commented Apr 19, 2021

I haven't looked too much into clickhouse-copier, as they mention that it is mostly used to replicate data from one db to another, not to generate a file. I guess we could do something like "create another db, copy data there, disconnect, zip the mounted files from the new db, drop the new db" for export, and the opposite for import? But it seems more complex than strictly necessary, which is why I used clickhouse-backup (not copier) instead.

@josegonzalez
Member

Could we use that docker image from the linked repo against the service? And what does the restore process look like?

@Neamar
Author

Neamar commented Apr 19, 2021

> Could we use that docker image from the linked repo against the service?

Yes, but my docker knowledge is very limited, and I wasn't sure how to resolve the hostname externally (without exposing the service through an ambassador).

Restoring should simply be ./clickhouse-backup restore test

@josegonzalez
Member

You could start the container as a linked container?

@Neamar
Author

Neamar commented Apr 19, 2021

Yes, that would probably work and be the most efficient way of doing the backup
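
A rough sketch of what the linked-container approach might look like is shown below. It only prints the `docker run` command rather than executing it. Several things here are assumptions that do not come from this thread: the container name pattern `dokku.clickhouse.<service>`, the image name `altinity/clickhouse-backup`, and the helper name `linked_backup_command`. The `--link` flag is a legacy Docker feature, and `--volumes-from` is used because clickhouse-backup needs access to the ClickHouse data directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a docker command that runs clickhouse-backup
# in a separate container linked to the service container.
# ASSUMPTIONS: the service container is named dokku.clickhouse.<service>,
# and the backup image name below is correct for your setup.
linked_backup_command() {
  local service="$1" backup_name="$2"
  local container="dokku.clickhouse.${service}"
  local image="${CLICKHOUSE_BACKUP_IMAGE:-altinity/clickhouse-backup}"
  # "-e VAR" without a value passes the variable through from the caller's environment.
  printf 'docker run --rm --link %s:clickhouse --volumes-from %s -e CLICKHOUSE_HOST=clickhouse -e CLICKHOUSE_USERNAME -e CLICKHOUSE_PASSWORD %s create %s\n' \
    "$container" "$container" "$image" "$backup_name"
}
```

Printing the command first makes it easy to review before running it, for example with `linked_backup_command my-service test`, where `my-service` is a placeholder service name.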

josegonzalez added the enhancement label May 31, 2021

@nerg4l
Contributor

nerg4l commented Jul 19, 2021

The 2021 roadmap for ClickHouse contains a "backup" feature. The issue which tracks this feature is ClickHouse/ClickHouse#13953

Once that story is done there will be official backup and restore commands which could be used for "export" and "import".

@josegonzalez
Member

Okay, it seems like the backup feature was implemented. Does anyone have good ideas as to how this might be done on our end? It seems like it has to back up to a directory, but we don't currently mount any backup directories afaik...

@nerg4l
Contributor

nerg4l commented Sep 17, 2022

My assumption is the same. A folder has to be mounted to expose backups. I will run some tests tomorrow.

@josegonzalez
Member

We could back up to a file and then maybe cat the contents out?

@nerg4l
Contributor

nerg4l commented Sep 17, 2022

The type of a disk under storage_configuration can be s3. I have to do some testing to see if backup and restore work with that type.

@josegonzalez
Member

That might make our existing backup/restore code more difficult to integrate with. I'd rather not deviate from that if possible, as it makes it easier to copy/paste code across plugins.

@nerg4l
Contributor

nerg4l commented Sep 18, 2022

I had a closer look at https://github.com/AlexAkulov/clickhouse-backup. The functionality which uses BACKUP and RESTORE is referred to as "EmbeddedBackup" in the code. During an embedded backup, they build a BACKUP query with all tables. Executing that query results in a single file, which can be copied or, as you said, output with cat.
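
As a minimal sketch of the single-file idea, the helpers below build BACKUP and RESTORE statements that target a file. This is a simplification: it backs up a whole database rather than listing every table the way clickhouse-backup does. The function names and the example path are hypothetical, and the ClickHouse server may need its backup settings configured to permit the chosen destination path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: build SQL statements for ClickHouse's native
# BACKUP/RESTORE commands writing to and reading from a single file.

# Arguments: <database> <file-path>
build_backup_query() {
  local database="$1" file="$2"
  printf "BACKUP DATABASE %s TO File('%s')\n" "$database" "$file"
}

# Arguments: <database> <file-path>
build_restore_query() {
  local database="$1" file="$2"
  printf "RESTORE DATABASE %s FROM File('%s')\n" "$database" "$file"
}
```

Inside the container, the export could then look like `clickhouse-client --query "$(build_backup_query default /backups/export.zip)"` followed by `cat /backups/export.zip` to stream the file out, where `/backups/export.zip` is a placeholder path.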

@josegonzalez
Member

Backups were implemented in the official image, so does that mean we could implement that here?

@njoguamos

njoguamos commented Sep 21, 2024

Backing up

For anyone who might want to back up ClickHouse to S3, here is how I do it.

COMMANDS=$(cat <<'EOF'
    echo -e '✅ Setting environment variables'
    export CLICKHOUSE_USERNAME=<replace-clickhouse-username>
    export CLICKHOUSE_PASSWORD=<replace-clickhouse-password>
    export REMOTE_STORAGE=s3
    export S3_ACCESS_KEY=<replace-access-key>
    export S3_SECRET_KEY=<replace-secret-key>
    export S3_BUCKET=<replace-bucket-name>
    export S3_ENDPOINT=<replace-endpoint>
    export S3_REGION=us-east-1

    echo "✅ Downloading clickhouse-backup binaries"
    wget https://github.com/Altinity/clickhouse-backup/releases/download/v2.6.1/clickhouse-backup-linux-amd64.tar.gz
    tar -zxvf clickhouse-backup-linux-amd64.tar.gz
    install -o root -g root -m 0755 build/linux/amd64/clickhouse-backup /usr/local/bin

    echo "✅ Creating and uploading backup"
    BACKUP_NAME=clickhouse-backup-$(date -u +%Y-%m-%dT%H-%M-%S)
    clickhouse-backup create "$BACKUP_NAME" --rbac --configs
    clickhouse-backup upload "$BACKUP_NAME"

    echo "✅ Exit shell. Goodbye!"
    exit
EOF
)

dokku clickhouse:enter <clickhouse-instance-name> bash -c "$COMMANDS"

Note

clickhouse-backup must be run inside the container.

Restoring

To restore, simply repeat the process but run the following toward the end.

clickhouse-backup restore_remote <replace-with-remote-backup_name>

You can set up a cron job to run the backup once per day. Learn more about Altinity Backup for ClickHouse.
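
A daily schedule could be sketched like this, assuming the backup commands above have been saved to a script on the Dokku host. The helper name `daily_cron_line`, the script path, and the default hour of 03:00 are all hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a crontab entry that runs a script once per day.
# Arguments: <script-path> [hour, 0-23, defaults to 3]
daily_cron_line() {
  local script="$1" hour="${2:-3}"
  # Fields: minute hour day-of-month month day-of-week command
  printf '0 %s * * * %s\n' "$hour" "$script"
}
```

The entry can be appended to the current user's crontab with `(crontab -l 2>/dev/null; daily_cron_line /usr/local/bin/clickhouse-s3-backup.sh) | crontab -`, where the script path is a placeholder.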
