Add differential/incremental backup script with service splitting #640
base: master
Conversation
Added backup and restore script, also backup list
Script name had incorrect ending
Perhaps look at IOTstackBackup. Still "full" backups but it doesn't need the stack to be taken down to run the backup.
scripts/diffinc_restore.sh (outdated)
fi

#backup validated as far as possible, confirm user intent
echo "Do you wish to continue restoring from backup? This will delete the present data! (y/n)"
Maybe we should move the current data to a .old directory? Not a requirement though.
Implemented it as a user-selectable option in cd86f9c because it may become an issue when dealing with limited disk space.
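A minimal sketch of what such a user-selectable option might look like; the prompt wording, variable names and `volumes` layout here are illustrative guesses, not the actual code from cd86f9c:

```bash
# Hypothetical sketch only; paths and prompt are assumptions, not the script's code.
read -r -p "Keep current data as ./volumes.old before restoring? (y/n) " answer
if [ "$answer" = "y" ]; then
    mv ./volumes ./volumes.old   # preserve old data at the cost of disk space
else
    rm -rf ./volumes             # delete, as the prompt warns
fi
```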
As @Paraphraser said, we should try to avoid bringing the stack down during backups (especially if they are triggered by a cronjob). But services like InfluxDB and Postgres cannot simply be "copied" while they are running, as the backed-up version may contain corrupted data. These services usually have their own internal backup command that can be executed from the host machine to produce a safe backup.
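For illustration, host-driven internal backups of this kind look roughly like the following; the container names, credentials and paths are assumptions, not project defaults:

```bash
# Assumed container names ("influxdb", "postgres") and paths; adjust to your stack.
docker exec influxdb influxd backup -portable /var/lib/influxdb/backup
docker exec postgres pg_dumpall -U postgres > ./backups/postgres_$(date +%F).sql
```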
Thank you @Paraphraser for the suggestion, there are some very interesting ideas in the repo. My issue with it is that it relies heavily on IOTstack-specific assumptions. If e.g. you run another service that has a mariadb container as backend, it will potentially get corrupted. I am also not sure if running a backup of, say, nodered while changes are being made would result in a backup the user would expect. I think having the choice of incremental backups is very interesting when running nextcloud, as it can quickly become very time- and disk-space-consuming. It took 15min/45min to create an uncompressed/compressed ~24GB backup on a RPi4. When thinking about 1 TB and above, doing regular backups can become disruptive.
I might be misreading your response but I'm left feeling that you might've misunderstood my intention when I pointed you at IOTstackBackup. I wasn't trying to say either "don't submit PR" or that there was anything wrong with your approach. Neither was I suggesting that IOTstackBackup already does what you're proposing (it doesn't). I just wanted to make sure you were aware of the existence of IOTstackBackup in case you wanted to borrow anything from its approach. No more, no less.
Yes. But that's true of all scripts on this repo or any of the satellite repos around IOTstack. We're focused on the problem at hand. No apologies for that.
I assume you mean the situation like NextCloud where there's a dedicated instance of MariaDB, rather than the situation where container X just happens to use the MariaDB instance you get if you select MariaDB in the menu. If I was a user of IOTstackBackup, I'd either fork the IOTstackBackup repo and add my own custom script to deal with that NextCloud-like situation, or open an issue or propose a PR for IOTstackBackup to deal with the situation. As the maintainer, if I see another service definition get added to IOTstack that includes a dedicated MariaDB instance, I'll react to that. On the other hand, if you mean arbitrary container X using the MariaDB instance you get if you select MariaDB in the menu, that doesn't pose a problem. X is just a user of the service. More on this below.
Flows are just JSON files so edits are either in memory (and won't be seen by a concurrent backup) or are flushed to disk on a "Deploy" (and will be seen). If a flow writes to an SQLite database stored inside the Node-RED persistent store, that doesn't matter because SQLite databases are already copy-safe. If a flow writes to a database in another container, that's either an instance of something where there's an existing solution (InfluxDB or MariaDB) or a case on the to-do list (like PostgreSQL where the only safe approach at the moment is to down the stack).
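As a hedged aside on the SQLite point: rather than a raw file copy, the sqlite3 CLI's online-backup command can produce a consistent copy even while the database is in use; the paths below are purely illustrative:

```bash
# Illustrative paths only; .backup uses SQLite's online backup API.
sqlite3 ./volumes/nodered/data/mydb.sqlite ".backup ./backups/mydb.sqlite"
```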
In practice, no difference. As far as I'm aware, if you stop any database engine then it should be safe to copy its persistent store. The engine just has to be stopped for the duration of the copy. When you invoke an engine's internal backup function, you're asking the engine to provide you with a snapshot where it (the engine) takes responsibility for assuring that what is backed-up is both coherent and restorable. My understanding is that the basic mechanism is to wrap the backup request into a transaction via which mutual exclusion is assured. It's a read-only operation so other read-only operations can proceed in parallel while writes will be queued. But I don't believe it's as simplistic as "all writes queued for the duration of the backup". It's writes that might affect what the backup request has already read. Something like that. Bottom line: I've never seen a write rejected because of contention with a backup so, in practice, backups cause zero interference.
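One concrete instance of the "backup wrapped in a transaction" mechanism described above is mysqldump's `--single-transaction` flag, which takes a consistent InnoDB snapshot without queueing concurrent writers; the container name and the root password variable below are assumptions:

```bash
# Assumed container name and host-side password variable; InnoDB tables only.
docker exec mariadb mysqldump --single-transaction --all-databases \
    -uroot -p"$MYSQL_ROOT_PASSWORD" > ./backups/mariadb_$(date +%F).sql
```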
Most RDBMSs in my experience write backups as the SQL commands needed to recreate the schema and import the data. The files can be massive but they compress well. Influx is different in that it exports its shards (or, at least, that's what it looks like to me). My own Influx databases are the better part of 5 years old. My busiest database is ingesting a new row every 10 seconds and is coming up to 15 million rows. Others are far less busy, typically acquiring a row every 5 minutes. The "raw" size of the persistent store is a bit under 1GB. The time for […] So, yes, I "get" that this is large but not huge. And I also "get" that, at some point, it will make sense to investigate the Influx internal backup mechanism's ability to produce incremental backups. I just haven't had the need to do that. Yet.
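For what it's worth, the Influx 1.x backup tool does appear to accept a time window, which is one plausible route to the incremental backups mentioned above; treat the container name, flags and paths below as assumptions to verify rather than a recipe:

```bash
# Assumed container name; -start limits the backup to shards newer than the
# given RFC3339 timestamp (GNU date syntax used here).
docker exec influxdb influxd backup -portable \
    -start "$(date -u -d '7 days ago' +%FT%TZ)" /var/lib/influxdb/backup
```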
A simple, unequivocal "Yes!"
I'm not sure I can give an answer that will actually address what is probably your underlying question. The best I can come up with is the "just following orders" excuse of what you see in […]

In principle, I agree that there is little difference between terminating the container, grabbing the persistent store, and starting the container again - vs - putting the container into maintenance mode, telling the database engine to take a self-backup, then grabbing that plus the rest of the persistent store, and taking the container out of maintenance mode. Given the overall size of the NextCloud persistent store, any actual timing difference between down/up and maintenance mode on/off likely just disappears into the woodwork. Probably the most that can be said is that at least NextCloud will tell the user that it is in maintenance mode - if it's down, there's nothing to provide that response.
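A rough sketch of that maintenance-mode window, with the compression deliberately kept outside it; the container name "nextcloud" and the paths are assumptions:

```bash
# Assumed container name; occ must run as the web user in the official image.
docker exec -u www-data nextcloud php occ maintenance:mode --on
cp -a ./volumes/nextcloud ./backups/nextcloud_staging    # fast raw copy inside the window
docker exec -u www-data nextcloud php occ maintenance:mode --off
tar -czf ./backups/nextcloud_$(date +%F).tar.gz ./backups/nextcloud_staging   # compress outside the window
```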
Yes. Agreed.

On the topic of NextCloud, a lot of things about it bug me - to the point where, having considered it as something I might use, I decided it was a waste of space. I mean that literally. A clean install with default apps consumes 1.3GB - before I add a single byte of my own data. I look at things like that and it fair screams "bloatware". I could not see why so much of what had come down from the web on first install could not come down again, automatically, on a "bare metal" restore of a backup and actually had to be included in the backup. Ideally, it should only be my data that gets backed up, not a significant chunk of stuff that comes down from the web. That made me question whether its internal self-repair was up to snuff. The whole thing gave me the heebies so I decided it was a headache I really didn't need. I realise that isn't really much of an answer either.

But - and this is just thinking out loud - it may well be the case that you could get the best of both worlds. Putting NextCloud in maintenance mode stops it from changing its persistent store. I can imagine doing that, making an incremental backup, then coming out of maintenance mode. Although, Googling the topic of whether MariaDB can or can't take its own incremental backups gets conflicting info (unlike InfluxDB where it's clear that it can), so that would have to be sorted out. There's also no reason why the tar and zip steps have to occur while in maintenance mode - that's what multiple cores are for! But, as I said, just thinking out loud.

To change focus slightly, here are some odds and ends of things that have occurred to me:
Hope this helps.
@Paraphraser Perhaps we were talking past each other, no antagonism intended. At first I thought I'd write a quick script for myself, which then ballooned, which made me think it could be an addition to the project. Your scripts didn't quite fit my needs as I had already modified the menu-generated docker-compose.yml quite a lot. I think a (somewhat) generalized approach to producing backups not only helps advanced users, but also eases implementation of new services and removes error sources.
Perhaps the synthesis for an optimal generalized script looks something like this:
That would be quite a different beast than the current script and I am not sure I can deal with that in a reasonable time (sleep(?); familyMembers++; ). I'll leave the PR up and the admins should decide if it is still a useful addition. I intend to address the remaining points from the post above in upcoming commits and, if merged, in wiki edits.
- added support for SCP/RSYNC transfer of backup files after a backup, and restoring by transferring backups from other machines via RSYNC
- the backup file was updated to include the .env file and to wildcard the docker-compose and compose-override file extensions
- the backup folder will be chowned to the user if the -u flag is applied
RSYNC and SCP functionality for exporting and importing backups has been added, which should allow 3-2-1 backups out of the box. I don't have Dropbox, so testing rclone would be a bit more work; if there is interest I can do it.
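For reference, the transfer step amounts to something like the following; the remote host, user, and file names are placeholders, not the script's actual defaults:

```bash
# Placeholder host/user/paths; mirror the local backup folder off-site.
rsync -avz ./backups/ backupuser@nas.local:/srv/iotstack-backups/
# or a one-off copy of a single archive over scp:
scp ./backups/backup_2024-01-01.tar.gz backupuser@nas.local:/srv/iotstack-backups/
```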
The current backup script always creates full dumps that are temporarily copied, so the stack is taken down for a potentially long time and a lot of disk space is used. This backup script aims to reduce stack downtime for most services while also reducing disk space requirements.

This is accomplished by allowing the user to choose incremental/differential backups instead of full dumps, and to make separate backup files for individual services, which allows bringing the other services back up in the meantime. The disk overhead is only one differential backup. Different usage scenarios for automated backups are described in the backup script file.
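As a hedged illustration of the per-service incremental idea (GNU tar's snapshot-file mechanism is one way to implement it; the service name and folder layout below are assumptions, not the script's actual code):

```bash
# Assumed layout; the .snar file is tar's incremental metadata.
SERVICE=nodered
SNAR=./backups/${SERVICE}.snar
docker-compose stop "$SERVICE"                 # quiesce only this service
tar --listed-incremental="$SNAR" -czf \
    "./backups/${SERVICE}_$(date +%F).tar.gz" "./volumes/${SERVICE}"
docker-compose start "$SERVICE"                # the rest of the stack never went down
```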
The script was tested in a mock-up folder structure and on a live system; further review and validation are highly welcome to ensure data integrity for users!