First aid for your bloated disk. Scan, clean, back up, and restore files matching a pattern or age.
This script helps you find and delete files matching a pattern or age, and optionally back them up to and restore them from AWS S3. It is a work in progress, but it's functional. It was built to run on a drive that is at 100% capacity, so it won't attempt to store any information on the disk it's cleaning.
More features are coming, so be sure to check out the roadmap and drop me a note if you have any suggestions.
Prerequisite: If you plan on using S3, install awscli and configure your credentials.
- Clone the repo
- Link the script to your path
$ ln -s /path/to/archive_manager.py /usr/local/bin/archive_manager
- Make the script executable
$ chmod +x /path/to/archive_manager.py
- If you plan on using S3, install boto3, the only Python dependency.
$ pip3 install boto3
Find all files matching a pattern and see how much disk space they take up
$ archive_manager /folder/to/backup '*.jp*g' 2Y -R '*/wedding/2023/*'
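For readers curious how a scan like this can work without writing anything to the disk being cleaned, here is a minimal sketch. It is not the script's actual code: `parse_age`, `scan`, `summarize`, and the unit lengths in `_UNIT_DAYS` are hypothetical names and assumptions made for illustration, and the `-R` option is not modeled.

```python
import fnmatch
import os
import time

# Assumed unit lengths in days; the real script may define these differently.
_UNIT_DAYS = {"D": 1, "W": 7, "M": 30, "Y": 365}


def parse_age(age):
    """Convert an age string such as '2Y' into seconds (hypothetical helper)."""
    count, unit = int(age[:-1]), age[-1].upper()
    return count * _UNIT_DAYS[unit] * 86400


def scan(root, pattern, age, now=None):
    """Yield (path, size) for files under root that match pattern and are older than age."""
    cutoff = (time.time() if now is None else now) - parse_age(age)
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch applies shell-style globbing to the file name only.
        for name in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime < cutoff:
                yield path, info.st_size


def summarize(root, pattern, age):
    """Return (file_count, total_bytes); results are streamed, never written to disk."""
    count = total = 0
    for _path, size in scan(root, pattern, age):
        count += 1
        total += size
    return count, total
```

Because `scan` is a generator, matches are processed one at a time and only the running totals are held in memory, which fits the goal of not storing anything on a full drive.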
Delete JPEG files older than 2 years
$ archive_manager /folder/to/backup '*.jp*g' 2Y --destroy
Back up files older than 2 months that match a pattern to S3, without deleting them
$ archive_manager ./big_directory '*.zip' 2M --bucket my-bucket --backup
Restore files from s3 to a local directory
$ archive_manager /folder/to/backup '*.jp*g' 2Y --bucket my-bucket --restore
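The backup and restore commands above map naturally onto two boto3 S3 client calls, `upload_file` and `download_file`. The sketch below illustrates one way that round trip could look; it is not the script's actual implementation. The function names `backup` and `restore` and the key layout (each file's path relative to the scanned folder) are assumptions. In real use, `s3_client` would be the object returned by `boto3.client("s3")`.

```python
import os


def backup(s3_client, bucket, root, paths):
    """Upload each path to the bucket, keyed by its path relative to root."""
    keys = []
    for path in paths:
        # Assumed key layout: relative path with forward slashes.
        key = os.path.relpath(path, root).replace(os.sep, "/")
        s3_client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


def restore(s3_client, bucket, root, keys):
    """Download each key from the bucket to the same relative path under root."""
    restored = []
    for key in keys:
        dest = os.path.join(root, *key.split("/"))
        parent = os.path.dirname(dest)
        # Recreate intermediate folders before downloading into them.
        if not os.path.isdir(parent):
            os.makedirs(parent)
        s3_client.download_file(bucket, key, dest)
        restored.append(dest)
    return restored
```

Passing the client in as a parameter keeps the functions easy to exercise with a stand-in object, so the logic can be checked without touching a real bucket.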
- Add tests and refactor into clean code
- Create PyPI package
- Add asyncio to speed up s3 operations
- Ability to create/unpack tarball chunks of files to reduce S3 operations
- Support for Iceberg storage.
Pull requests are very welcome! Please note: I had to write this for Python 3.4. That will change soon, but for now, that's why I'm using the old string formatting.