Home
Marc Zimmermann edited this page Apr 20, 2020
The goal of this project is to provide a simple command line tool that a researcher can use to archive files at the end of a project. An archive is generated for compliance reasons and is then kept for at least 10 years. It may be moved to colder (i.e. cheaper) storage and may be duplicated to two or more locations.

Feature Overview (order roughly reflects priority):
- create an "archive package": compress, split up large files, generate metadata (file listing) and file hashes (e.g. MD5)
- integrity checks
- easy retrieval of single files/directories out of an archive
- encryption (optional)
- tools to simplify/streamline archiving (optional); ideas:
  - README generation (provide a template, attempt to determine the start and end year of the project)
  - verify the directory structure and suggest deleting obsolete or unnecessary files (such as temporary or backup files)
  - check that all files are readable by the person responsible for archival
- deduplication of archives, i.e. remove large files or directories common in two or more archives (only if we can save TBs...)

A special focus lies on maintenance, as a long life cycle of the tool is expected.
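To make the "file listing", "file hashes" and "integrity checks" items above more concrete, here is a minimal sketch in Python. It is not the project's implementation. The function names (`file_digest`, `build_manifest`, `verify_manifest`) and the in-memory manifest layout are hypothetical, chosen only for illustration. The sketch defaults to SHA-256 where the feature list mentions MD5 as an example; any algorithm name accepted by `hashlib.new` can be passed instead.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Return {relative_path: (size_in_bytes, hex_digest)} for every file under root.

    Hypothetical manifest layout; a real tool would likely write this to a file.
    """
    root = Path(root)
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            rel = path.relative_to(root).as_posix()
            manifest[rel] = (path.stat().st_size, file_digest(path, algorithm))
    return manifest


def verify_manifest(root, manifest, algorithm="sha256"):
    """Return the sorted relative paths that are missing or whose size or hash changed."""
    root = Path(root)
    problems = []
    for rel, (size, digest) in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append(rel)
        elif path.stat().st_size != size or file_digest(path, algorithm) != digest:
            problems.append(rel)
    return sorted(problems)
```

A possible workflow, under the same assumptions: call `build_manifest(project_dir)` when the archive package is created, store the result next to the archive, and later run `verify_manifest(restored_dir, manifest)` after the archive has been moved or duplicated. An empty list means every listed file is present and unchanged. The sketch does not detect files added after the manifest was built, and it does not treat symbolic links specially.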