Marc Zimmermann edited this page Apr 20, 2020 · 1 revision

Project Description

The goal of this project is a simple, easy-to-use command-line tool that a researcher can use to archive files at the end of a project. An archive is generated for compliance reasons and is then kept for at least 10 years. It may be moved to colder (i.e. cheaper) storage and may be duplicated to two (or more) locations.

Feature Overview (order roughly reflects priority):

  • create an "archive package": compress, split up large files, generate metadata (file listing), generate file hashes (e.g. MD5)
  • integrity checks
  • easy retrieval of single files/directories out of an archive
  • encryption (optional)
  • tools to simplify/streamline archiving (optional); ideas:
    • README generation (provide template, attempt to determine start and end year of project)
    • verify the directory structure, suggest deleting obsolete or unnecessary files (such as temporary or backup files)
      • check that all files are readable by the person responsible for archival
    • deduplication of archives, i.e. remove large files or directories common in two or more archives (only if we can save TBs...)

A special focus lies on maintenance, as a long life cycle of the tool is expected.
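Two of the features listed above, the file listing with hashes and the integrity check, can be illustrated with a minimal Python sketch. This is not the project's implementation: the function names `file_digest`, `build_manifest` and `verify_manifest` are hypothetical, and the default of SHA-256 is an assumption (the feature list only mentions MD5 as an example, which can be selected by passing `algorithm="md5"`).

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in chunks so that large files never have to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Return {relative path: hex digest} for every regular file below root.

    This dictionary doubles as the file listing (its keys) and the hash
    metadata (its values). Hypothetical helper, for illustration only.
    """
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            manifest[relative] = file_digest(path, algorithm)
    return manifest


def verify_manifest(root, manifest, algorithm="sha256"):
    """Return a sorted list of files that are missing or whose hash changed."""
    root = Path(root)
    problems = []
    for relative, expected in manifest.items():
        path = root / relative
        if not path.is_file() or file_digest(path, algorithm) != expected:
            problems.append(relative)
    return sorted(problems)
```

Under these assumptions, `build_manifest("project_dir")` would be run once when the archive is created and its result stored alongside the archive. Running `verify_manifest("project_dir", manifest)` later returns an empty list while every listed file is still present and unchanged. Files added after the manifest was built are not reported by this sketch.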