This repository implements a converter for GNU ddrescue's mapfile. In short, the mapfile is a log of the results of trying to read data regions from a storage medium into a disk image file. This repository translates that log file into a DFXML document, using the byte_runs
list of a diskimageobject
element to represent regions that were and were not copied into the disk image file.
The primary script supplied by this repository is gddrescue_mapfile_to_dfxml.py
.
Development and testing is example-driven. The mapfile parsing was written based on available samples, some of which are provided as test data.
The views and opinions expressed in this project are those of the authors and do not necessarily reflect the official policy or position of any agency of the U.S. government. Any mention of a vendor or product is not an endorsement or recommendation. Logos and trademarks are copyright their respective owners.
This repository is scoped to supporting the following general form of workflow, which starts with a disk image and ends with a report of how damaged regions of the original disk affect the imaged file system.
- Starting with a disk, take an image with
ddrescue
. This produces a disk image, which may be incomplete, and a mapfile reporting imaging status. - Run
gddrescue_mapfile_to_dfxml.py
on the mapfile from step 1. This emits a DFXML file that only summarizes the disk image's geometry, without delving into file systems. - A simpler HTML report counting the lost bytes can be generated from the DFXML of step 2, by running
report_file_recoverability_html.py
.
The above gives a simple summary for when a full file system analysis is not performed.
- Analyze the file systems of the disk image with a file system parser. However this process is done, the output would need to be a DFXML file that records file objects with byte runs (denoting addresses of the file's contents on the disk).
- Run
make_file_recoverability_dfxml.py
on the DFXML files from steps 2 and 4. This emits a DFXML file that only records the files that are not fully readable due to the partial disk imaging. - Run
report_file_recoverability_html.py
on the DFXML file from step 5. This will produce an HTML report, an expanded version of what can be made in Step 3.
This figure illustrates the data flow in the above workflow:
As part of the tests in this project, example data is provided for some scenarios, and results computed and stored for viewing. Links to the example results from the workflow above are listed here as well. (For viewing in a browser, the reports are provided by conversion with pandoc
.)
- Multisession CD-ROM - This test demonstrates a CD-ROM with audio data, which has known partial-acquisition quirks when some disk imaging tools that work on hard drives are used.
- Step 1's mapfile
- Step 2's disk image DFXML
- Step 3's disk image report
- Damaged disk image - This test hand-codes a file system that references regions that are in, and not in, a hand-coded disk image map.
- Step 1's mapfile
- Step 2's disk image DFXML
- Step 3's disk image report
- Step 4's file system DFXML, created by a script that hard-coded file information
- Step 5's recoverability DFXML
- Step 6's recoverability report
Tests are run with make check
, due to use of file-based test workflows and pre-computed results.
Other Makefile targets are also provided for development convenience:
make download
- Retrieve network-dependent files. After this is run once, the rest of this repository can be used offline.make docs
- Generate documentation. Requirespandoc
to translate HTML reports to Github-flavored Markdown. This step should not be necessary unless Markdown is desired -- the HTML reports are generated bymake check
as part of testing the report-generating script.make check-docs
- (For Github repository maintenance.) Confirm computed documentation matches Git-tracked documentation.
This project currently depends on some publicly available, though not necessarily stable, implementations.
This repository currently relies on extensions to DFXML 1.2.0, listed in Schema Issue #36. Those are being used in a development branch of DFXML, add_disk_image_object_fixity_fields
.
This repository also has two scripts and a library that may be migrated into the DFXML code base. They are in this repository at the moment because this is (to the author's knowledge) the only code base generating DFXML files for the purpose of analyzing disk imaging errors and their impact.
src/intact_byte_run_index.py
- a class for comparing file geometry (byte run locations) with acquired disk image geometry.src/make_file_recoverability_dfxml.py
- a script that identifies files not fully present in an acquired disk image. Emits a manifest of files as DFXML.src/report_file_recoverability_html.py
- a script that
When a DFXML generator of the same scope as gddrescue_mapfile_to_dfxml.py
is written for another tool, the above scripts are likely to migrate.
Where files are versioned, this project follows Semantic Versioning 2.0.0.