zip: find preambles for zip files written by archive/zip
zip files written by `archive/zip`'s `Writer` correctly report the byte
offsets of their central directory file header and end of central
directory entries, but this means the offset of the zip data within the
file as calculated by `archive/zip`'s `Reader` will always be 0 - in
other words, `Reader` thinks that any non-zip data prepended to the file
is actually part of the first file's local file header. This prevents
any zip file written by `archive/zip` from being used as a source for
the `zip` command's `--preamble_from` option, because the non-zip data
doesn't appear to be non-zip data at all.

In cases where `Reader` identifies that the byte offset of the zip data
within the file is 0, check whether the byte offset of the first local
file header is also 0. If it isn't, assume the byte offset of the first
local file header is the true starting position of the zip data within
the file, and that anything before it is in fact a preamble.
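
The fallback described above can be reproduced in isolation. The following is a minimal sketch, not the code from this commit: `buildArchive` and `preambleLength` are hypothetical helper names, and the archive is assumed to have been written with `Writer.SetOffset`, which is one way `archive/zip` ends up recording offsets relative to the start of the whole file. As in the change itself, the unexported fields `baseOffset` and `headerOffset` are read through `reflect`, so this depends on `archive/zip` internals that may change between Go releases.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"reflect"
)

// buildArchive returns preamble followed by a one-file zip archive written by
// archive/zip. SetOffset tells the Writer that the zip data starts after the
// preamble, so the offsets it records are relative to the start of the file.
func buildArchive(preamble []byte) ([]byte, error) {
	var buf bytes.Buffer
	buf.Write(preamble)
	w := zip.NewWriter(&buf)
	w.SetOffset(int64(buf.Len()))
	f, err := w.Create("hello.txt")
	if err != nil {
		return nil, err
	}
	if _, err := f.Write([]byte("hello\n")); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// preambleLength reports where the zip data appears to begin in data.
// It reads unexported archive/zip fields via reflect (an assumption about
// the standard library's internals, mirroring the approach in the diff).
func preambleLength(data []byte) (int64, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return 0, err
	}
	var offset int64
	// baseOffset may not exist in older Go releases, so check IsValid first.
	if v := reflect.ValueOf(zr).Elem().FieldByName("baseOffset"); v.IsValid() {
		offset = v.Int()
	}
	// Fallback: a zero base offset with a non-zero first header offset means
	// everything before the first local file header is treated as preamble.
	if offset == 0 && len(zr.File) != 0 {
		offset = reflect.ValueOf(zr.File[0]).Elem().FieldByName("headerOffset").Int()
	}
	return offset, nil
}

func main() {
	data, err := buildArchive([]byte("#!/bin/sh\n"))
	if err != nil {
		panic(err)
	}
	n, err := preambleLength(data)
	if err != nil {
		panic(err)
	}
	fmt.Printf("zip data begins at byte offset %d\n", n)
}
```

An archive with no entries has no first header to inspect, so the sketch leaves the offset at whatever `baseOffset` reported, matching the `len(zr.File) != 0` guard in the diff.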
chrisnovakovic committed Oct 24, 2024
1 parent 541008d commit 9d97356
Showing 4 changed files with 16 additions and 0 deletions.
8 changes: 8 additions & 0 deletions zip/preamble.go
@@ -45,6 +45,14 @@ func Preamble(path string) (*PreambleReader, error) {
 	// when the Reader is created, so this is the fastest (and easiest) way to find out where the zip
 	// data begins in the underlying file.
 	zipOffset := reflect.ValueOf(zr).Elem().FieldByName("baseOffset").Int()
+	// zip files written by archive/zip's Writer correctly report the byte offsets of their CDFH and
+	// EOCD entries, but this means the baseOffset calculated by the Reader will always be 0, even when
+	// non-zip data is prepended. We can detect this based on the reported byte offset of the zip
+	// header for the first file in the archive - for files that truly contain only zip data, this
+	// should also be 0. If it isn't, assume everything before the first file header is the preamble.
+	if zipOffset == 0 && len(zr.File) != 0 {
+		zipOffset = reflect.ValueOf(zr.File[0]).Elem().FieldByName("headerOffset").Int()
+	}
 	log.Debugf("%s: zip data begins at byte offset %d", path, zipOffset)
 	zr.Close()
 	f, err := os.Open(path)
8 changes: 8 additions & 0 deletions zip/preamble_test.go
@@ -30,6 +30,14 @@ func TestPreamble(t *testing.T) {
 			ZipFile:          "zip_preamble.zip",
 			PreambleChecksum: "038a57f3f807fa91bdd30239b9711fccf0d782fe2f036e03211852237e94d24c", // another.zip
 		},
+		{
+			ZipFile:          "arcat_preamble.zip",
+			PreambleChecksum: "46533b2dfa35ad537d3561ebee0c7af8941bc65363c1b188e1be6eaf79e9138c", // shebang_twolines.txt
+		},
+		{
+			ZipFile:          "empty.zip",
+			PreambleChecksum: "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855", // empty string
+		},
 	} {
 		t.Run(tc.ZipFile, func(t *testing.T) {
 			r := require.New(t)
Binary file added zip/test_data_4/arcat_preamble.zip
Binary file not shown.
Binary file added zip/test_data_4/empty.zip
Binary file not shown.
