Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

squashfs: fix received X bytes instead of expected minimal 40 on listing #201

Merged
merged 2 commits into from
Dec 31, 2023

Conversation

ncw
Copy link
Contributor

@ncw ncw commented Dec 18, 2023

When unpacking large squashfs archives you occasionally get this
error:

received 38 bytes instead of expected minimal 40

This is due to the inode changing type (ie it is actual an extended
inode) when read from the directory but the size not being updated.
Most of the time there is enough data for the larger inode but not
always.


I couldn't think of a way to test this.

I have an archive which demonstrates the problem you can download here: tensorflow.sqfs if you need it.

It was made with this script

#! /usr/bin/bash
ORG=${ORG:-tensorflow}
IMG=${IMG:-tensorflow}
TAG=${TAG:-latest-gpu-jupyter}
docker export $(docker create $ORG/$IMG:$TAG) -o $IMG.tar.gz
mkdir -p $IMG && tar xf $IMG.tar.gz -C $IMG
[ -f $IMG.sqfs ] && rm $IMG.sqfs
mksquashfs $IMG $IMG.sqfs  -comp zstd -Xcompression-level 3 -b 1M -no-xattrs -all-root

@deitch
Copy link
Collaborator

deitch commented Dec 19, 2023

The following is the current process:

  1. call getInode(), passing it the inode type
  2. get the size from the passed type, uncompress it, get the header
  3. If the actual type is not the expected type, use the actual type

I think this change of yours says:

  1. OK, but if the actual type is different, then not only should the type be different, but the size should be different, too, so reread the inode using the newly-discovered size from the actual header.

Is that correct?

I couldn't think of a way to test this.
I have an archive
It was made with this script

Something in the directory structure must have triggered it. What exactly was it? Could we isolate that?

@ncw
Copy link
Contributor Author

ncw commented Dec 23, 2023

Is that correct?

Yes that is correct, though in step 4 we may have read enough data already. I'll see if I can create a squashfs which causes the problem - watch this space!

@ncw
Copy link
Contributor Author

ncw commented Dec 27, 2023

I managed to make this squashfs which demonstrates the problem.

tensorflow-nolinks.sqfs.zip

(I had to zip it for upload).

It is the tensorflow docker image but with all the files truncated to 0 size and all the symlinks removed.

It is 816K so not too big!

I could write a test which does a directory traversal of it and check that we can see all the files.

What do you think? A good enough test?

I didn't manage to make a synthetic test!

@deitch
Copy link
Collaborator

deitch commented Dec 28, 2023

It is 816K so not too big!

It isn't, although I suspect we could make it smaller. The real question is what is triggering it.

Wouldn't a simple image with just one file (and therefore 1 inode, or maybe 2 if you count the directory), but where we call getInode() with the basic type work?

I am trying to understand how this is triggered. getInode() is called only from 3 places:

  • Read(), where it reads the filesystem from a file, and where it assumes that the root is a basic directory. Can the root be an extended directory? If so, that is a simple replication.
  • getDirectoryEntries(), where it reads the inode for each directory entry, taking the inode type from the directory entry. I cannot see why the type in the entry would be incorrect, but I guess that is possible.
  • hydrateDirectoryEntries(), where it gets the inode from the passed []*directoryEntryRaw, which are taken from the actual directory entries.

When you run into this problem with your sample above? What actual path is it taking? I assume it is not the first, but the second or third? What is the actual case triggering it?

@ncw ncw force-pushed the fix-large-squashfs branch from 7b0779d to fb28d9d Compare December 30, 2023 17:21
@ncw
Copy link
Contributor Author

ncw commented Dec 30, 2023

Wouldn't a simple image with just one file (and therefore 1 inode, or maybe 2 if you count the directory), but where we call getInode() with the basic type work?

I managed to make a test image eventually! I think it only triggers when the directory reading is

  • at the end of the page
  • reads an extended type

I managed to get this to trigger by using 4k page sizes and 300 files with xattrs (to force them to be extended types).

I've added this to the commit.

In the process I discovered another bug! There was a small typo in the xattr parsing which I've also fixed - the new test doesn't pass without the fix.

@deitch
Copy link
Collaborator

deitch commented Dec 31, 2023

I managed to get this to trigger by using 4k page sizes and 300 files with xattrs (to force them to be extended types).

All about the right combination to trigger the problem. Thank you, indeed, for tracking it down.

You have done so much for this PR, I hate to ask for more. Can I ask to expand the comment in buildtestsqs.sh to explain the corner case we are dealing with? No other requests, then we can merge this in.

ncw added 2 commits December 31, 2023 17:07
When unpacking large squashfs archives you occasionally get this
error:

    received 38 bytes instead of expected minimal 40

This is due to the inode changing type (ie it is actual an extended
inode) when read from the directory but the size not being updated.
Most of the time there is enough data for the larger inode but not
always.
Before this fix the code crashed with errors like

    panic: runtime error: slice bounds out of range [282:270] [recovered]

When reading xattrs.

This was caused by a mixup of indices in the code. This fix is tested
by TestSquashfsReadDirCornerCases and causes it to run clean.
@ncw ncw force-pushed the fix-large-squashfs branch from fb28d9d to fe4709c Compare December 31, 2023 17:08
@ncw
Copy link
Contributor Author

ncw commented Dec 31, 2023

You have done so much for this PR, I hate to ask for more. Can I ask to expand the comment in buildtestsqs.sh to explain the corner case we are dealing with? No other requests, then we can merge this in.

I've done that now :-) Let me know if you need any other changes.

@deitch deitch merged commit b20cf01 into diskfs:master Dec 31, 2023
19 checks passed
@deitch
Copy link
Collaborator

deitch commented Dec 31, 2023

Nope, looks great. Thank you!

@ncw ncw deleted the fix-large-squashfs branch January 3, 2024 15:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants