squashfs: fix file reading #197
Conversation
… for decompression
…eros This is as specified in the specification.
- make sure we don't overflow int when using large files
- fix reading 4x too few blocks
- correct inverted logic on whether a fragment exists or not
- pass the calculated block sizes into the fetching routine
- Fix array bounds errors by taking more care reading the correct data out of the buffers
- Fix potential 32/64 bit problems
- Fix off-by-one termination condition in loop
- Error if a fragment is expected but we don't have one (shouldn't happen)
- Error if we ever read no data (shouldn't happen)
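To make two of those bullets concrete, here is a minimal sketch in Go (hypothetical names, not the PR's actual identifiers): squashfs marks "no fragment" with a fragment index of 0xffffffff, and offset arithmetic has to be widened to int64 before multiplying so that large files cannot overflow int on 32-bit platforms.

```go
package squashfs

// noFragment is the squashfs sentinel meaning "this inode has no tail
// fragment" (a fragment index of 0xffffffff).
const noFragment = 0xffffffff

// fragmentExists reports whether an inode stores its tail in a fragment.
// The inverted form of this test was the "back to front" logic.
func fragmentExists(fragIndex uint32) bool {
	return fragIndex != noFragment
}

// blockOffset does its arithmetic in int64 so that offsets in files
// larger than 2 GiB cannot overflow int on 32-bit architectures.
func blockOffset(blockIndex int, blockSize uint32) int64 {
	return int64(blockIndex) * int64(blockSize)
}
```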
Before this change it was accidentally using the size of the basic file inode as the offset of the blocklist.
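For context, a hedged sketch of what parsing the blocklist can look like, assuming the on-disk layout in which the little-endian uint32 block sizes immediately follow the inode's fixed fields (the function and its parameters are made up for illustration):

```go
package squashfs

import (
	"encoding/binary"
	"fmt"
)

// parseBlockList is a hypothetical sketch: the blocklist is a run of
// little-endian uint32 block sizes following the file inode, so the
// reader starts at the end of the inode fields (inodeLen) and steps
// 4 bytes per entry.
func parseBlockList(data []byte, inodeLen, count int) ([]uint32, error) {
	need := inodeLen + 4*count
	if len(data) < need {
		return nil, fmt.Errorf("blocklist truncated: need %d bytes, have %d", need, len(data))
	}
	sizes := make([]uint32, count)
	for i := range sizes {
		sizes[i] = binary.LittleEndian.Uint32(data[inodeLen+4*i:])
	}
	return sizes, nil
}
```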
Update: all the tests are fixed now - turned out to be a very simple fix. This could be merged now, so I'll unset the draft.
Do you mind describing in prose here what the issue was? I am reading it in the code, but a description here would be very helpful.
Also, is the logic in the comments right before the changes in …
I noticed when testing the code that it had trouble reading most files. In fact the only files it could read reliably were very short ones stored entirely in the fragments, or files which were an exact multiple of the block size. These happen to be the type of files in the existing test data. I decided to try to fix this.

The first thing I did was make a new test. This let me track down the problems with the code. I made one commit per fix to try and keep the code reviewable.
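The PR thread doesn't inline the test, but a sketch of the shape such a test typically takes (the openTestFile fixture is hypothetical): exercise file sizes on both sides of every block boundary, since those were exactly the cases that failed.

```go
package squashfs

import (
	"bytes"
	"fmt"
	"io"
	"testing"
)

// TestReadFileSizes is a hypothetical sketch: for each size straddling
// a block or fragment boundary, read the whole file through the
// filesystem under test and compare with the expected contents.
func TestReadFileSizes(t *testing.T) {
	const blockSize = 131072
	sizes := []int64{1, 100, blockSize - 1, blockSize, blockSize + 1, 3*blockSize + 5}
	for _, size := range sizes {
		size := size
		t.Run(fmt.Sprintf("size=%d", size), func(t *testing.T) {
			r, want := openTestFile(t, size) // hypothetical fixture
			got, err := io.ReadAll(r)
			if err != nil {
				t.Fatalf("read: %v", err)
			}
			if !bytes.Equal(got, want) {
				t.Errorf("size %d: contents mismatch", size)
			}
		})
	}
}
```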
commit e608eb1

This was just a copy-paste bug, I think.

commit 13ce837
This commit isn't strictly speaking necessary - it doesn't fix anything, but it makes the code return a sensible error when things aren't as expected, which happened quite a lot while debugging.
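A sketch of what such defensive checks can look like (made-up names, matching the "shouldn't happen" bullets in the commit message above):

```go
package squashfs

import "errors"

// checkBlockRead is a hypothetical sketch: turn "shouldn't happen"
// states into errors instead of silently returning corrupt data.
func checkBlockRead(n int, fragExpected, fragPresent bool) error {
	if fragExpected && !fragPresent {
		return errors.New("squashfs: fragment expected but not present")
	}
	if n == 0 {
		return errors.New("squashfs: read returned no data")
	}
	return nil
}
```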
commit d105557

This is as recommended by the Go docs. This doesn't fix anything; it was just something I noticed while reviewing the code.

commit 11f9a4c
This was the big fix. The main problem here was that this code gave array index errors if you tried to read something which wasn't an exact multiple of the block size. It went through quite a few iterations trying to get all the test cases to pass, but I think it is correct now 🤞

The main fix was to be very careful with exactly how much data was read from the input block. The fragment reading needed the same logic, so I factored it into a closure.

I also added some defensive programming which was useful when trying to make it work, before I realised that the blocklist was actually wrong (see above) and the code was being passed corrupted data.

I fixed up some 32/64 bit problems to make sure that the code would work on architectures where int is 32 bits.
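As a rough illustration of the bounds handling and the closure described here - a minimal sketch under the assumption that the decompressed blocks and fragment are already in memory; names are invented, not the PR's:

```go
package squashfs

// readRange copies the requested byte range out of a file's data blocks
// and tail fragment. copy() clamps to the shorter slice, which avoids
// the old array index errors on partial blocks.
func readRange(dst []byte, blocks [][]byte, fragment []byte, start, blockSize int64) int {
	pos := 0
	// The same consume logic serves data blocks and the tail fragment,
	// so it is factored into a closure, as the commit describes.
	consume := func(src []byte, off int64) {
		if pos >= len(dst) {
			return
		}
		if off < 0 {
			off = 0 // this source starts after the requested offset
		}
		if off >= int64(len(src)) {
			return // requested range starts beyond this source
		}
		pos += copy(dst[pos:], src[off:])
	}
	for i, b := range blocks {
		consume(b, start-int64(i)*blockSize)
	}
	consume(fragment, start-int64(len(blocks))*blockSize)
	return pos
}
```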
commit 5bb1d38

This was the other very important fix. There were two main things wrong here: the number of blocks was being calculated wrong (the fragment-exists logic was back to front), and it was indexing the blocklist by byte when it should have been indexing by 4-byte uint32 entry, which is why it read 4x too few blocks.

I also fixed up some 32/64 bit problems to make sure that the code would work on architectures where int is 32 bits.
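A hedged sketch of the corrected block-count logic (hypothetical function; the rule itself comes from the squashfs format, where the tail only gets a blocklist entry when there is no fragment):

```go
package squashfs

// blockCount is a hypothetical sketch: when the tail lives in a
// fragment, only full blocks get blocklist entries; otherwise the
// final short block gets one too, so the division must round up.
func blockCount(fileSize, blockSize int64, hasFragment bool) int64 {
	if hasFragment {
		return fileSize / blockSize // tail bytes live in the fragment
	}
	return (fileSize + blockSize - 1) / blockSize // round up: tail block included
}
```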
commit 96796b1

This is mentioned in the spec: a zero-sized block should be treated as block_size worth of zeroes.
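As a sketch of that rule (hypothetical names; make() zero-fills in Go, which gives exactly block_size worth of zeroes):

```go
package squashfs

// expandBlock is a hypothetical sketch of the sparse rule: a blocklist
// entry whose size is zero stores nothing on disk and decodes to a
// full block of zero bytes.
func expandBlock(stored []byte, storedLen uint32, blockSize int) []byte {
	if storedLen == 0 {
		return make([]byte, blockSize) // sparse: block_size of zeroes
	}
	return stored
}
```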
commit 4968323
I noticed that it was using the wrong block size when using an external file. It needs to use the block size from the squashfs header, not the one the user passes in, otherwise everything goes wrong!
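For reference, the block size lives in the squashfs superblock as a little-endian uint32 at byte offset 12, so a reader of an existing image has to take it from there. A minimal sketch with a made-up function name:

```go
package squashfs

import "encoding/binary"

// readBlockSize is a hypothetical sketch: the superblock layout is
// magic (4 bytes), inode count (4), modification time (4), then the
// block size as a little-endian uint32 at offset 12.
func readBlockSize(superblock []byte) uint32 {
	return binary.LittleEndian.Uint32(superblock[12:16])
}
```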
Yes, it still works exactly like that. For a future improvement it would be sensible to cache the last uncompressed block in the file handle, as unless you are reading at exactly block boundaries each block ends up being decompressed twice.
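A sketch of what that cache could look like (hypothetical type, not an implementation):

```go
package squashfs

// cachedFile is a hypothetical sketch of the proposed follow-up: keep
// the last decompressed block in the handle so sequential reads that
// land in the same block skip a second decompression.
type cachedFile struct {
	lastStart int64  // file offset the cached block begins at; -1 when empty
	lastBlock []byte // last decompressed block
}

// cached returns the block starting at the given file offset if it is
// the one we decompressed last.
func (f *cachedFile) cached(start int64) ([]byte, bool) {
	if f.lastBlock != nil && f.lastStart == start {
		return f.lastBlock, true
	}
	return nil, false
}
```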
Yeah, that would be good... for a future one. I think this one covers enough. Thank you for the detailed explanation.

Based on how you described it, it looks like each commit actually could stand alone? If each were a separate PR, then the mainline branch would be valid at each given commit? No, I don't want separate PRs at this point, but if they really do stand alone, there's no point in doing a squash-and-merge, as I usually do.
Took me quite a bit to understand it (but a lot less than understanding the innards of squashfs 😆 ), but I see it now.
Yes, I'll do that as a follow-up. I think it will double the performance of file reading for relatively little cost (a small buffer per open file).
Yes, each commit does stand alone. I tried to make the commits individually reviewable, as I know how hard reviewing code is. They could be individual PRs, but if you were to merge without squashing that would have the same effect. (I developed this patch series in a completely different order, and when I was done fixing stuff I separated them out into logically separate fixes!)
:-) I can't say I understand the innards of squashfs completely, but I know more about it now than I did!
Well, if you feel like learning more about ext4 and inode trees, the help would be appreciated. The ext4 implementation has been stalled for a long time, largely blocked on some of the inode tree rebalancing. I am sure there is more there. One of these lifetimes I will get someone who wants it enough to pay for a few months of consulting and finish it. I have been tempted to ask GPT-4 as well.
This is a series of patches to fix up the reading in squashfs.
The first introduces some new tests, and then the rest attempt to fix the code until the tests pass!
Update: all tests are working now!