Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2024-08-05 Tiny file fix and Confidence correction #97

Conversation

NebularNerd
Copy link
Contributor

@NebularNerd NebularNerd commented Aug 5, 2024

Closes #96
Closes #86

Tiny File Issue

Puremagic is having trouble with super small files of less than 1k when opened as a stream, as the matches became better an issue has cropped up with footer matches trying to go into negative numbers. My previous PR #87 added a match for .dmg files which starts at -512 bytes from the end, which when the file is only a couple of hundred bytes made Puremagic very sad.

This led to the errors shown in both the above issues, this fix adds a try/except into the stream logic, if the max_foot length exceeds the file size, then simply set it to the file size. I can't see there being an issue with this as it just results in footer being the whole file. While it could cause issues if you had a multi-GB file and some absurd even larger multi-GB negative match the probabilities of that are near zero.

Confidence fix

In a previous PR I submitted (#66) I dropped the ball and did not set the reverse flag to False, this meant that the matches were not being sorted correctly. The longest matches were first but in the wrong order e.g:

Results from my script which just give me a tidied version of the output.

Most likely match:
Format:        MPEG-1 Audio Layer 3 (MP3) ID3v2.3.0 audio file
Confidence:    80.0%
Extension:     .mp3
MIME:          audio/mpeg
Offset:        -128
Bytes Matched: b'ID3\x03\x00TAG'
Hex:           4944 3303 0054 4147
String:        ID3TAG

Alternate match #1
Format:        MPEG-1 Audio Layer 3 (MP3) ID3v2.3.0 audio file
Confidence:    80.0%
Extension:     .mp3
MIME:          audio/mpeg
Offset:        10
Bytes Matched: b'ID3\x03\x00\x00\x00\x01*XMCDI'
Hex:           4944 3303 0000 0001 2a58 4d43 4449
String:        ID3*XMCDI

Should be:

Most likely match:
Format:        MPEG-1 Audio Layer 3 (MP3) ID3v2.3.0 audio file
Confidence:    80.0%
Extension:     .mp3
MIME:          audio/mpeg
Offset:        10
Bytes Matched: b'ID3\x03\x00\x00\x00\x01 \x1eMCDI'
Hex:           4944 3303 0000 0001 201e 4d43 4449
String:        ID3 MCDI

Alternate match #1
Format:        MPEG-1 Audio Layer 3 (MP3) ID3v2.3.0 audio file
Confidence:    80.0%
Extension:     .mp3
MIME:          audio/mpeg
Offset:        -128
Bytes Matched: b'ID3\x03\x00TAG'
Hex:           4944 3303 0054 4147
String:        ID3TAG

Sorry for breaking things by making them better...
a3241c1c-4aa9-44ec-b573-865ec7edcd8d_text

Change line 229 to allow for super tiny files
Change line 136 to FALSE to correct sorting order
Fix lint issue
@NebularNerd
Copy link
Contributor Author

NebularNerd commented Aug 5, 2024

@cdgriffith I'm getting build errors which look to be to do with pytest, is my change breaking that or something else?

EDIT
Ah, it seems my confidence fix is breaking the pytest.....

0c5ac1315cd51d14fbaa81c928ad6a21

Let me try and fix that....

@NebularNerd
Copy link
Contributor Author

OK fixed the issue, sorry for the dozen commits, I seem to not get on well with Black and Lint stylings (I need to really understand all this better for the future)

The fix is a bit of a hack, its sole task is to allow pytest to pass, if there is a better way to fix this then that's cool. I've added a test that looks to see if pytest is running, if so, leave reverse set to True so the imghdr tests get the result they expect. If running under normal operation set to False and allow for the best Real World result to be shown.

I tried putting this outside the def so it only ran on calling the puremagic.py but it failed so for now it checks every time.

puremagic/main.py Outdated Show resolved Hide resolved
Copy link
Contributor Author

@NebularNerd NebularNerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the more pythonic way of testing for PYTEST_CURRENT_TEST I tested on my local copy and that works fine.

I can't approve it but @cdgriffith will be able to if the PR gets merged.

@cdgriffith
Copy link
Owner

I think I see the issue, it's not the reverse flag, it's that we were comparing the second option to the values of itself not it's length.

Fix:

    return sorted(results, key=lambda x: (x.confidence, len(x.byte_match)), reverse=True)

@cdgriffith
Copy link
Owner

Nice find and fix on the small files!

puremagic/main.py Outdated Show resolved Hide resolved
@cdgriffith cdgriffith merged commit e80d58a into cdgriffith:develop Aug 7, 2024
9 checks passed
@NebularNerd
Copy link
Contributor Author

Nice find and fix on the small files!

It was helpful that @FelipeLema had the same issue, I thought it was just due to some wonky issue with my test .aif files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants