Video hashes of vastly different videos yield is_similar() True #100

Open
christopherwingert opened this issue Dec 13, 2022 · 1 comment

@christopherwingert

Would modifying similar_percentage help? If so, which direction should I go?

@96jaco96

I'm having this problem too, and I've spent all day debugging why it happens.

So far I've discovered this:

The "is_similar" function in videohash.py does this check:

if self - other <= ceil((self.similar_percentage / 100) * self.bits_in_hash)

BUT videohash.py also defines these two values:
self.bits_in_hash = 64
self.similar_percentage = 15

so with these values the previous check ALWAYS boils down to:
if self - other <= ceil((15 / 100) * 64)
and since (15 / 100) * 64 = 9.6, the right-hand side is ALWAYS ceil(9.6) = 10
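
To make the arithmetic above concrete, here is a minimal sketch of how the threshold follows from the two values. The helper name `similarity_threshold` is hypothetical and is not part of videohash; only the formula is taken from the check quoted above.

```python
from math import ceil


def similarity_threshold(similar_percentage: float, bits_in_hash: int) -> int:
    """Hypothetical helper: the largest value of `self - other` that the
    quoted check would still accept as "similar"."""
    return ceil((similar_percentage / 100) * bits_in_hash)


# Values quoted from videohash.py in this comment.
default_threshold = similarity_threshold(15, 64)
print(default_threshold)  # 10, because ceil(9.6) == 10

# Purely arithmetic illustration of how the percentage moves the threshold.
stricter_threshold = similarity_threshold(5, 64)   # ceil(3.2) == 4
looser_threshold = similarity_threshold(30, 64)    # ceil(19.2) == 20
```

As far as the formula goes, a lower percentage gives a smaller threshold and so a stricter check. Whether that actually fixes the false positives reported here is not something this sketch can show.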

Basically, changing the "is_similar" function from
if self - other <= ceil((self.similar_percentage / 100) * self.bits_in_hash)
to
if self - other <= 10
returns the same results. I've tested this with a sample of 1000 videos:
the results are identical with the default check and with "if self - other <= 10".

Correct me if I'm wrong; I'm fairly new to this and just making some observations. In fact, I'm not even sure what this check is doing, mathematically speaking.
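
On the question of what the check does mathematically, here is one possible reading, sketched under an assumption the thread does not confirm: that `self - other` is the Hamming distance between the two 64-bit hashes, meaning the number of bit positions where they differ. The functions `hamming_distance` and `is_similar_sketch` and the sample hash values are made up for illustration and are not videohash's own code.

```python
from math import ceil


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def is_similar_sketch(hash_a: int, hash_b: int,
                      similar_percentage: float = 15,
                      bits_in_hash: int = 64) -> bool:
    """Illustrative stand-in for the quoted check, ASSUMING that
    `self - other` means Hamming distance."""
    threshold = ceil((similar_percentage / 100) * bits_in_hash)
    return hamming_distance(hash_a, hash_b) <= threshold


# Made-up 64-bit hashes, not taken from real videos.
base = 0xFFFF0000FFFF0000
ten_bits_off = base ^ 0b1111111111        # differs from base in 10 bits
eleven_bits_off = base ^ 0b11111111111    # differs from base in 11 bits

print(is_similar_sketch(base, ten_bits_off))     # True: 10 <= 10
print(is_similar_sketch(base, eleven_bits_off))  # False: 11 > 10
```

Under that assumption, the default check treats two videos as similar when at most 10 of the 64 hash bits differ.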

ALSO, I think this may be related to issue #94 "Hash Collision", if that helps...
