
Hashing speed issue. #96

Open
Demmenie opened this issue Oct 17, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@Demmenie

Describe the bug
It takes quite a while to hash a video.

To Reproduce

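```python
from videohash import VideoHash
import time

start = time.time()

url = 'https://user-images.githubusercontent.com/47534140/185008752-da1f09c7-a177-4a46-9c64-230744e998c1.mp4'
# The timed span covers the whole VideoHash(...) call, which for a URL
# also includes downloading the video before it is hashed.
v1 = VideoHash(url=url, frame_interval=12)

print(f"Finished in {time.time() - start} secs")
```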

Expected behavior
It should realistically be doable in under a second.

Please complete the following information:

  • Operating system: Windows 10
  • Python Version: 3.10.2
  • VideoHash version: 2.1.9

Additional context
It currently takes about 3 to 4 seconds.

@Demmenie Demmenie added the bug Something isn't working label Oct 17, 2022
@96jaco96

96jaco96 commented Oct 6, 2023

Any updates on this? It's still so slow it's not even funny.

And I don't think the problem is "unoptimized" code; I think it's the logic behind it, how it was conceived in the first place...

I mean, look at the definition of how it calculates the hashes:

"Every one second, a frame from the input video is extracted, the frames are shrunk to a 144x144 pixel square, and a collage is constructed that contains all of the resized frames (square-shaped). The collage's wavelet hash's bit-list is the first bit-list that we use. The extracted frames are then stitched horizontally to each other, and finally divided into 64 equal-sized images. The dominant color of each of these 64 images is detected and compared with a pre-defined pattern of dominant colors; if they match, the bit is set, else it is unset. So now we have two bit-lists, and finally we bitwise XOR these two bit-lists. The XOR'ed output is used to generate the final 64-bit hash value for the video: the bits are joined to form the 64-bit hash value of the input video."

This process is bound to take a long time, and if this is the way it calculates hashes, then I don't think there's much we can do to speed it up...

But maybe someone smarter than me can answer this better?
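The final step of the quoted description (one dominant-color bit per tile, compared against a pattern, then XOR'ed with the wavelet-hash bits) can be sketched in plain Python. This is a simplified illustration, not videohash's code: the "tiles" are toy lists of color labels, the pattern is made up, and the wavelet-hash bits are a stand-in (a real implementation would compute them from the collage image, e.g. with a wavelet hash such as `imagehash.whash`). Operating on 64 values, this bit arithmetic is cheap; the expensive parts are decoding and processing the frames.

```python
from collections import Counter


def dominant_color(tile_pixels):
    """Return the most common color in one tile (any hashable color values)."""
    # most_common(1) returns [(color, count)] for the single top color.
    return Counter(tile_pixels).most_common(1)[0][0]


def color_bits(tiles, pattern):
    """One bit per tile: 1 if its dominant color matches the pattern entry, else 0."""
    if len(tiles) != len(pattern):
        raise ValueError("need exactly one pattern entry per tile")
    return [1 if dominant_color(tile) == want else 0 for tile, want in zip(tiles, pattern)]


def combine(wavelet_bits, color_bitlist):
    """XOR two 64-entry bit-lists and join the result into a 64-character bit string."""
    if len(wavelet_bits) != 64 or len(color_bitlist) != 64:
        raise ValueError("both bit-lists must have 64 entries")
    return "".join(str(a ^ b) for a, b in zip(wavelet_bits, color_bitlist))


if __name__ == "__main__":
    # 64 toy "tiles": the first 32 are mostly red, the last 32 mostly blue.
    tiles = [["red", "red", "blue"]] * 32 + [["blue", "blue", "red"]] * 32
    pattern = ["red"] * 64          # hypothetical pre-defined pattern
    cbits = color_bits(tiles, pattern)
    wbits = [1] * 64                # stand-in for the collage's wavelet-hash bits
    print(combine(wbits, cbits))    # 32 zeros followed by 32 ones
```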

@Pigglebear

Pigglebear commented Feb 25, 2024

I played around with logging when the function calls happen, and the vast majority of the time is taken up in detecting the cropping/black bars.

Edit: I was able to speed it up ~10x for short videos by passing the output of video_duration() to framesextractor() and stopping it from looking for cropping in timestamps past the end of the video.
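The idea in this edit can be sketched as: get the duration once, then drop any crop-detection seek positions that fall past the end of the video. Both `probe_duration` (a standalone ffprobe call, not videohash's `video_duration()`) and the candidate timestamp list are assumptions for illustration; videohash's actual crop-detection timestamps may differ.

```python
import subprocess


def probe_duration(path):
    """Return a video's duration in seconds using ffprobe (must be on PATH)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())


def crop_detection_timestamps(candidates, duration):
    """Keep only the candidate seek positions (seconds) that fall inside the video."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    kept = [t for t in candidates if 0 <= t < duration]
    # If every candidate is past the end, still sample once at the start.
    return kept or [0]


if __name__ == "__main__":
    # Hypothetical candidate list; videohash's real timestamps may differ.
    candidates = [2, 5, 10, 30, 60, 120, 300, 600]
    print(crop_detection_timestamps(candidates, duration=12.5))  # [2, 5, 10]
```

For a short clip, most candidates are discarded before any ffmpeg call is made, which is where the saving for short videos would come from.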

@albertopasqualetto

> Edit: I was able to speed it up ~10x for short videos by passing the output of video_duration() to framesextractor() and stopping it from looking for cropping in timestamps past the end of the video.

I think you could do a PR for it, even if the author never merges it.

@Demmenie
Author

> I think you could do a PR for it, even if the author never merges it.

Please do, if only to let other people do the same. I think the author does update it but only infrequently.

@Demmenie
Author

Demmenie commented Jul 5, 2024

> Edit: I was able to speed it up ~10x for short videos by passing the output of video_duration() to framesextractor() and stopping it from looking for cropping in timestamps past the end of the video.

I implemented this in my fork and it was about twice as fast on average, counting both long and short videos:
https://github.com/Demmenie/videohash2
https://pypi.org/project/videohash2/
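One way to summarize "about twice as fast on average" across a mix of long and short videos is the geometric mean of per-video speed-ups, so each video counts equally instead of a few long videos dominating a sum of times. This is a sketch with made-up timings; it is not necessarily how the figure in this comment was measured, and `best_time` and `average_speedup` are hypothetical helpers.

```python
import statistics
import time


def best_time(fn, *args, repeats=3):
    """Best-of-`repeats` wall-clock time of fn(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


def average_speedup(old_times, new_times):
    """Geometric mean of per-video speed-ups (old time / new time)."""
    if not old_times or len(old_times) != len(new_times):
        raise ValueError("need one old and one new timing per video")
    ratios = [old / new for old, new in zip(old_times, new_times)]
    return statistics.geometric_mean(ratios)


if __name__ == "__main__":
    # Hypothetical timings in seconds for three videos (not real measurements).
    old = [4.0, 3.0, 20.0]
    new = [1.0, 1.5, 20.0]
    print(f"average speed-up: {average_speedup(old, new):.2f}x")  # 2.00x
```

With real data, `old` and `new` would come from calling `best_time` on the original and patched hashing functions for each test video.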

@albertopasqualetto

@Demmenie Wow, very good job!

@Demmenie
Author

Thank you! I'm currently working on some other features, and hopefully I can find other ways of speeding it up further.

4 participants