Florence2 VS BLIP2 #61

Closed
aliencaocao opened this issue Oct 31, 2024 · 10 comments

Comments

@aliencaocao
Contributor

Is there any benchmark or comparison between the two models you released? I cannot find any information about Florence2 in your paper. The inference cost differs quite significantly.

@yadong-lu
Collaborator

Florence2 was trained after we put the paper on arXiv. We don't have benchmark numbers yet, but from the examples I have tried so far, it seems to be at least comparable to (if not better than) BLIP2.

@aliencaocao
Contributor Author

Hm, that's interesting, since it is such a small model. I tried one example and it doesn't seem to do better. For example, it would caption
Image
as "a medal showing a high ranking",
while BLIP2 more correctly says "a medal showing a 3rd place ranking".

Would love if the team could eval the model, or release the evaluation dataset so we can eval ourselves.

@aliencaocao
Contributor Author

It also tends to produce the same gibberish output repeatedly:

Icon Box ID 68: M0,0L9,0 4.5,5z
Icon Box ID 69: M0,0L9,0 4.5,5z
Icon Box ID 70: deleting or removing an item.
Icon Box ID 71: deleting or removing an item.
Icon Box ID 72: M0,0L9,0 4.5,5z
Icon Box ID 73: a progress bar.
Icon Box ID 74: a progress bar or loading indicator.
Icon Box ID 75: Maximize
Icon Box ID 76: a horizontal scroll bar.
Icon Box ID 77: deleting or removing an item.
Icon Box ID 78: A battery charging indicator.
Icon Box ID 79: M0,0L9,0 4.5,5z
Icon Box ID 80: a progress bar.
Icon Box ID 81: M0,0L9,0 4.5,5z

Notice the M0,0L9,0 4.5,5z
Image
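
The repeated string has the shape of SVG path data (a move command followed by coordinates), so outputs like it can be flagged automatically when triaging captions. The following is a minimal sketch, not part of the project's code: the `looks_like_svg_path` helper and its regular expression are hypothetical, and the heuristic only covers strings that start with an `M`/`m` command followed by a number.

```python
import re

# Hypothetical heuristic: the caption starts with a move command (M/m)
# followed by a number, and otherwise contains only SVG path command
# letters, digits, and separators.
SVG_PATH_RE = re.compile(r"^[Mm]\s*-?\d[\d\s,.\-MmLlHhVvCcSsQqTtAaZz]*$")


def looks_like_svg_path(caption: str) -> bool:
    """Return True if the caption resembles SVG path data rather than prose."""
    return bool(SVG_PATH_RE.match(caption.strip()))


# A few of the captions reported above, keyed by icon box ID.
captions = {
    68: "M0,0L9,0 4.5,5z",
    70: "deleting or removing an item.",
    73: "a progress bar.",
    75: "Maximize",
    81: "M0,0L9,0 4.5,5z",
}

# Collect the box IDs whose caption looks like path data.
flagged = [box_id for box_id, text in captions.items() if looks_like_svg_path(text)]
print(flagged)
```

Note that "Maximize" is not flagged even though it starts with `M`, because the pattern requires a number right after the command letter.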

@yadong-lu
Collaborator

interesting, can you post a link to the original screenshot?

@aliencaocao
Contributor Author

aliencaocao commented Nov 1, 2024

Image
This can be reproduced on your HF Spaces demo. Most icons are false positives (I think your YOLO confidence threshold is too low; 0.25 works better for me), but even looking only at the correct ones (e.g. 69 and 81, which are drop-down arrow buttons), the captions are still gibberish.
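
To illustrate the thresholding point, here is a minimal, library-free sketch of dropping low-confidence detections before captioning. The `Detection` type, the `filter_by_confidence` helper, and the confidence values are all hypothetical and chosen for illustration only; they are not taken from the demo or from the project's detection code.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical container for one detected icon box."""
    box_id: int
    confidence: float


def filter_by_confidence(
    detections: List[Detection], threshold: float = 0.25
) -> List[Detection]:
    """Keep only detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]


# Made-up confidence scores, for illustration only.
detections = [
    Detection(box_id=68, confidence=0.08),
    Detection(box_id=69, confidence=0.61),
    Detection(box_id=70, confidence=0.12),
    Detection(box_id=81, confidence=0.47),
]

kept = filter_by_confidence(detections, threshold=0.25)
print([d.box_id for d in kept])
```

Raising the threshold trades recall for precision: fewer false-positive boxes reach the captioning model, at the risk of missing faint or small icons.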

@yadong-lu
Collaborator

yadong-lu commented Nov 1, 2024

@aliencaocao I noticed an issue with the demo on the Hugging Face Space; fixing it now.

Update: ok seems to be some transient issue. Resumed the demo now.

@abrichr

abrichr commented Nov 1, 2024

Edit: issue persists in #52.

@aliencaocao Thank you for your feedback.

@aliencaocao
Contributor Author

@abrichr please stop posting your PR everywhere on unrelated issues. Unless you can show that your code somehow does not produce gibberish using the Florence2 model on my image, you are not contributing to the discussion here. This isn't the first time you have posted something irrelevant in other people's issues.

@aliencaocao
Contributor Author

Update: ok seems to be some transient issue. Resumed the demo now.

@yadong-lu So can I assume the gibberish with Florence is normal, and just caused by limited model capacity, not some implementation bug?

@yadong-lu
Collaborator

yadong-lu commented Nov 1, 2024

@aliencaocao When the detected regions are neither text nor app icons, I think Florence has some trouble captioning them.
But please let me know if you observe other gibberish output when the region is indeed text or app icons.
