Replies: 1 comment
-
Well, they were both trained on LAION-2B (English), but by different people. The hparams probably differed a little; I'd say the LR was likely a bit different. The B/32 used 1e-3, the B/16 might have been 5e-4 (I don't have the data for that). Also, I'm pretty sure they were trained on different clusters, which would mean different instances of LAION-2B: not that far apart in time, but probably with different samples, and maybe downloaded at slightly different resolutions or with different downscale interpolation. Can't check now as those dataset instances wouldn't be there anymore.
-
Hello!
I have been doing some (more) benchmarking of the open CLIP models and noticed something I found a bit unexpected. The average results for CLIP-ViT-B-16-laion2B-s34B-b88K are better than for CLIP-ViT-B-32-laion2B-s34B-b79K, 0.59 vs. 0.57 respectively (from https://github.com/mlfoundations/open_clip/blob/main/docs/openclip_results.csv). As far as I know, the datasets should be the same and the improvement comes from the B/16 vs. B/32 architecture difference. However, when benchmarking these models on other datasets, I see large discrepancies, to the point that the B/32 outperforms the B/16 by 10% in some cases. The benchmarks here relate to product search and classification. I would not expect this to happen if the models were trained on the same data in roughly the same way.
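For reference, a minimal sketch of this kind of zero-shot comparison with open_clip (the image path, label prompts, and pretrained tags below are illustrative placeholders, not the actual product-search benchmark setup):

```python
# Sketch: load both checkpoints with open_clip and compare zero-shot scores
# on the same inputs. "example.jpg" and the labels are placeholders.
import torch
import open_clip
from PIL import Image

checkpoints = [
    ("ViT-B-32", "laion2b_s34b_b79k"),  # assumed tag for the B/32 checkpoint
    ("ViT-B-16", "laion2b_s34b_b88k"),  # assumed tag for the B/16 checkpoint
]

labels = ["a photo of a running shoe", "a photo of a handbag"]
image = Image.open("example.jpg")

for arch, tag in checkpoints:
    model, _, preprocess = open_clip.create_model_and_transforms(arch, pretrained=tag)
    tokenizer = open_clip.get_tokenizer(arch)
    model.eval()

    with torch.no_grad():
        img = preprocess(image).unsqueeze(0)
        txt = tokenizer(labels)
        img_feat = model.encode_image(img)
        txt_feat = model.encode_text(txt)
        # Normalize features and compute softmax over label similarities.
        img_feat /= img_feat.norm(dim=-1, keepdim=True)
        txt_feat /= txt_feat.norm(dim=-1, keepdim=True)
        probs = (100.0 * img_feat @ txt_feat.T).softmax(dim=-1)

    print(arch, tag, probs.tolist())
```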
My questions are:
1. Were these indeed trained on the same dataset?
2. Were there any other details of training that might explain this difference?
Thanks