I'm experimenting with Fashion CLIP and noticed that my zero-shot classification scores were lower when using the built-in zero_shot_classification(images, text_labels) method than when I computed the embeddings, then the similarities, and finally the predictions step by step.
What I've found is that in the _cosine_similarity(key_vectors, space_vectors, normalize) method, only the key_vectors (corresponding to image embeddings) are being normalized, while the space_vectors (which I take to be the text embeddings) are left as they are. So it's not really calculating the cosine similarity, which requires both vectors to be normalized to unit length, and this is degrading performance.
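To illustrate the effect, here is a minimal NumPy sketch, not the library's actual code. The function names `cosine_similarity` and `key_only_similarity` and the toy 2-D embeddings are made up for illustration. It compares normalizing both sets of vectors against normalizing only the key vectors, and shows that the predicted label can change when the text embeddings have different norms.

```python
import numpy as np


def cosine_similarity(key_vectors, space_vectors):
    """True cosine similarity: both sets of vectors are scaled to unit length."""
    keys = np.asarray(key_vectors, dtype=np.float64)
    space = np.asarray(space_vectors, dtype=np.float64)
    keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    space = space / np.linalg.norm(space, axis=1, keepdims=True)
    return keys @ space.T


def key_only_similarity(key_vectors, space_vectors):
    """Hypothetical reproduction of the behaviour described above:
    only the key vectors are normalized, so the norm of each space
    vector still scales the score."""
    keys = np.asarray(key_vectors, dtype=np.float64)
    space = np.asarray(space_vectors, dtype=np.float64)
    keys = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    return keys @ space.T


# Toy data (assumed, not real Fashion CLIP embeddings).
# The image points in exactly the same direction as the second text embedding,
# but the first text embedding has a much larger norm.
image_embeddings = np.array([[1.0, 1.0]])
text_embeddings = np.array([[3.0, 0.0],
                            [1.0, 1.0]])

full = cosine_similarity(image_embeddings, text_embeddings)
partial = key_only_similarity(image_embeddings, text_embeddings)

print("both normalized:", full, "-> label", int(np.argmax(full, axis=1)[0]))
print("keys only:      ", partial, "-> label", int(np.argmax(partial, axis=1)[0]))
```

With both sides normalized, the second label wins because it points in the same direction as the image. With only the keys normalized, the first label wins purely because its embedding has a larger norm.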