E5 is a text embedding model introduced in *Text Embeddings by Weakly-Supervised Contrastive Pre-training* by Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, and Furu Wei (arXiv, 2022).
E5 is available in the following variants:
| Model | Model Size (GB) | Embedding Dimensions |
|---|---|---|
| intfloat/e5-large-v2 | 1.34 | 1024 |
| intfloat/e5-base-v2 | 0.44 | 768 |
| intfloat/e5-small-v2 | 0.13 | 384 |
1. Do I need to add the prefix "query: " and "passage: " to input texts?
Yes. The model was trained with these prefixes, and omitting them leads to a performance degradation.
Here are some rules of thumb:
- Use "query: " and "passage: " respectively for asymmetric tasks such as passage retrieval in open QA and ad-hoc information retrieval.
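A minimal sketch of the prefixing convention for asymmetric retrieval. The helper names and the cosine-similarity function below are illustrative, not part of the E5 API; in practice the prefixed strings would be encoded with an `intfloat/e5-*-v2` model and the resulting embeddings compared by cosine similarity.

```python
import numpy as np

def format_query(text: str) -> str:
    # Search queries get the "query: " prefix, matching how E5 was trained.
    return "query: " + text.strip()

def format_passage(text: str) -> str:
    # Corpus documents get the "passage: " prefix.
    return "passage: " + text.strip()

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # E5 embeddings are L2-normalized and compared by dot product.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a @ b)

query = format_query("how much protein should a female eat")
passages = [
    format_passage("The average protein requirement for women is 46 grams per day."),
    format_passage("Photosynthesis converts light energy into chemical energy."),
]
# query and passages would then be embedded with an E5 model and
# ranked by cosine_similarity between the query and each passage vector.
```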