Question on some configs for reproducing #5
Hi, amazing work! I was trying to replicate the 4a and 4b experiments, but it seems that they are duplicated. Could you help me with this? Thanks!
Hi @cadurosar, I updated the configurations. Thanks for pointing out the duplication.

I also want to note that we performed an additional step of length matching with the full Splade model (splade_msmarco_distil_flops_0.1_0.08.yaml) to remove the discrepancy due to length differences. Let's assume that this full Splade model generates NQ(q) query terms for a query q and ND(d) terms for a document d. After training 4a (Before - splade_asm_qmlp_msmarco_distil_flops_0.0_0.08.yaml), we pruned the documents to match (i.e., be no longer than) the above ND(d). Similarly, after training 4b (After - lsr/configs/experiment/splade_asm_dmlp_msmarco_distil.yaml), we pruned the output queries to match the above NQ(q).

The length difference arises because dropping the FLOPs regularizer on one side (e.g., query) doesn't result in the same sparsity on the other side (e.g., document). For example, the documents produced by [Splade, FLOPs(0.1, 0.08)] are unlikely to be equally sparse as the documents produced by [Splade, FLOPs(0.0, 0.08)]. The same may happen with the queries in [Splade, FLOPs(0.1, 0.08)] and [Splade, FLOPs(0.1, 0.0)]. I hope this helps, and I'm happy to have further discussion.
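For illustration, a minimal sketch of that length-matching step, assuming sparse representations are plain dicts mapping terms to weights; the function names and data layout are hypothetical and not taken from the lsr repository:

```python
# Hypothetical sketch of the length-matching step, assuming sparse representations
# are dicts mapping term -> weight (not the lsr repository's actual API).

def prune_to_budget(sparse_rep, budget):
    """Keep at most `budget` terms, dropping the lowest-weight terms first."""
    if len(sparse_rep) <= budget:
        return dict(sparse_rep)
    kept = sorted(sparse_rep.items(), key=lambda kv: kv[1], reverse=True)[:budget]
    return dict(kept)


def length_match_documents(pruned_model_docs, full_splade_docs):
    """Cut each document from the 4a model to ND(d), the length the full
    Splade model (FLOPs 0.1/0.08) produced for the same document."""
    matched = {}
    for doc_id, rep in pruned_model_docs.items():
        nd = len(full_splade_docs[doc_id])  # ND(d) from the full Splade run
        matched[doc_id] = prune_to_budget(rep, nd)
    return matched
```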
Thanks a lot for fixing the configs. I will take a look into this as soon as possible, but I believe it should now be sufficient to reproduce :) And yes, the problem of sparsity depending not only on the FLOPS weight you set for a given side (query/doc) but also on the relation between the two sides is something we saw as well. However, I am not sure I understood exactly what you did: do you do it case by case (i.e., each individual document may not be longer than it was with the previous method), or via the average (i.e., each individual document may not be longer than the previous average)? I lean more toward the first, but I wanted to be sure. Also, would you have code for reproducing that part as well?
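To make the two readings explicit, here is a hypothetical sketch of both pruning variants; the thread does not confirm which one was actually used:

```python
# Two possible readings of the pruning budget; both are illustrative only.

def prune_case_by_case(rep, reference_rep):
    """Per-document: a document may not be longer than its own reference ND(d)."""
    budget = len(reference_rep)
    return dict(sorted(rep.items(), key=lambda kv: kv[1], reverse=True)[:budget])


def prune_to_average(rep, reference_reps):
    """Average: every document is cut to the average reference length."""
    avg_budget = round(sum(len(r) for r in reference_reps) / len(reference_reps))
    return dict(sorted(rep.items(), key=lambda kv: kv[1], reverse=True)[:avg_budget])
```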
Hi @cadurosar, do you have any further updates on the reproduction?
Sorry @thongnt99, the post-ECIR period has been crazier than I expected. I have been able to reproduce the results, and I'm quite surprised by what I got on BEIR, as it differs vastly from what we had when removing query expansion. For us, removing query expansion on BEIR hurt the results, but that does not seem to be the case when using the MLP strategy. Results are slightly worse than SPLADE++, but that is expected due to the differences in the MLM backbone (DistilBERT vs. coCondenser); I still need to test them on the same footing, but compared to COIL CR, for example, the results are good.
[BEIR results table from the original comment not preserved here.]
If you want, I can send a PR with code for running BEIR from ir_datasets.
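As a rough illustration of what such a script could look like (the dataset id and the encode/index step below are assumptions, not the actual PR code):

```python
# Rough sketch of iterating a BEIR dataset via ir_datasets; dataset id and the
# encoding/indexing step are placeholders for illustration.
import ir_datasets

dataset = ir_datasets.load("beir/scifact/test")  # other beir/* ids work the same way

queries = {q.query_id: q.text for q in dataset.queries_iter()}

qrels = {}
for qrel in dataset.qrels_iter():
    qrels.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance

for doc in dataset.docs_iter():
    # Some BEIR corpora carry a separate title field; prepend it when present.
    text = (getattr(doc, "title", "") + " " + doc.text).strip()
    # ... encode `text` with the document encoder and add it to the index ...
```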
Great. Thanks a lot for the update.
Hey @cadurosar -- I'm curious about the "Reranker Type" label in the table above. Are you using all of these as re-rankers? If so, is that due to pooling bias or something else? Thanks!
No, it's just first-stage retrieval; that was because I used the same Excel sheet I was using for reranking but forgot to change that part... Sorry about that.
No worries, thanks for the clarification!