
Question on some configs for reproducing #5

Open
cadurosar opened this issue Apr 1, 2023 · 9 comments
Labels
question Further information is requested

Comments

@cadurosar
Contributor

Hi,

Amazing work! I was trying to replicate the 4a and 4b experiments, but it seems that their configs are duplicated. Could you help me with this?

| Experiment | ID | Before | After |
|---|---|---|---|
| Query expansion | 4a | lsr/configs/experiment/unicoil_tilde_msmarco_distil.yaml | lsr/configs/experiment/splade_asm_dmlp_msmarco_distil.yaml |
| Query expansion | 4b | lsr/configs/experiment/unicoil_tilde_msmarco_distil.yaml | lsr/configs/experiment/splade_asm_dmlp_msmarco_distil.yaml |

Thanks!

@thongnt99
Owner

thongnt99 commented Apr 3, 2023

Hi @cadurosar I updated the configurations. Thanks for pointing out the duplication.

I also want to note that we performed an additional step of length matching against the full Splade model (splade_msmarco_distil_flops_0.1_0.08.yaml) to remove the discrepancy caused by differences in representation length.

Let's assume that this full Splade model generates NQ(q) query terms for a query q and ND(d) document terms for a document d.

After training 4a (Before: splade_asm_qmlp_msmarco_distil_flops_0.0_0.08.yaml), we pruned the documents so that they are no longer than the above ND(d).

Similarly, after training 4b (After: lsr/configs/experiment/splade_asm_dmlp_msmarco_distil.yaml), we pruned the output queries so that they are no longer than the above NQ(q).
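
Concretely, the per-item pruning is along these lines (just a sketch to illustrate the idea, not the exact code in the repo; a representation here is a vocabulary-sized weight vector, and `ref_len` is the ND(d) or NQ(q) produced by the full Splade model for the same document or query):

```python
import torch

def prune_to_length(term_weights: torch.Tensor, ref_len: int) -> torch.Tensor:
    """Keep only the ref_len highest-weight terms of a |V|-dimensional
    sparse representation; all other entries are zeroed out."""
    nnz = int((term_weights > 0).sum())
    if nnz <= ref_len:
        return term_weights  # already no longer than the reference
    topk = torch.topk(term_weights, k=ref_len)
    pruned = torch.zeros_like(term_weights)
    pruned[topk.indices] = topk.values
    return pruned
```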

The length difference arises because dropping the FLOPs regularizer on one side (e.g., the query) changes the resulting sparsity on the other side (e.g., the document). For example, the documents produced by [Splade, FLOPs(0.1, 0.08)] are unlikely to be as sparse as the documents produced by [Splade, FLOPs(0.0, 0.08)]. The same may happen with the queries of [Splade, FLOPs(0.1, 0.08)] and [Splade, FLOPs(0.1, 0.0)].
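
For reference, the two numbers in FLOPs(λ_q, λ_d) are the regularization weights on the query and document sides; the FLOPS regularizer itself follows the Splade formulation (the loss below is a sketch of the standard form, with the ranking/distillation loss written generically):

```latex
\mathcal{L} \;=\; \mathcal{L}_{\text{rank}}
\;+\; \lambda_q\, \ell_{\text{FLOPS}}^{(q)}
\;+\; \lambda_d\, \ell_{\text{FLOPS}}^{(d)},
\qquad
\ell_{\text{FLOPS}} \;=\; \sum_{j \in V} \Big( \tfrac{1}{N} \sum_{i=1}^{N} w_j^{(i)} \Big)^{2}
```

where w_j^(i) is the weight of vocabulary term j in the i-th representation of a batch of size N. Setting λ_q = 0 removes the pressure toward sparse queries, but because both sides are trained jointly it also shifts how sparse the documents end up, hence the extra length matching.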

I hope this helps, and I'm happy to have further discussion.

@cadurosar
Contributor Author

Thanks a lot for fixing the configs. I will look into this as soon as possible, but I believe they should now be sufficient for reproduction :)

And yes, the problem that sparsity depends not only on the FLOPS weight you set for a given modality (query/doc) but also on the relation between the two is something we saw as well. However, I am not sure I understood exactly what you did: do you do it case by case (i.e., each individual document may not be longer than it was with the previous method), or via the average (i.e., each individual document may not be longer than the previous average)? I lean toward the first, but I wanted to be sure. Also, would you have code for reproducing that part as well?
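
To make the two readings concrete (toy example, variable names are mine):

```python
# Non-zero term counts per document under the previous method (toy numbers)
ref_lengths = {"d1": 40, "d2": 120, "d3": 80}

# Reading 1 (case by case): each document is capped at its own previous length
caps_per_doc = dict(ref_lengths)                        # {'d1': 40, 'd2': 120, 'd3': 80}

# Reading 2 (via the average): every document gets the same cap, the previous mean
avg_cap = round(sum(ref_lengths.values()) / len(ref_lengths))
caps_avg = {doc_id: avg_cap for doc_id in ref_lengths}  # {'d1': 80, 'd2': 80, 'd3': 80}
```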

@thongnt99
Owner

Hi @cadurosar,

Do you have any further updates on the reproduction?

@thongnt99 thongnt99 added the question Further information is requested label Apr 24, 2023
@cadurosar
Contributor Author

cadurosar commented Apr 26, 2023

Sorry @thongnt99, the post-ECIR period has been crazier than I expected. I have been able to reproduce the results, and I'm quite surprised by what I got on BEIR, as it differs vastly from what we observed when removing query expansion. For us, removing query expansion on BEIR hurt the results, but that does not seem to be the case when using the MLP strategy.

The results are slightly worse than SPLADE++, but that is expected due to the difference in MLM backbone (DistilBERT vs. coCondenser); I still need to test them on equal footing. Compared to COIL CR, for example, the results are good.

| Retriever Type | PP | CoilCR | 4a Before | 4a After |
|---|---|---|---|---|
| Average (13) | 50.7 | 47.3 | 48.5 | 48.7 |
| arguana | 51.8 | 34.2 | 49.8 | 53.3 |
| climate-fever | 23.7 | 18.6 | 18.9 | 17.2 |
| dbpedia-entity | 43.6 | 37.8 | 43.3 | 43.3 |
| fever | 79.6 | 78.2 | 75.1 | 75.6 |
| fiqa | 34.9 | 31.0 | 33.2 | 32.7 |
| hotpotqa | 69.3 | 68.3 | 68.6 | 68.5 |
| nfcorpus | 34.5 | 33.8 | 33.7 | 33.8 |
| nq | 53.3 | 48.3 | 52.4 | 52.1 |
| quora | 84.9 | 77.3 | 76.4 | 77.3 |
| scidocs | 16.1 | 15.3 | 15.1 | 15.3 |
| scifact | 71.0 | 69.8 | 66.2 | 67.7 |
| trec-covid | 72.5 | 73.5 | 68.4 | 69.8 |
| webis-touche2020 | 24.2 | 28.7 | 29.3 | 26.1 |

@cadurosar
Contributor Author

If you want, I can send a PR with code for running BEIR from ir_datasets.
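
The skeleton would be roughly the following (a rough sketch of the plumbing only; `retrieve` is a placeholder for the repo's actual inference/indexing code, and the dataset name is just an example):

```python
import ir_datasets
import ir_measures
from ir_measures import nDCG

def retrieve(query_text: str) -> dict:
    """Placeholder for the repo's retriever; should return {doc_id: score}."""
    return {}

# Load one BEIR dataset through ir_datasets (e.g. SciFact test split)
dataset = ir_datasets.load("beir/scifact/test")

# Build the run: {query_id: {doc_id: score}}
run = {q.query_id: retrieve(q.text) for q in dataset.queries_iter()}

# Score it with ir_measures using the qrels shipped with the dataset
print(ir_measures.calc_aggregate([nDCG @ 10], dataset.qrels_iter(), run))
```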

@thongnt99
Owner

Great, thanks a lot for the update.
It would be awesome if you could send the pull request for evaluation on BEIR.
I can run the evaluation on other checkpoints when I have free machines.

@seanmacavaney
Contributor

Hey @cadurosar -- I'm curious about the "Reranker Type" label in the table above. Are you using these all as re-rankers? If so, is that due to pooling bias or something else?

Thanks!

@cadurosar
Contributor Author

No, it's just first-stage retrieval. This was because I used the same Excel sheet I was using for reranking but forgot to change that part... Sorry for that.

@seanmacavaney
Contributor

No worries, thanks for the clarification!
