AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment (COLING 2025)
See README.md in the preprocess folder.
NOTE: The processed training and testing datasets, including QReCC and TopiOCQA, can be downloaded from Google Drive.
Before training and inference, please modify the provided scripts to make sure the variables (including the index path and data path) are set correctly.
- Stage 1 Training:
bash scripts/train_stage1.sh
After obtaining the Stage 1 model, modify the checkpoint path and output path in scripts/gen_candidates.sh, then generate candidates for Stage 2:
bash scripts/gen_candidates.sh
Leveraging the generated candidates, we use a fusion metric to obtain the relative ordering of candidates for Stage 2 training (a rough sketch of the fusion idea follows the commands below):
bash scripts/obtain_ranking_ance.sh
bash scripts/obtain_ranking_bm25.sh
bash scripts/ranking_fusion.sh
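The fusion metric itself is implemented in the scripts above. Purely as an illustration of the general idea of combining sparse (BM25) and dense (ANCE) rankings, a reciprocal-rank-fusion-style sketch is shown below; the data structures and fusion formula here are hypothetical, not the repository's actual code.

```python
# Illustrative sketch only: the actual fusion metric is implemented by the
# scripts above. This shows a generic reciprocal-rank-fusion combination of
# hypothetical BM25 and ANCE rankings of reformulation candidates.
from typing import Dict, List


def fuse_rankings(bm25_rank: Dict[str, int],
                  ance_rank: Dict[str, int],
                  k: int = 60) -> List[str]:
    """Order candidate ids by a reciprocal-rank-fusion score (higher is better)."""
    scores: Dict[str, float] = {}
    for cand in set(bm25_rank) | set(ance_rank):
        score = 0.0
        for rank in (bm25_rank.get(cand), ance_rank.get(cand)):
            if rank is not None:
                score += 1.0 / (k + rank)
        scores[cand] = score
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: "c2" ranks highly under both retrievers, so it is ordered first.
print(fuse_rankings({"c1": 1, "c2": 2, "c3": 3}, {"c2": 1, "c3": 2, "c1": 5}))
```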
- Stage 2 Training:
bash scripts/train_rerank.sh
We manually stop Stage 2 training after one epoch and use the new checkpoint to generate new candidates for the next round of training; based on our experiments, 2-3 such rounds are needed.
- For inference with AdaCQR, use the following script to generate reformulated queries:
bash scripts/test_rerank.sh
- For BM25 and ANCE retrieval, we provide src/cs_shortcut/run_dense_search.sh and src/test_BM25_direct.py; feel free to use them.
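If you want a quick standalone sanity check of BM25 retrieval outside these scripts, a minimal sketch using Pyserini is shown below; the use of Pyserini, the index path, and the BM25 parameters are assumptions and may differ from the repository's own setup.

```python
# Minimal BM25 retrieval sketch (assumption: Pyserini with a prebuilt Lucene
# index; the index path and query below are placeholders).
from pyserini.search.lucene import LuceneSearcher

searcher = LuceneSearcher("/path/to/bm25_index")
searcher.set_bm25(k1=0.9, b=0.4)  # common BM25 settings; tune for your corpus

hits = searcher.search("example reformulated query", k=10)
for hit in hits:
    print(hit.docid, round(hit.score, 3))
```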
- For Query Expansion, you can use tools such as vLLM or Ollama to prompt an LLM to generate pseudo expansions.
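As one possible way to do this, a minimal vLLM sketch for producing a pseudo expansion is shown below; the model name and prompt are placeholders rather than the prompt used in the paper.

```python
# Sketch of generating a pseudo expansion for a reformulated query with vLLM.
# Model name and prompt are placeholders; adapt them to your own setup.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")
params = SamplingParams(temperature=0.7, max_tokens=128)

query = "What are the side effects of the treatment discussed earlier?"
prompt = f"Write a short passage that answers the following query.\nQuery: {query}\nPassage:"

outputs = llm.generate([prompt], params)
expansion = outputs[0].outputs[0].text.strip()
print(expansion)  # the pseudo expansion can be appended to the reformulated query
```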
@misc{lai2024adacqrenhancingqueryreformulation,
title={AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment},
author={Yilong Lai and Jialong Wu and Congzhi Zhang and Haowen Sun and Deyu Zhou},
year={2024},
eprint={2407.01965},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.01965},
}
We are very grateful to the prior works and source code that we built upon, including ConvGQR, InfoCQR, cs-shortcut, LLM4CS, and BRIO.
If you have any questions about AdaCQR, feel free to contact me at [email protected].