OSIRRC Docker Image for Anserini+BM25PRF

This README is heavily based on (i.e., copied from) the Anserini README.

This is the Docker image implementing BM25 + Pseudo Relevance Feedback (PRF) [1] with Anserini [2]. The image conforms to the OSIRRC jig for the Open-Source IR Replicability Challenge (OSIRRC) at SIGIR 2019.

This image is available on Docker Hub.

This image implements BM25 + Pseudo Relevance Feedback (PRF) with Anserini.

Supported test collections: robust04
Supported hooks: init, index, search and train
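
As a rough illustration of what BM25 + PRF does with the hyper-parameters named later in this README (k1, b, num_docs, num_new_terms, new_term_weight), the sketch below ranks a toy corpus with BM25, treats the top num_docs results as if they were relevant, and adds the best num_new_terms terms to the query with weight new_term_weight. It is not the code shipped in the image: the BM25 formula is one common formulation, term selection uses the offer weight from [1] as an assumption about how expansion terms are picked, the corpus and function names are hypothetical, and k1_prf and b_prf are not modelled.

```python
import math
from collections import Counter


def bm25_score(query_weights, doc_tokens, doc_freq, n_docs, avg_len, k1, b):
    """Score one tokenized document against a weighted query (one common BM25 form)."""
    tf = Counter(doc_tokens)
    length_norm = k1 * (1 - b + b * len(doc_tokens) / avg_len)
    score = 0.0
    for term, weight in query_weights.items():
        if tf[term] == 0:
            continue
        idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
        score += weight * idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


def expand_query(query_terms, corpus, k1=0.9, b=0.4,
                 num_docs=10, num_new_terms=20, new_term_weight=0.2):
    """Return {term: weight} for the original query plus PRF expansion terms."""
    n_docs = len(corpus)
    avg_len = sum(len(tokens) for tokens in corpus.values()) / n_docs
    doc_freq = Counter(t for tokens in corpus.values() for t in set(tokens))

    # First pass: rank with the original query and keep the top num_docs matches.
    weights = {t: 1.0 for t in query_terms}
    scores = {d: bm25_score(weights, tokens, doc_freq, n_docs, avg_len, k1, b)
              for d, tokens in corpus.items()}
    feedback = sorted((d for d in scores if scores[d] > 0),
                      key=lambda d: (-scores[d], d))[:num_docs]

    # Offer weight r * RW, where RW is the Robertson/Sparck Jones relevance
    # weight computed as if the feedback documents were relevant.
    big_r = len(feedback)
    r = Counter(t for d in feedback for t in set(corpus[d]))

    def offer_weight(term):
        n = doc_freq[term]
        rw = math.log(((r[term] + 0.5) * (n_docs - n - big_r + r[term] + 0.5))
                      / ((n - r[term] + 0.5) * (big_r - r[term] + 0.5)))
        return r[term] * rw

    candidates = sorted((t for t in r if t not in weights),
                        key=lambda t: (-offer_weight(t), t))
    for term in candidates[:num_new_terms]:
        weights[term] = new_term_weight
    return weights


# Hypothetical toy corpus: document id -> list of tokens.
toy_corpus = {
    "d1": "bm25 ranking function retrieval".split(),
    "d2": "pseudo relevance feedback retrieval expansion".split(),
    "d3": "cooking pasta recipe".split(),
    "d4": "relevance feedback query expansion".split(),
}
expanded = expand_query(["relevance", "feedback"], toy_corpus,
                        num_docs=2, num_new_terms=1)
print(expanded)
```

In a full system the expanded, weighted query would then be used for a second retrieval pass; the sketch stops at the expansion step.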

Quick Start

The following jig command can be used to index TREC disks 4/5 for robust04:

python run.py prepare \
  --repo osirrc2019/anserini-bm25prf \
  --tag latest \
  --collections robust04=/path/to/disk45=trectext

The following jig command can be used to perform a retrieval run on the robust04 test collection with the default hyper-parameters:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --top_k 1000

The following jig command can be used to tune the hyper-parameters. Note that the grid search may take several hours.

python run.py train \
   --repo osirrc2019/anserini-bm25prf \
   --tag latest \
   --topic topics/topics.robust04.txt \
   --qrels $(pwd)/qrels/qrels.robust04.txt \
   --validation_split $(pwd)/sample_training_validation_query_ids/robust04_validation.txt \
   --test_split $(pwd)/sample_training_validation_query_ids/robust04_test.txt \
   --model_folder $(pwd)/trained \
   --collection robust04
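
Conceptually, a grid search of this kind scores every combination of candidate values and keeps the best one, which is why the running time grows multiplicatively with each added parameter. The sketch below shows only the general pattern; the grid values and the `toy_evaluate` function are hypothetical stand-ins, since this README does not list the grid the image actually searches.

```python
import itertools


def grid_search(grid, evaluate):
    """Try every combination in `grid` and return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the values searched by the image may differ.
example_grid = {
    "b": [0.2, 0.4, 0.6],
    "new_term_weight": [0.1, 0.2],
    "num_new_terms": [20, 40],
}


def toy_evaluate(params):
    """Stand-in for 'run retrieval on the validation queries and return MAP'."""
    return (-abs(params["b"] - 0.4)
            - abs(params["new_term_weight"] - 0.2)
            - abs(params["num_new_terms"] - 20) / 100)


best_params, best_score = grid_search(example_grid, toy_evaluate)
num_combinations = len(list(itertools.product(*example_grid.values())))
print(best_params, num_combinations)
```

With three, two, and two candidate values the toy grid already needs 12 evaluations; each real evaluation is a full retrieval run over the validation queries.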

Expected Results on TREC 2004 Robust

The following numbers should be reproducible using the scripts provided by the jig.

BM25+PRF with Default Hyper-parameters

Hyper-parameters: k1=0.9 b=0.4 k1_prf=0.9 b_prf=0.4 num_new_terms=20 num_docs=10 new_term_weight=0.2

Command:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04

Metric	Score
MAP	0.2928
P@30	0.3438
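
For readers unfamiliar with the two metrics: MAP is the mean over topics of average precision, and P@30 is the fraction of the top 30 retrieved documents that are relevant. The reported scores come from the jig's own evaluation; the sketch below only illustrates the definitions on hypothetical toy data (with a cutoff of 2 instead of 30), and details such as how topics missing from a run are handled may differ from the jig's tooling.

```python
def average_precision(ranked_doc_ids, relevant):
    """Average precision for one topic: mean of precision at each relevant hit."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def precision_at_k(ranked_doc_ids, relevant, k=30):
    """Fraction of the top k retrieved documents that are relevant."""
    return sum(1 for doc_id in ranked_doc_ids[:k] if doc_id in relevant) / k


def mean_over_topics(metric, run, qrels):
    """Average a per-topic metric over every topic that has judgments."""
    return sum(metric(run.get(topic, []), rel) for topic, rel in qrels.items()) / len(qrels)


# Hypothetical toy data: topic id -> ranked list / set of relevant documents.
toy_run = {"301": ["d1", "d2", "d3", "d4"], "302": ["d5", "d6"]}
toy_qrels = {"301": {"d1", "d3"}, "302": {"d6", "d9"}}

toy_map = mean_over_topics(average_precision, toy_run, toy_qrels)
toy_p2 = mean_over_topics(lambda ranked, rel: precision_at_k(ranked, rel, k=2),
                          toy_run, toy_qrels)
print(round(toy_map, 4), toy_p2)
```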

Tuning BM25+PRF

Command:

python run.py train \
   --repo osirrc2019/anserini-bm25prf \
   --tag latest \
   --topic topics/topics.robust04.txt \
   --qrels $(pwd)/qrels/qrels.robust04.txt \
   --validation_split $(pwd)/sample_training_validation_query_ids/robust04_validation.txt \
   --test_split $(pwd)/sample_training_validation_query_ids/robust04_test.txt \
   --model_folder $(pwd)/trained \
   --collection robust04

Tuned Hyper-parameters:

Parameter	k1	b	k1_prf	b_prf	num_new_terms	num_docs	new_term_weight
Value	0.9	0.2	0.9	0.6	40	10	0.1

BM25+PRF with Tuned Hyper-parameters

Hyper-parameters: k1=0.9 b=0.2 k1_prf=0.9 b_prf=0.6 num_new_terms=40 num_docs=10 new_term_weight=0.1

Command:

python run.py search \
  --repo osirrc2019/anserini-bm25prf \
  --output out \
  --qrels qrels/qrels.robust04.txt \
  --topic topics/topics.robust04.txt \
  --collection robust04 \
  --opts k1=0.9 b=0.2 k1_prf=0.9 b_prf=0.6 num_new_terms=40 num_docs=10 new_term_weight=0.1

Metric	Score
MAP	0.2916
P@30	0.3396

Note that the tuned hyper-parameters score slightly lower than the defaults on both metrics (MAP 0.2916 vs. 0.2928, P@30 0.3396 vs. 0.3438).
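
To put the gap between the two result tables in perspective, the relative change can be computed directly from the reported scores. This is plain arithmetic on the numbers above, not output from the jig.

```python
# Scores copied from the two result tables in this README.
default_scores = {"MAP": 0.2928, "P@30": 0.3438}
tuned_scores = {"MAP": 0.2916, "P@30": 0.3396}

# Relative change of the tuned run with respect to the default run.
relative_change = {
    metric: (tuned_scores[metric] - default_scores[metric]) / default_scores[metric]
    for metric in default_scores
}

for metric, change in relative_change.items():
    print(f"{metric}: {change:+.2%}")
# MAP: -0.41%
# P@30: -1.22%
```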

References

[1] Stephen E. Robertson and Karen Spärck Jones. Simple, proven approaches to text retrieval. University of Cambridge Computer Laboratory, 1994.

[2] Peilin Yang, Hui Fang, and Jimmy Lin. Anserini: Enabling the Use of Lucene for Information Retrieval Research. SIGIR 2017.
