
Quantity-aware Retrieval

This repository contains the code for the paper "Numbers Matter! Bringing Quantity-awareness to Retrieval Systems". The paper introduces two types of quantity-aware models: joint quantity and term ranking, and disjoint ranking.

The modules include data generation, data processing, and evaluation. For fine-tuning the neural models SPLADE and ColBERT, refer to their respective repositories; some code snippets from those repositories are reused here to load the models and create the quantity-aware variants. To run the code, create an environment from the requirements.txt file.


Data and Model Checkpoints

Alongside the code, we also publish the training data, the benchmark datasets for testing, and the trained model checkpoints.

  • Benchmark data: The benchmark data comprises the FinQuant and MedQuant test sets and can be downloaded here, alongside the annotation guidelines.
  • Training data: Raw training data, containing sentences with quantities from news articles and the TREC Clinical Trials, can be downloaded here.
  • Checkpoints: The checkpoints for the SPLADE and ColBERT models fine-tuned on quantity-centric finance and medical data can be downloaded here.

Below we describe the content of each module; for more information and examples, refer to the README files inside each respective module.

Data Generation

For the joint quantity and term ranking, fine-tuning data is generated using templates and numerical indices. The module `data_generation` contains code for concept expansion and for unit and value permutation.
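As an illustration, here is a minimal sketch of the value-permutation idea: permuting a seed quantity so the result satisfies (positive) or violates (negative) a numerical query condition. The function name, condition labels, and ranges below are hypothetical, not the module's actual API.

```python
import random

# Hypothetical sketch: given a seed quantity and a query condition,
# produce a positive value (satisfies the condition) and a negative
# value (violates it) for generating training pairs from templates.
def permute_value(value: float, condition: str) -> tuple[float, float]:
    """Return (positive_value, negative_value) for a query condition."""
    if condition == "greater":  # query asks for quantities > value
        return value * random.uniform(1.1, 2.0), value * random.uniform(0.5, 0.9)
    if condition == "less":     # query asks for quantities < value
        return value * random.uniform(0.5, 0.9), value * random.uniform(1.1, 2.0)
    # "equal" condition: the exact value is positive, a shifted one negative
    return value, value * random.uniform(1.5, 2.0)

query = "laptops with more than 16 GB of RAM"
pos_val, neg_val = permute_value(16, "greater")
positive_doc = f"This laptop ships with {pos_val:.0f} GB of RAM."
negative_doc = f"This laptop ships with {neg_val:.0f} GB of RAM."
```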

Dataset

The data loader and dataset classes for loading collections and queries at inference time are in the dataset module.
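The following sketch shows the kind of loader such a module typically provides, assuming a TSV collection with one "doc_id&lt;TAB&gt;text" entry per line; the actual class names and file formats in the dataset module may differ.

```python
from pathlib import Path

# Hypothetical loader sketch: read a TSV collection into a doc_id -> text dict.
def load_collection(path: str) -> dict[str, str]:
    collection = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        doc_id, text = line.split("\t", maxsplit=1)
        collection[doc_id] = text
    return collection

collection = load_collection("collection.tsv")  # assumed file name
```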

Models

The model architectures and interfaces are in the models module. The models are divided into lexical and semantic models.

The lexical models include the BM25 baselines (unmodified BM25 and BM25 with filtering) and QBM25, the quantity-aware variant.

The semantic models include neural baselines (SPLADE and ColBERT) and quantity-aware variants (QSPLADE and QColBERT).
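To make the disjoint-ranking idea concrete, here is a minimal sketch: terms and quantities are scored separately and the two scores are combined. The parsing, unit handling, and linear interpolation below are illustrative assumptions, not the paper's exact method.

```python
from dataclasses import dataclass

@dataclass
class Quantity:
    value: float
    unit: str

def quantity_match(query_q: Quantity, condition: str, doc_q: Quantity) -> float:
    """1.0 if the document quantity satisfies the query condition, else 0.0."""
    if query_q.unit != doc_q.unit:  # assumes units are already normalized
        return 0.0
    satisfied = {
        "equal":   doc_q.value == query_q.value,
        "greater": doc_q.value > query_q.value,
        "less":    doc_q.value < query_q.value,
    }[condition]
    return 1.0 if satisfied else 0.0

def combined_score(term_score: float, q_score: float, alpha: float = 0.7) -> float:
    # Linear interpolation of term and quantity scores; alpha is a free choice.
    return alpha * term_score + (1 - alpha) * q_score
```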

Evaluate

The module evaluate contains scripts for evaluating the proposed models on the benchmarks. Here, we include scripts to run evaluation using the pytrec_eval library as well as significance testing.
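The snippet below shows the kind of evaluation these scripts perform, using the real pytrec_eval API plus a paired t-test from SciPy for significance; the qrels and run contents are toy placeholders.

```python
import pytrec_eval
from scipy import stats

# Toy qrels and two system runs (qid -> doc_id -> relevance/score).
qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 1}}
run_a = {"q1": {"d1": 0.9, "d2": 0.4}, "q2": {"d3": 0.8}}
run_b = {"q1": {"d1": 0.3, "d2": 0.7}, "q2": {"d3": 0.6}}

evaluator = pytrec_eval.RelevanceEvaluator(qrels, {"map", "ndcg"})
scores_a = evaluator.evaluate(run_a)  # per-query metric dicts
scores_b = evaluator.evaluate(run_b)

# Paired significance test on per-query nDCG (two-sided t-test).
ndcg_a = [scores_a[q]["ndcg"] for q in qrels]
ndcg_b = [scores_b[q]["ndcg"] for q in qrels]
t_stat, p_value = stats.ttest_rel(ndcg_a, ndcg_b)
print(f"t={t_stat:.3f}, p={p_value:.3f}")
```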
