A Python script that computes common Information Retrieval metrics and creates a Precision-Recall curve plot for a given set of results.
Metrics computed by this script:
- Set of Precision values for each query.
- Set of Recall values for each query.
- Average Precision (AvP) for each query.
- Mean Average Precision (MAP) for all queries.
Plots:
- Precision-Recall curve for a single query or for multiple queries (non-interpolated).
To execute this script from the terminal, first make sure pip is up to date:
python -m pip install -U pip
Then install the following packages:
- scikit-learn package
pip install -U scikit-learn
- matplotlib package
pip install -U matplotlib
From the terminal, execute the script passing two arguments:
- Ordered set of relevant and non-relevant results.
- Total number of relevant documents in the collection for the information need.
python computeMetrics.py <orderedSet> <totalNumberRelevant>
You can also use this script to evaluate more than one query:
python computeMetrics.py <orderedSet1,orderedSet2,...,orderedSet7> <totalNumberRelevant1,totalNumberRelevant2,...,totalNumberRelevant7>
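Internally, an entry point for this interface presumably just splits the two comma-separated arguments into parallel lists. A minimal sketch, with illustrative names (this is not the script's actual code):

```python
import sys

def parse_args(argv):
    """Split the two comma-separated CLI arguments into parallel lists."""
    if len(argv) != 3:
        sys.exit("usage: python computeMetrics.py <orderedSets> <totalsRelevant>")
    ordered_sets = argv[1].split(",")                       # e.g. ["RNRRRRNNNR"]
    totals_relevant = [int(n) for n in argv[2].split(",")]  # e.g. [6]
    if len(ordered_sets) != len(totals_relevant):
        sys.exit("each ordered set needs a matching total of relevant documents")
    return ordered_sets, totals_relevant
```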
In order to compute the metrics for a single query, run on the terminal:
python computeMetrics.py RNRRRRNNNR 6
Here, the number '6' means that the collection contains a total of 6 documents relevant to the evaluated information need.
The script prints the following result:
SET: RNRRRRNNNR
PRECISION: [1.0, 0.5, 0.6666666666666666, 0.75, 0.8, 0.8333333333333334, 0.7142857142857143, 0.625, 0.5555555555555556, 0.6]
RECALL: [0.16666666666666666, 0.16666666666666666, 0.3333333333333333, 0.5, 0.6666666666666666, 0.8333333333333334, 0.8333333333333334, 0.8333333333333334, 0.8333333333333334, 1.0]
AVERAGE PRECISION: 0.7749999999999999
MAP: 0.7749999999999999
And creates the Precision-Recall curve plot.
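The values above can be reproduced with a few lines of Python. A minimal sketch of the underlying computation, with illustrative function names (the script's actual internals may differ):

```python
def precision_recall_at_each_rank(ordered_set, total_relevant):
    """Running precision and recall after each result ('R' = relevant, 'N' = non-relevant)."""
    precision, recall = [], []
    relevant_seen = 0
    for rank, judgement in enumerate(ordered_set, start=1):
        if judgement == "R":
            relevant_seen += 1
        precision.append(relevant_seen / rank)
        recall.append(relevant_seen / total_relevant)
    return precision, recall

def average_precision(ordered_set):
    """Average of the precision values taken at each relevant result.
    Note: divides by the number of relevant results retrieved, which is
    what the sample output above reflects."""
    relevant_seen, precisions_at_relevant = 0, []
    for rank, judgement in enumerate(ordered_set, start=1):
        if judgement == "R":
            relevant_seen += 1
            precisions_at_relevant.append(relevant_seen / rank)
    return sum(precisions_at_relevant) / len(precisions_at_relevant)

print(average_precision("RNRRRRNNNR"))  # ≈ 0.775, as printed above
```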
In order to compute the metrics for multiple queries, run on the terminal:
python computeMetrics.py RNRNNRNNRR,NRNNRNRNNN 6,6
The script prints the following result:
SET: RNRNNRNNRR
PRECISION: [1.0, 0.5, 0.6666666666666666, 0.5, 0.4, 0.5, 0.42857142857142855, 0.375, 0.4444444444444444, 0.5]
RECALL: [0.16666666666666666, 0.16666666666666666, 0.3333333333333333, 0.3333333333333333, 0.3333333333333333, 0.5, 0.5, 0.5, 0.6666666666666666, 0.8333333333333334]
AVERAGE PRECISION: 0.6222222222222221
SET: NRNNRNRNNN
PRECISION: [0.0, 0.5, 0.3333333333333333, 0.25, 0.4, 0.3333333333333333, 0.42857142857142855, 0.375, 0.3333333333333333, 0.3]
RECALL: [0.0, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3333333333333333, 0.3333333333333333, 0.5, 0.5, 0.5, 0.5]
AVERAGE PRECISION: 0.44285714285714284
MAP: 0.5325396825396824
And creates the Precision-Recall curve plot.
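The plot itself needs nothing beyond matplotlib. A sketch reusing precision_recall_at_each_rank from the earlier snippet (the script's actual styling may differ):

```python
import matplotlib.pyplot as plt

def plot_pr_curves(queries):
    """Draw one non-interpolated Precision-Recall curve per (ordered_set, total_relevant) pair."""
    for ordered_set, total_relevant in queries:
        precision, recall = precision_recall_at_each_rank(ordered_set, total_relevant)
        plt.plot(recall, precision, marker="o", label=ordered_set)
    plt.xlabel("Recall")
    plt.ylabel("Precision")
    plt.title("Precision-Recall Curve")
    plt.legend()
    plt.show()

plot_pr_curves([("RNRNNRNNRR", 6), ("NRNNRNRNNN", 6)])
```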
A standard approach to evaluating an Information Retrieval system uses the notion of relevant and non-relevant documents for an information need.
The user performs a search in order to answer an information need. The results retrieved by the system can be classified as relevant or non-relevant, depending on whether or not they address the stated information need.
The metrics most frequently used in Information Retrieval evaluation are Precision and Recall.
Precision is the fraction of retrieved documents that are relevant.
Precision = #(relevant items retrieved) / #(total retrieved items)
Recall is the fraction of relevant documents that are retrieved.
Recall = #(relevant items retrieved) / #(total relevant items in the collection)
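In code, these two definitions translate directly (illustrative names):

```python
def precision(relevant_retrieved, total_retrieved):
    """Fraction of retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved

def recall(relevant_retrieved, total_relevant_in_collection):
    """Fraction of the collection's relevant documents that are retrieved."""
    return relevant_retrieved / total_relevant_in_collection
```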
Average Precision (AvP) provides a single-figure measure of quality across recall levels for a single query. It is the average of the precision values obtained after each relevant document is retrieved in the results list.
Mean Average Precision (MAP) is the mean of the AvP values for a given set of queries. It is a good single-number measure of the quality of a system.
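As a sketch, reusing the average_precision function from the earlier snippet:

```python
def mean_average_precision(ordered_sets):
    """Arithmetic mean of the per-query AvP values."""
    avps = [average_precision(ordered_set) for ordered_set in ordered_sets]
    return sum(avps) / len(avps)

print(mean_average_precision(["RNRNNRNNRR", "NRNNRNRNNN"]))  # ≈ 0.5325, as in the sample output
```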
A Precision-Recall curve plots the tradeoff between precision and recall at different cutoff points in the results list.
A user wants information on whether eating fruits can improve his immune system (information need). The user searches for [fruits improve immune system] (query).
Let's assume that the system returned 5 results.
From the first result to the last, the results were classified as Relevant, Relevant, Non-Relevant, Relevant, Non-Relevant with respect to the user's information need.
This can be translated into the set RRNRN.
Let's also assume that the indexed collection contains a total of 8 documents relevant to this information need.
Precision = (3 / 5) = 0.6
Recall = (3 / 8) = 0.375
We can compute the precision and recall values at each position of the 5 results:
Results | R | R | N | R | N |
---|---|---|---|---|---|
Recall | 1/8 | 2/8 | 2/8 | 3/8 | 3/8 |
Precision | 1/1 | 2/2 | 2/3 | 3/4 | 3/5 |
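These per-position values are exactly what the precision_recall_at_each_rank sketch from earlier returns:

```python
precision, recall = precision_recall_at_each_rank("RRNRN", 8)
print(precision)  # [1.0, 1.0, 0.6666666666666666, 0.75, 0.6]
print(recall)     # [0.125, 0.25, 0.25, 0.375, 0.375]
```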
The Precision-Recall curve is built with those sets of recall and precision values.
We can now compute the Average Precision (AvP) using the precision values at the positions where relevant documents were retrieved.
AvP = (1/1 + 2/2 + 3/4) / 3 ≈ 0.92
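The average_precision sketch from earlier confirms this:

```python
print(average_precision("RRNRN"))  # 0.9166666666666666 ≈ 0.92
```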
Let's assume that the user did a second search with a different information need for which the system displayed an average precision of 0.82.
We can compute the system's Mean Average Precision (MAP) for both information needs as:
MAP = (0.92 + 0.82) / 2 = 0.87
Manning, C. D., Raghavan, P., & Schütze, H. (2009). Introduction to Information Retrieval. Cambridge University Press.