Week 11 and 12
During weeks 11 and 12, I implemented the following evaluation metrics:
- MAP@k (Mean Average Precision)
- nDCG@k (Normalized Discounted Cumulative Gain)
To add a new evaluation metric:
- Create a `Metric` subclass (a sketch follows this list) and override the following methods:
  - `name()`: returns the metric name as a string.
  - `eval(ratings, recommendations)`: evaluates the recommendation metric over the user training/test ratings.
- Add your subclass to `evaluator/metric2class.py`, adding a dictionary key with the metric name (to be used in the config file), its submodule, and class name.
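As an illustration, here is a minimal sketch of such a subclass. The base-class import path, the structure of the `ratings` and `recommendations` arguments, and the example metric itself are assumptions made for illustration only; adapt them to the framework's actual code.

```python
# Hypothetical example of a new metric (Hit Ratio); the import path and the
# data structures passed to eval() are assumed, not taken from the framework.
from evaluator.metric import Metric  # assumed location of the Metric base class


class HitRatio(Metric):
    """Fraction of users with at least one relevant item among their recommendations."""

    def name(self):
        # Metric name as a string; also the key used in the config file.
        return "HR"

    def eval(self, ratings, recommendations):
        # Assumed shapes: ratings maps user -> set of relevant items,
        # recommendations maps user -> ranked list of recommended items.
        hits = sum(
            1
            for user, recs in recommendations.items()
            if any(item in ratings.get(user, set()) for item in recs)
        )
        return hits / max(len(recommendations), 1)
```

The corresponding entry in `evaluator/metric2class.py` would then map the name `HR` to this submodule and class; the exact dictionary layout depends on the framework.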
In the `.yaml` file, the `evaluation` directive defines the experiment evaluation. Each piece of evaluation metadata is given in the format `metadata1: metadata1_value`. Example:
```yaml
experiment:
  # ...
  evaluation:
    k: 5
    relevance_threshold: 3
    metrics: [MAP, nDCG]
```
Where:
- `evaluation`: specifies the evaluation metadata (mandatory)
- `k`: evaluates the first k recommendations (mandatory)
- `relevance_threshold`: threshold value to consider a rating relevant
- `metrics`: list of metric names to be evaluated (mandatory)
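For context, below is a minimal sketch of how this directive might be consumed, assuming a `metric2class` dictionary that maps each metric name to its submodule and class name. The loader itself is illustrative and not part of the framework's documented API.

```python
# Illustrative only: load the evaluation directive and instantiate the
# configured metrics via the (assumed) metric2class mapping.
import importlib

import yaml

from evaluator.metric2class import metric2class  # assumed: name -> {"submodule": ..., "class": ...}

with open("experiment.yaml") as f:
    config = yaml.safe_load(f)

evaluation = config["experiment"]["evaluation"]
k = evaluation["k"]  # evaluate the first k recommendations
# Convention assumed: ratings at or above this value count as relevant.
relevance_threshold = evaluation.get("relevance_threshold")

metrics = []
for metric_name in evaluation["metrics"]:
    entry = metric2class[metric_name]
    module = importlib.import_module(f"evaluator.{entry['submodule']}")
    metrics.append(getattr(module, entry["class"])())
```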
The goal of this framework is to implement the evaluation metrics most commonly used in practice, as surveyed in the DaisyRec literature review.
Some of the main metrics are Precision, Recall, Mean Average Precision (MAP), Hit Ratio (HR), Mean Reciprocal Rank (MRR), and Normalized Discounted Cumulative Gain (nDCG).
So far, MAP and nDCG have been implemented.
The Average Precision at k for a user $u$ is:

$$\mathrm{AP@k}(u) = \frac{1}{\min(k, |R_u|)} \sum_{i=1}^{k} P(i) \cdot \mathrm{rel}(i)$$

Where precision at position $i$ is $P(i) = \frac{\text{number of relevant items in the top } i}{i}$, $\mathrm{rel}(i) \in \{0, 1\}$ indicates whether the item at position $i$ is relevant, and $R_u$ is the set of relevant items for user $u$.

Finally, the Mean Average Precision at k is the average of $\mathrm{AP@k}$ over all users $U$:

$$\mathrm{MAP@k} = \frac{1}{|U|} \sum_{u \in U} \mathrm{AP@k}(u)$$

The Discounted Cumulative Gain at k for a user $u$ is:

$$\mathrm{DCG@k}(u) = \sum_{i=1}^{k} \frac{\mathrm{rel}(i)}{\log_2(i + 1)}$$

The normalization is done by the Ideal Discounted Cumulative Gain at k, i.e., the $\mathrm{DCG@k}$ of the list reordered so that all relevant items come first:

$$\mathrm{nDCG@k}(u) = \frac{\mathrm{DCG@k}(u)}{\mathrm{IDCG@k}(u)}$$

The final overall metric for all users is given by the average:

$$\mathrm{nDCG@k} = \frac{1}{|U|} \sum_{u \in U} \mathrm{nDCG@k}(u)$$
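To make the formulas concrete, here is a small, self-contained sketch of $\mathrm{AP@k}$ and $\mathrm{nDCG@k}$ for a single user with binary relevance. It illustrates the equations above and is not the framework's implementation; the example item ids are made up.

```python
# Per-user AP@k and nDCG@k with binary relevance, matching the formulas above.
import math


def average_precision_at_k(recommended, relevant, k):
    """AP@k: precision at each relevant position, averaged over min(k, |relevant|)."""
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i  # precision at position i
    denom = min(k, len(relevant))
    return precision_sum / denom if denom else 0.0


def ndcg_at_k(recommended, relevant, k):
    """nDCG@k: DCG@k of the ranking divided by the ideal DCG@k."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0


# MAP@k and the overall nDCG@k are simply these values averaged over all users.
recommended = [10, 3, 7, 42, 5]  # ranked recommendation list for one user
relevant = {3, 42, 99}           # items rated at or above the relevance threshold
print(average_precision_at_k(recommended, relevant, k=5))  # (1/3) * (1/2 + 2/4) ~= 0.333
print(ndcg_at_k(recommended, relevant, k=5))
```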
References:
- Zhu Sun et al., "DaisyRec 2.0: Benchmarking Recommendation for Rigorous Evaluation".
- Sonya Sawtelle, "Mean Average Precision (MAP) For Recommender Systems".