0.4.1
The changes are summarized below:
- Add some functions to implement OPE for the slate contextual bandit setting [1]
- Make the `OffPolicyEvaluation` class more useful
- Fix some bugs and refactor
  - Epsilon-greedy algorithm (#107)
  - Type checks in OPE estimators (#106)
  - Linear and logistic policies (#91)
- Welcome new contributors (#94)
References
- [1] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Benjamin Carterette. 2020. Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1779–1788.
- [2] Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 1447–1456.