Release 0.4.1 · st-tech/zr-obp

The changes are summarized below:

Add functions to implement OPE for the slate contextual bandit setting [1]
- SlateSyntheticBanditFeedback (#82, #93, #95, #98, #100, #101, #102, #104, #105)
- Slate OPE Estimators (#88)
Make the OffPolicyEvaluation class more useful
- Add a method to visualize and compare the OPE results of several different policies (#103)
- Allow the use of different estimated_rewards_by_reg_model values (this makes MRDR [2] easier to use with obp, #92)
Bug fixes and refactoring
- Epsilon-greedy algorithm (#107)
- Type checks in OPE estimators (#106)
- Linear and logistic policies (#91)
Welcome new contributors (#94)
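
To illustrate why accepting different estimated_rewards_by_reg_model values helps with MRDR [2], here is a minimal NumPy sketch. It is not obp's API: the function names `dr_estimate` and `evaluate_with_per_estimator_rewards`, the 2-D array shapes, and the toy numbers are all assumptions made for illustration. The idea it shows is that DR and MRDR share the same doubly robust formula but are typically paired with differently trained reward models, so each estimator needs its own array of estimated rewards.

```python
import numpy as np


def dr_estimate(reward, action, pscore, action_dist, estimated_rewards):
    """Doubly robust estimate of an evaluation policy's value (hypothetical helper).

    reward: (n,) observed rewards
    action: (n,) logged actions (integer indices)
    pscore: (n,) behavior-policy probabilities of the logged actions
    action_dist: (n, n_actions) evaluation-policy action probabilities
    estimated_rewards: (n, n_actions) reward-model predictions
    """
    idx = np.arange(reward.shape[0])
    # Model-based baseline: expected predicted reward under the evaluation policy.
    baseline = (action_dist * estimated_rewards).sum(axis=1)
    # Importance weight of the logged action.
    iw = action_dist[idx, action] / pscore
    # Importance-weighted correction using the model's residual on the logged action.
    correction = iw * (reward - estimated_rewards[idx, action])
    return float(np.mean(baseline + correction))


def evaluate_with_per_estimator_rewards(reward, action, pscore, action_dist,
                                        estimated_rewards_by_name):
    """Apply the DR formula once per entry, each with its own reward estimates."""
    return {
        name: dr_estimate(reward, action, pscore, action_dist, q_hat)
        for name, q_hat in estimated_rewards_by_name.items()
    }


# Toy logged data (made-up values): 4 rounds, 2 actions, uniform behavior policy.
reward = np.array([1.0, 0.0, 0.0, 1.0])
action = np.array([0, 1, 1, 1])
pscore = np.full(4, 0.5)
# Evaluation policy that always chooses action 0.
action_dist = np.tile(np.array([1.0, 0.0]), (4, 1))

# Stand-ins for two separately trained reward models (made-up predictions).
estimated_rewards_by_name = {
    "dr": np.tile(np.array([0.8, 0.5]), (4, 1)),
    "mrdr": np.zeros((4, 2)),
}

results = evaluate_with_per_estimator_rewards(
    reward, action, pscore, action_dist, estimated_rewards_by_name
)
```

In this sketch, keying the reward estimates by estimator name keeps the logged data shared while letting each estimator use the model trained for it.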

[1] James McInerney, Brian Brost, Praveen Chandar, Rishabh Mehrotra, and Benjamin Carterette. 2020. Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 1779–1788.
[2] Mehrdad Farajtabar, Yinlam Chow, and Mohammad Ghavamzadeh. 2018. More robust doubly robust off-policy evaluation. In Proceedings of the 35th International Conference on Machine Learning, PMLR 80, 1447–1456.
