Updates
Implement a synthetic data generator class for OPE with action embeddings (obp.dataset.SyntheticBanditDatasetWithActionEmbeds) and an estimator leveraging the action embeddings (obp.ope.MarginalizedInverseProbabilityWeighting) (#155)
Implement several OPE estimators for the multiple-logger setting (#154)
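The marginalized weighting idea behind the new estimator (Saito and Joachims, 2022) replaces per-action importance weights with weights over a lower-dimensional action embedding, which helps when the action space is large. Below is a minimal self-contained numpy sketch of that idea — it is not the obp API; the synthetic setup (context-free policies, a deterministic action-to-embedding map, and rewards that depend only on the embedding, i.e. the no-direct-effect assumption) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions, n_emb = 10_000, 50, 5

# Known embedding distribution p(e|a): here each action deterministically
# maps to one of a few discrete embedding values (one-hot rows).
p_e_given_a = np.zeros((n_actions, n_emb))
p_e_given_a[np.arange(n_actions), np.arange(n_actions) % n_emb] = 1.0

# Logging policy pi_0 and evaluation policy pi_e (context-free for brevity).
pi_0 = rng.dirichlet(np.ones(n_actions))
pi_e = rng.dirichlet(np.ones(n_actions))

# Log data under pi_0: actions, their embeddings, and rewards that depend
# only on the embedding (the key "no direct effect" assumption of MIPS).
actions = rng.choice(n_actions, size=n, p=pi_0)
embeds = np.array([rng.choice(n_emb, p=p_e_given_a[a]) for a in actions])
q_e = np.linspace(0.1, 0.9, n_emb)  # E[r | e], chosen arbitrarily
rewards = rng.binomial(1, q_e[embeds])

# Marginal importance weights over embeddings:
#   w(e) = p(e | pi_e) / p(e | pi_0)
#        = sum_a pi_e(a) p(e|a) / sum_a pi_0(a) p(e|a)
p_e_under_e = pi_e @ p_e_given_a
p_e_under_0 = pi_0 @ p_e_given_a
w = p_e_under_e[embeds] / p_e_under_0[embeds]

# Marginalized IPW estimate vs. the true policy value of pi_e.
mips_estimate = np.mean(w * rewards)
true_value = p_e_under_e @ q_e
```

Because the reward depends only on the embedding, weighting by the marginal embedding ratio is unbiased while avoiding the per-action weights, whose variance grows with the number of actions.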
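For the multiple-logger setting, one of the simplest estimators is balanced IPW (Agarwal et al., KDD 2018): pool the data from all loggers and divide by the sample-size-weighted average logging policy instead of each logger's own propensity. A minimal numpy sketch, again with an illustrative context-free synthetic setup rather than the obp API:

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 10
n_per_logger = [5_000, 5_000]  # two loggers, arbitrary sizes
n_total = sum(n_per_logger)

pi_e = rng.dirichlet(np.ones(n_actions))                    # evaluation policy
loggers = [rng.dirichlet(np.ones(n_actions)) for _ in n_per_logger]
q = rng.uniform(0.1, 0.9, n_actions)                        # E[r | a]

# Average logging policy: pi_avg = sum_k (n_k / n) pi_k
pi_avg = sum(nk / n_total * pk for nk, pk in zip(n_per_logger, loggers))

# Pool logged data from every logger.
acts, rews = [], []
for nk, pk in zip(n_per_logger, loggers):
    a = rng.choice(n_actions, size=nk, p=pk)
    acts.append(a)
    rews.append(rng.binomial(1, q[a]))
acts = np.concatenate(acts)
rews = np.concatenate(rews)

# Balanced IPW: weight by pi_e(a) / pi_avg(a). Since the pooled data's
# marginal action distribution is exactly pi_avg, this is unbiased, and it
# typically has lower variance than weighting each sample by its own
# logger's propensity.
w = pi_e[acts] / pi_avg[acts]
balanced_ipw = np.mean(w * rews)
true_value = pi_e @ q
```

Kallus, Saito, and Uehara (ICML 2021) go further and derive the variance-optimal way to combine the loggers, but the balanced weights above already illustrate the core multiple-logger structure.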
References
Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings." arXiv, 2022.
Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. "Effective Evaluation using Logged Bandit Feedback from Multiple Loggers." KDD, 2018.
Nathan Kallus, Yuta Saito, and Masatoshi Uehara. "Optimal Off-Policy Evaluation from Multiple Logging Policies." ICML, 2021.