Updates
Implement a synthetic data generator class for OPE with action embeddings (obp.dataset.SyntheticBanditDatasetWithActionEmbeds) and an estimator leveraging the action embeddings (obp.ope.MarginalizedInverseProbabilityWeighting) (#155)
Implement several OPE estimators for the multiple-logger setting (#154)
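The marginalized weighting idea behind the new estimator (Saito and Joachims, 2022) replaces per-action importance weights with weights over a lower-dimensional action embedding, which helps when the action space is large. Below is a minimal self-contained numpy sketch of that idea — it is not the obp API; the synthetic setup (context-free policies, a deterministic action-to-embedding map, and rewards that depend only on the embedding, i.e. the no-direct-effect assumption) is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_actions, n_emb = 10_000, 50, 5

# Known embedding distribution p(e|a): here each action deterministically
# maps to one of a few discrete embedding values (one-hot rows).
p_e_given_a = np.zeros((n_actions, n_emb))
p_e_given_a[np.arange(n_actions), np.arange(n_actions) % n_emb] = 1.0

# Logging policy pi_0 and evaluation policy pi_e (context-free for brevity).
pi_0 = rng.dirichlet(np.ones(n_actions))
pi_e = rng.dirichlet(np.ones(n_actions))

# Log data under pi_0: actions, their embeddings, and rewards that depend
# only on the embedding (the key "no direct effect" assumption of MIPS).
actions = rng.choice(n_actions, size=n, p=pi_0)
embeds = np.array([rng.choice(n_emb, p=p_e_given_a[a]) for a in actions])
q_e = np.linspace(0.1, 0.9, n_emb)  # E[r | e], chosen arbitrarily
rewards = rng.binomial(1, q_e[embeds])

# Marginal importance weights over embeddings:
#   w(e) = p(e | pi_e) / p(e | pi_0)
#        = sum_a pi_e(a) p(e|a) / sum_a pi_0(a) p(e|a)
p_e_under_e = pi_e @ p_e_given_a
p_e_under_0 = pi_0 @ p_e_given_a
w = p_e_under_e[embeds] / p_e_under_0[embeds]

# Marginalized IPW estimate vs. the true policy value of pi_e.
mips_estimate = np.mean(w * rewards)
true_value = p_e_under_e @ q_e
```

Because the reward depends only on the embedding, weighting by the marginal embedding ratio is unbiased while avoiding the per-action weights, whose variance grows with the number of actions.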
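For the multiple-logger setting, one of the simplest estimators is balanced IPW (Agarwal et al., KDD 2018): pool the data from all loggers and divide by the sample-size-weighted average logging policy instead of each logger's own propensity. A minimal numpy sketch, again with an illustrative context-free synthetic setup rather than the obp API:

```python
import numpy as np

rng = np.random.default_rng(1)
n_actions = 10
n_per_logger = [5_000, 5_000]  # two loggers, arbitrary sizes
n_total = sum(n_per_logger)

pi_e = rng.dirichlet(np.ones(n_actions))                    # evaluation policy
loggers = [rng.dirichlet(np.ones(n_actions)) for _ in n_per_logger]
q = rng.uniform(0.1, 0.9, n_actions)                        # E[r | a]

# Average logging policy: pi_avg = sum_k (n_k / n) pi_k
pi_avg = sum(nk / n_total * pk for nk, pk in zip(n_per_logger, loggers))

# Pool logged data from every logger.
acts, rews = [], []
for nk, pk in zip(n_per_logger, loggers):
    a = rng.choice(n_actions, size=nk, p=pk)
    acts.append(a)
    rews.append(rng.binomial(1, q[a]))
acts = np.concatenate(acts)
rews = np.concatenate(rews)

# Balanced IPW: weight by pi_e(a) / pi_avg(a). Since the pooled data's
# marginal action distribution is exactly pi_avg, this is unbiased, and it
# typically has lower variance than weighting each sample by its own
# logger's propensity.
w = pi_e[acts] / pi_avg[acts]
balanced_ipw = np.mean(w * rews)
true_value = pi_e @ q
```

Kallus, Saito, and Uehara (ICML 2021) go further and derive the variance-optimal way to combine the loggers, but the balanced weights above already illustrate the core multiple-logger structure.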
References
Yuta Saito and Thorsten Joachims. "Off-Policy Evaluation for Large Action Spaces via Embeddings." arXiv, 2022.
Aman Agarwal, Soumya Basu, Tobias Schnabel, and Thorsten Joachims. "Effective Evaluation using Logged Bandit Feedback from Multiple Loggers." KDD, 2018.
Nathan Kallus, Yuta Saito, and Masatoshi Uehara. "Optimal Off-Policy Evaluation from Multiple Logging Policies." ICML, 2021.