Potential for a kaggle competition #195

Open
sgbaird opened this issue Oct 5, 2022 · 2 comments

Comments

sgbaird (Contributor) commented Oct 5, 2022

I think a major barrier to wider participation in Matbench is that people perform their own splits and work in isolation from it, and that performant models tend to be trained on the most recent, comprehensive data snapshots (i.e., the setting in which a model goes into real-world use). This can make it hard to persuade people to spend time learning Matbench (even though it is very easy to use) and to commit potentially large amounts of compute to expensive models.

There are two approaches to addressing this.

One is reducing the barrier, for example by accepting disparate benchmarks, writing up benchmark notebooks for people on request, or running the benchmarks for them. The first waters down the benchmark, and the latter two put a lot of burden on the Matbench developers.

A second approach is increasing the incentive. One way to do this is a Kaggle competition based on Matbench 2.0, covering property prediction, adaptive design, and generative modeling, with prizes on offer. This requires upfront work to design and host the competition, but it distributes the effort across the community and incentivizes people to use the best models even if they weren't the original authors. Authorship could also be offered to participants with top-scoring models, assuming no disqualification.

We could base it on, and learn from, the NOMAD 2018 Kaggle competition: https://www.nature.com/articles/s41524-019-0239-3.

Prize funding would also need to be sourced. Perhaps materials informatics companies, the Acceleration Consortium, Apple, Meta, etc. would be willing to sponsor.

sgbaird (Contributor, Author) commented Oct 29, 2022

sgbaird (Contributor, Author) commented Feb 10, 2023

Something that came to mind for a hackathon with adaptive design tasks is to have two winning categories:

  1. participants whose model reaches a predetermined optimum in the fewest iterations
  2. participants who use the fewest cumulative objective-function calls during the development of the model

The latter would, of course, require gate-keeping and monitoring of access to the underlying objective function.
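A minimal sketch of how the second category could be scored (hypothetical, not part of Matbench): the organizers could expose the objective only through a wrapper that counts every evaluation and enforces a budget. The `MonitoredObjective` class and the toy objective below are assumptions for illustration only.

```python
import random


class MonitoredObjective:
    """Wrap a black-box objective so every evaluation is counted,
    logged, and optionally capped. In a real competition this would
    live server-side, with one counter per participant."""

    def __init__(self, objective, max_calls=None):
        self._objective = objective
        self._max_calls = max_calls
        self.n_calls = 0
        self.history = []  # (x, y) pairs, for auditing

    def __call__(self, x):
        if self._max_calls is not None and self.n_calls >= self._max_calls:
            raise RuntimeError("evaluation budget exhausted")
        self.n_calls += 1
        y = self._objective(x)
        self.history.append((x, y))
        return y


def toy_objective(x):
    # Stand-in for an expensive simulation/experiment; maximum at x = 0.3.
    return -(x - 0.3) ** 2


obj = MonitoredObjective(toy_objective, max_calls=50)
# A naive random search: each candidate costs one gated evaluation.
best_x = max((random.random() for _ in range(50)), key=obj)
print(obj.n_calls)  # cumulative calls consumed; this is the category-2 score
```

Because all access goes through the wrapper, the cumulative-call count is tamper-proof from the participant's side, and the logged history allows post-hoc verification of when the optimum was first reached.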
