- About | Timeline | Schedule | Learning outcomes
- First-run results
- Classes
- Class #1: What is materials informatics + Python crash course
- Class #2: Python libraries for atomistic modelling of materials
- Class #3: Data in materials science
- Class #4: Data exploration, visualization, and fitting
- Class #5: Classical ML for materials science pt.1
- Class #6: Classical ML for materials science pt.2
- Class #7: Graph neural networks for materials science pt.1
- Class #8: Machine learning for molecular simulation
- Class #9: Working on final projects in class
- Class #10: Critical reviews of scientific papers
- Class #11: Graph neural networks for materials science pt.2
- Class #12: Final project presentations
- Assessment criteria
- Final project description
- Data used
- List of resources related to materials informatics
- References
This course gives an overview of data-driven techniques for accelerating materials design, with a focus on the atomistic scale and inorganic compounds. Each lecture is a short overview plus the minimum required theory; the seminars are the main part of the course. During the course we will learn Python libraries for atomistic materials modelling, survey materials science databases and learn how to use the Materials Project API, apply machine learning algorithms to predict materials properties, and perform molecular dynamics simulations with deep learning interatomic potentials.
It is expected that students will better understand the concepts through learning by doing. At the end of the course, students will present a final project in the form of an article based on the homework they have completed.
The course was developed by Artem Dembitskiy (4th-year Ph.D. student) under the supervision of Prof. Dmitry Aksenov at the Skolkovo Institute of Science and Technology.
Term 1B, Sept. 30 - Oct. 25, MON THU FRI 16:00-19:00
- Week #1 (easy/medium)
  - What is materials informatics?
  - Python for atomistic modeling of materials
  - Data in materials science
- Week #2 (medium)
  - Exploratory data analysis
  - Classical ML for materials science pt.1
  - Classical ML for materials science pt.2
- Week #3 (hard)
  - Graph neural networks for materials science pt.1
  - Graph neural networks for materials science pt.2
  - Machine learning for molecular simulation
- Week #4 (easy)
  - Working on final projects in class
  - Critical reviews of scientific journal articles
  - Final project presentations
On completion of the course you will be able to:
- Apply Python libraries and data science tools to solve materials science problems
- Critically evaluate materials informatics literature
- Collect, generate and analyse materials science datasets, including identification of structure-property relationships
- Computational materials science track
- Basic knowledge of materials modeling, Python (numpy, pandas), crystal chemistry, and linear algebra
- Laptop
This GitHub repo contains most of the course content. Quizzes and homework assignments will be announced separately on Canvas and in the Telegram chat.
Course Evaluation Survey
The figures below show how students responded to the questions we asked them about the first run of the course.
- Question #1. Was it convenient for you to use Github for the course navigation?
- Question #2. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Apply python libraries and data science tools to solve materials science problems".
- Question #3. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Critically evaluate materials informatics literature".
- Question #4. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Collect, generate and analyse materials science datasets, including identification of structure-property relationships".
The questionnaire was adapted from "Using Jupyter Tools to Design an Interactive Textbook to Guide Undergraduate Research in Materials Informatics".
Each class consists of a relatively short lecture and a relatively long (coding) seminar. All class materials are stored in the lectures and seminars folders. Each class begins with "Previously on...", "Class Goals" and "Agenda" and ends with a "Take Home Message".
Class | Lecture | Seminar | Homework | Supplementary materials |
---|---|---|---|---|
1 (Date: Sep. 30) | Lecture 1 Agenda: Materials informatics overview. Motivation, navigation. ILOs and assessment. HWs and FP description. | Seminar 1 Agenda: Google Colab, reminder of the key libraries used in science: numpy, pandas, scipy, matplotlib. | HW1 Agenda: Python basics, numpy, pandas, scipy, matplotlib. Python for atomistic modeling. The Materials Project API. Deadline: Oct. 10, 2024, 15:59 MSK | |
2 (Date: Oct. 3) | Lecture 2 Agenda: Python in materials science. | Seminar 2 Agenda: The ase and pymatgen Python libraries. Molecules and crystals. Various text formats of a material representation. Local coordination, nearest neighbors list building, Voronoi partitioning, translational symmetry. | | ASE: tips and tricks, Pymatgen tutorials |
3 (Date: Oct. 4) | Lecture 3 Agenda: Data in materials science. FAIR principles. The Materials Project and its API. | Seminar 3 Agenda: Screening of solid-state electrolytes using The Materials Project's API and pymatgen. | | Paper: FAIR, Paper: MP, The MP API: Getting started |
4 (Date: Oct. 7) | Lecture 4 Agenda: Exploratory data analysis. | Seminar 4 Agenda: scipy, matplotlib, pandas, EDA | | Lecture from CS 109a course by Pavlos Protopapas & Kevin Rader |
5 (Date: Oct. 10) | Lecture 5 Agenda: ML for materials science. Types of tasks. Property and descriptor. | Seminar 5 Agenda: scikit-learn Python library, regression models for predicting mechanical and thermodynamic properties of materials. HW1 review. | HW2 Agenda: sklearn, regression, hardness prediction, feature importances and feature selection, molecular dynamics simulation using universal interatomic potentials. Deadline: Oct. 21, 2024, 15:59 MSK. FP announcement. Deadline: Oct. 25, 2024, 23:59 MSK | Paper |
6 (Date: Oct. 11) | Lecture 6 Agenda: Feature design in materials science. Geometrical and compositional features. Hierarchy of the crystal structure descriptors. Crystal structure fingerprint. Feature importance. | Seminar 6 Agenda: matminer and dscribe Python libraries. Reproduce an article on feature design. | | Paper |
7 (Date: Oct. 14) | Lecture 7 Agenda: Artificial neural networks. Loss function. Backpropagation. Graph representation of materials. How to deal with periodicity. Crystal Graph Convolutional Neural Networks (CGCNN). Message passing. | Seminar 7 Agenda: github, CGCNN for predicting formation energy of crystals. | HW3 Agenda: Paper review. Deadline: Oct. 21, 2024, 15:59 MSK | Paper |
8 (Date: Oct. 17) | Lecture 8 Agenda: Machine learning for molecular simulation. Interatomic potential. Energy and forces. Molecular dynamics employing GNNs. Active learning. Foundation models. | Seminar 8 Agenda: M3GNet model for molecular dynamics simulation of Li-ion diffusion in Li3PS4. HW2 review. | | Paper |
9 (Date: Oct. 18) | Lecture 9 Agenda: The course wrap up. Tips to complete a final project. Formulation of the problem. Data collection/analysis. Data splitting. Feature design. Model selection. Results analysis. Common mistakes, good and bad practices in employing ML for materials science. | Seminar 9 Agenda: Working on final projects | | |
10 (Date: Oct. 21) | Lecture 10 Agenda: Students present their critical reviews of materials informatics articles (oral presentations) | Seminar 10 Agenda: Continuation of the lecture | | |
11 (Date: Oct. 24) | Lecture 11 Agenda: Invariance and Equivariance. E(3)-equivariant graph neural networks | Seminar 11 Agenda: torch, training loop, NequIP - E(3)-equivariant graph neural network | | Paper |
12 (Date: Oct. 25) | Lecture 12 Agenda: Final project presentations | Seminar 12 Agenda: Final project presentations | | |
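Seminar 2 in the schedule above covers nearest-neighbor list building under translational symmetry. As a rough, library-free illustration of the idea (not the seminar's actual code, which relies on ase and pymatgen), the sketch below finds all neighbors within a cutoff by explicitly enumerating periodic images with NumPy. The helper name `neighbor_list`, the lattice constant, and the cutoff are assumptions chosen for this example.

```python
import itertools
import numpy as np

def neighbor_list(cell, frac_coords, cutoff):
    """Return (i, j, distance) for every neighbor j of atom i within `cutoff`.

    cell        : (3, 3) array with lattice vectors as rows
    frac_coords : (N, 3) array of fractional coordinates in [0, 1)
    Hypothetical helper written for illustration only.
    """
    cell = np.asarray(cell, dtype=float)
    cart = np.asarray(frac_coords, dtype=float) @ cell
    # Spacing between lattice planes along each lattice vector;
    # it tells us how many periodic images the cutoff sphere can reach.
    heights = 1.0 / np.linalg.norm(np.linalg.inv(cell), axis=0)
    n_max = np.ceil(cutoff / heights).astype(int)
    pairs = []
    for shift in itertools.product(*(range(-n, n + 1) for n in n_max)):
        translation = np.array(shift) @ cell
        # delta[i, j] is the vector from atom i to the shifted image of atom j
        delta = cart[None, :, :] + translation - cart[:, None, :]
        dist = np.linalg.norm(delta, axis=-1)
        for i, j in zip(*np.nonzero(dist <= cutoff)):
            if shift == (0, 0, 0) and i == j:
                continue  # skip the atom itself, keep its periodic images
            pairs.append((int(i), int(j), float(dist[i, j])))
    return pairs

# Conventional fcc cell with four atoms; a = 4.05 Å is an illustrative value
a = 4.05
cell = a * np.eye(3)
basis = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5]]
pairs = neighbor_list(cell, basis, cutoff=3.0)
coordination = np.bincount([i for i, _, _ in pairs], minlength=len(basis))
print(coordination)
```

Looping over every image is simple but slow for large cells; production tools use cell lists or similar tricks, so treat this only as a way to see how periodicity enters the neighbor search.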
- Attendance 0%
- Quizzes 10%
- HW 45%
- Final project 35%
  - Written report 50%
  - Oral presentation 40-50%
  - Discussion of other projects 0-10%
- Peer reviews 10%
Example
The task is to carry out a 'small' high-throughput screening of solid-state electrolytes conducting a given ion (Li+, Na+, K+, etc.) using the data-driven techniques and tools covered during the course (or beyond it).
- Given a set of chemical elements
- Formulate selection criteria for high-throughput screening of solid-state electrolytes for all-solid-state Li-ion batteries.
- Download the data from the Materials Project database according to the formulated criteria.
- Calculate the band gap of the selected materials (assuming that this data is not deposited in the Materials Project) using at least one classical ML model and one GNN model, and evaluate their performance. For the classical ML model, calculate crystal structure descriptors using your own featurizer or open-source tools. Perform a feature importance study.
- Select one of the most promising materials and perform a diffusion simulation using your favourite universal interatomic potential.
- Calculate the activation barrier of the mobile ion and its diffusion coefficient.
- Compare your materials with existing alternatives.
- Write a 3-5 page article-style report including:
  - Introduction
  - Methods
  - Results
  - Discussion
  - Conclusion
  - Bibliography
- Prepare a 7-minute oral presentation.
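The diffusion coefficient and activation barrier steps of the project can be sketched as follows. This is a minimal example under stated assumptions, not a prescribed workflow: it assumes the mean squared displacement (MSD) of the mobile ion has already been extracted from a molecular dynamics trajectory, uses the 3D Einstein relation MSD(t) ≈ 6Dt, and fits the Arrhenius form D(T) = D0 exp(-Ea / (kB T)). The function names and all numbers are synthetic and chosen only for illustration.

```python
import numpy as np

K_B = 8.617333262e-5  # Boltzmann constant in eV/K

def diffusion_coefficient(time_ps, msd_A2):
    """Einstein relation in 3D: MSD(t) ~ 6 D t. Returns D in cm^2/s."""
    slope, _ = np.polyfit(time_ps, msd_A2, 1)  # slope in Å^2/ps
    return slope / 6.0 * 1e-4                  # 1 Å^2/ps = 1e-4 cm^2/s

def arrhenius_fit(temperatures_K, diffusivities):
    """Fit ln D = ln D0 - Ea / (kB T). Returns (Ea in eV, D0)."""
    inv_T = 1.0 / np.asarray(temperatures_K, dtype=float)
    slope, intercept = np.polyfit(inv_T, np.log(diffusivities), 1)
    return -slope * K_B, float(np.exp(intercept))

# Synthetic MSD curve (not simulation output), generated with D = 0.05 Å^2/ps
time_ps = np.linspace(0.0, 100.0, 101)
msd = 6.0 * 0.05 * time_ps + 0.3
d_fit = diffusion_coefficient(time_ps, msd)

# Synthetic Arrhenius data, generated with Ea = 0.25 eV and D0 = 1e-3 cm^2/s
temps = np.array([600.0, 800.0, 1000.0, 1200.0])
d_of_t = 1e-3 * np.exp(-0.25 / (K_B * temps))
ea_fit, d0_fit = arrhenius_fit(temps, d_of_t)

print(f"D = {d_fit:.2e} cm^2/s, Ea = {ea_fit:.3f} eV, D0 = {d0_fit:.2e} cm^2/s")
```

With real trajectories, fit only the linear (diffusive) part of the MSD curve, and run simulations at several temperatures to obtain the points for the Arrhenius fit.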
- Books
- Materials Informatics and Catalysts Informatics: An Introduction, Keisuke Takahashi, Lauren Takahashi, 2024, ISBN-10: 981970216X
- Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville, 2016, MIT Press, https://www.deeplearningbook.org/
- Papers
- Recent advances and applications of machine learning in solid-state materials science, Schmidt, J., Marques, M.R.G., Botti, S. et al., npj Comput Mater 5, 83 (2019). https://doi.org/10.1038/s41524-019-0221-0
Data used for seminars and homeworks
Name | Description | Source |
---|---|---|
Li-ion conductivity dataset | A dataset of experimentally measured Li-ion conductivities in crystalline (and amorphous) ceramics. The data includes crystal structure family, chemical family, chemical composition, target property, measurement temperature, and data source. The data is contaminated with None values and outliers. The task for the students is to clean the dataset and perform exploratory data analysis. | Hargreaves, C.J., Gaultois, M.W., Daniels, L.M. et al. A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning. npj Comput Mater 9, 9 (2023). https://doi.org/10.1038/s41524-022-00951-z |
The Materials Project band gap dataset | A dataset of band gap values calculated using density functional theory for crystal structures. The task for the students is to perform exploratory data analysis and find the correlation between the band gap value and the average electronegativity of the structure. | The Materials Project API was used to retrieve the data. |
Double perovskite oxides band gap dataset | The dataset consists of band gap targets calculated with density functional theory and the elemental and geometrical descriptors of the crystal structures. The task for the students is to perform exploratory data analysis, find the correlations between the target and descriptors, optimize the hyperparameters of the regression models, and conduct a feature selection and feature importance study. | Talapatra, A., Uberuaga, B.P., Stanek, C.R. et al. Band gap predictions of double perovskite oxides using machine learning. Commun Mater 4, 46 (2023). https://doi.org/10.1038/s43246-023-00373-4 |
Hardness dataset | A dataset of experimentally measured hardness of materials. The data is used for HW2 on supervised machine learning. | Tantardini, Christian, et al. "Material hardness descriptor derived by symbolic regression." Journal of Computational Science 82 (2024): 10240, repo |
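The cleaning task described for the Li-ion conductivity dataset (removing None values and outliers) might look roughly like the sketch below. It is only an illustration: the records are synthetic, the column names are hypothetical rather than those of the real dataset, and the outlier rule (Tukey's 1.5 × IQR fences applied to log10 of the conductivity) is one common choice among several.

```python
import numpy as np
import pandas as pd

# Synthetic toy records; column names are hypothetical
df = pd.DataFrame({
    "sample": ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"],
    "temperature_C": [25, 25, 30, 25, None, 27, 25, 25],
    "conductivity_S_cm": [1e-4, 3e-4, 1e-3, None, 3e-5, 2e-4, 1e5, 6e-4],
})

# Step 1: drop rows where the target or the measurement temperature is missing
valid = df.dropna(subset=["temperature_C", "conductivity_S_cm"])

# Step 2: conductivities span orders of magnitude, so look for outliers in log space
log_sigma = np.log10(valid["conductivity_S_cm"])
q1, q3 = log_sigma.quantile([0.25, 0.75])
iqr = q3 - q1
inside_fences = log_sigma.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

clean = valid[inside_fences]
print(clean)
```

Before dropping anything in a real dataset, it is worth plotting the distribution (for example with a histogram of the log-conductivity) to check that the flagged points are genuine errors rather than unusual but valid measurements.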
- The Materials Project database: The most popular database of crystal structures and their properties calculated with density functional theory (DFT).
- AFLOW: A database of material compounds and DFT-calculated properties.
- OQMD: A database of DFT-calculated thermodynamic and structural properties of materials.

- A polymer dataset: Structures, atomization energies, band gaps, and dielectric constants of 1k polymers.
- SISSO hardness: A dataset of experimentally measured hardness of 61 materials.
- QM9: DFT-calculated properties for 134k stable small organic molecules made up of CHONF.
- Li-ion conductivities: An experimentally measured Li-ion conductivity dataset of 2k solids.
- Double perovskite oxides band gap dataset: A dataset of 5k band gap energies calculated with DFT for double perovskites.

- Awesome Materials Informatics: A list of known efforts in materials informatics.
- Geometric GNNs: A list of geometric graph neural networks for atomistic modeling.
- Best of Atomistic Machine Learning: A list with 430 open-source projects grouped into 22 categories.
- Neural Network Models for Chemistry: A collection of neural network models for chemistry.

- ASE: A Python library for setting up, steering, and analyzing atomistic simulations.
- Pymatgen: A Python library for atomic structure analysis.
- matminer: A Python library for data mining the properties of materials.
- DScribe: A Python package for transforming atomic structures into fixed-size numerical fingerprints.
- TorchSISSO: A PyTorch-based implementation of the Sure Independence Screening and Sparsifying Operator (SISSO) for efficient and interpretable model discovery.

- Pymatgen tutorials: Various tutorials on how to use pymatgen, the Python library for atomistic materials modeling and post-processing of density functional theory calculations.
- Matminer examples: Tutorials on how to use matminer, the Python library for encoding atomic structures (i.e. generating atomic structure descriptors).

- SevenNet: A graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.
- MACE_MP: Pre-trained foundation models for materials chemistry, parameterised for 89 chemical elements.
- CHGNet: A pretrained universal neural network potential for charge-informed atomistic modeling.
- M3GNet: A universal graph deep learning interatomic potential for the periodic table. Note: this potential is trained on a smaller dataset.
We would like to thank:
- Andrey Geondzhian for giving a talk on neural networks for materials science (Oct. 2024)
- Innokentiy Humonen for giving a talk on equivariant graph neural networks for materials science (Oct. 2024)
- Machine learning by Evgeny Burnaev
- Introduction to materials informatics by Mark Asta and Enze Chen
- Materials informatics by Taylor Sparks
- Single-lecture introduction to materials informatics by Edward Kim
If you have any ideas or comments on how to improve the content of the course, or have found any typos or mistakes, don't hesitate to create a GitHub issue.