About

The course gives an overview of data-driven techniques for accelerating materials design, with a focus on the atomistic scale and inorganic compounds. Each lecture is a short overview plus the minimum required theory; the seminars are the main part of the course. During the course we will: learn Python libraries for atomistic materials modelling, get an overview of materials science databases and learn how to use the Materials Project API, apply machine learning algorithms to predict materials properties, and perform molecular dynamics simulations with deep learning interatomic potentials.

It is expected that students will better understand the concepts through learning by doing. At the end of the course, students will present a final project in the form of an article based on the homework they have completed.

The course is developed by Artem Dembitskiy (4th-year Ph.D. student) under the supervision of Prof. Dmitry Aksenov at the Skolkovo Institute of Science and Technology.

Timeline and location

Term 1B, Sept. 30 - Oct. 25, MON THU FRI 16:00-19:00

(Approximate) Schedule

  • Week #1 (easy/medium)
    • What is materials informatics?
    • Python for atomistic modeling of materials
    • Data in materials science
  • Week #2 (medium)
    • Exploratory data analysis
    • Classical ML for materials science pt.1
    • Classical ML for materials science pt.2
  • Week #3 (hard)
    • Graph neural networks for materials science pt.1
    • Graph neural networks for materials science pt.2
    • Machine learning for molecular simulation
  • Week #4 (easy)
    • Working on final projects in class
    • Critical reviews of scientific journal articles
    • Final projects presentation

Intended learning outcomes

On completion of the course you will be able to:

  • Apply Python libraries and data science tools to solve materials science problems
  • Critically evaluate materials informatics literature
  • Collect, generate and analyse materials science datasets, including identification of structure-property relationships

Course prerequisites

  • Computational materials science track
  • Basic knowledge of materials modeling, Python (numpy, pandas), crystal chemistry, linear algebra
  • Laptop

Course navigation

This GitHub repo contains most of the course content. Quizzes and homeworks will be announced separately on Canvas and in the Telegram chat.

First-run results

Course Evaluation Survey

The figure below shows how students responded to the questions we asked them about the first run of the course.

  • Question #1. Was it convenient for you to use Github for the course navigation?

  • Question #2. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Apply python libraries and data science tools to solve materials science problems"

  • Question #3. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Critically evaluate materials informatics literature"

  • Question #4. Please rank on a 7-point scale (7 being the highest) the degree to which you think you achieved the learning outcome "Collect, generate and analyse materials science datasets, including identification of structure-property relationships"

Course evaluation

The questionnaire was adapted from "Using Jupyter Tools to Design an Interactive Textbook to Guide Undergraduate Research in Materials Informatics".

Classes

Each class consists of a relatively short lecture and a relatively long (coding) seminar. All class materials are stored in the lectures and seminars folders. Each class begins with "Previously on...", "Class Goals" and "Agenda" and ends with a "Take Home Message".

| Class | Lecture | Seminar | Homework | Supplementary materials |
|---|---|---|---|---|
| 1 (Sep. 30) | Lecture 1. Agenda: Materials informatics overview. Motivation, navigation. ILOs and assessment. HWs and FP description. | Seminar 1. Agenda: Google Colab, reminder of the key libraries used in science: numpy, pandas, scipy, matplotlib. | HW1. Agenda: Python basics, numpy, pandas, scipy, matplotlib. Python for atomistic modeling. The Materials Project API. Deadline: Oct. 10, 2024, 15:59 MSK | |
| 2 (Oct. 3) | Lecture 2. Agenda: Python in materials science. | Seminar 2. Agenda: The ase and pymatgen Python libraries. Molecules and crystals. Various text formats of a material representation. Local coordination, nearest-neighbor list building, Voronoi partitioning, translational symmetry. | | ASE: tips and tricks, Pymatgen tutorials |
| 3 (Oct. 4) | Lecture 3. Agenda: Data in materials science. FAIR principles. The Materials Project and its API. | Seminar 3. Agenda: Screening of solid-state electrolytes using the Materials Project API and pymatgen. | | Paper: FAIR, Paper: MP, The MP API: Getting started |
| 4 (Oct. 7) | Lecture 4. Agenda: Exploratory data analysis. | Seminar 4. Agenda: scipy, matplotlib, pandas, EDA. | | Lecture from CS 109a course by Pavlos Protopapas & Kevin Rader |
| 5 (Oct. 10) | Lecture 5. Agenda: ML for materials science. Types of tasks. Property and descriptor. | Seminar 5. Agenda: scikit-learn Python library, regression models for predicting mechanical and thermodynamic properties of materials. HW1 review. | HW2. Agenda: sklearn, regression, hardness prediction, feature importances and feature selection, molecular dynamics simulation using universal interatomic potentials. Deadline: Oct. 21, 2024, 15:59 MSK. FP announcement. Deadline: Oct. 25, 2024, 23:59 MSK | Paper |
| 6 (Oct. 11) | Lecture 6. Agenda: Feature design in materials science. Geometrical and compositional features. Hierarchy of the crystal structure descriptors. Crystal structure fingerprint. Feature importance. | Seminar 6. Agenda: matminer and dscribe Python libraries. Reproduce an article on feature design. | | Paper |
| 7 (Oct. 14) | Lecture 7. Agenda: Artificial neural networks. Loss function. Backpropagation. Graph representation of materials. How to deal with periodicity. Crystal Graph Convolutional Neural Networks (CGCNN). Message passing. | Seminar 7. Agenda: GitHub, CGCNN for predicting formation energy of crystals. | HW3. Agenda: Paper review. Deadline: Oct. 21, 2024, 15:59 MSK | Paper |
| 8 (Oct. 17) | Lecture 8. Agenda: Machine learning for molecular simulation. Interatomic potential. Energy and forces. Molecular dynamics employing GNNs. Active learning. Foundation models. | Seminar 8. Agenda: M3GNet model for molecular dynamics simulation of Li-ion diffusion in Li3PS4. HW2 review. | | Paper |
| 9 (Oct. 18) | Lecture 9. Agenda: The course wrap-up. Tips to complete a final project. Formulation of the problem. Data collection/analysis. Data splitting. Feature design. Model selection. Results analysis. Common mistakes, good and bad practices in employing ML for materials science. | Seminar 9. Agenda: Working on final projects. | | |
| 10 (Oct. 21) | Lecture 10. Agenda: Students present their critical reviews of materials informatics articles (oral presentations). | Seminar 10. Agenda: Continuation of the lecture. | | |
| 11 (Oct. 24) | Lecture 11. Agenda: Invariance and equivariance. E(3)-equivariant graph neural networks. | Seminar 11. Agenda: torch, training loop, NequIP (an E(3)-equivariant graph neural network). | | Paper |
| 12 (Oct. 25) | Lecture 12. Agenda: Final project presentations. | Seminar 12. Agenda: Final project presentations. | | |
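
Seminar 2 lists nearest-neighbor list building and translational symmetry among its topics. The block below is a minimal sketch of that idea in plain Python, not the seminar's own code: it assumes a cubic cell with fractional coordinates wrapped into [0, 1), and the function name `neighbor_list` is hypothetical. ase and pymatgen ship optimized neighbor searches for general cells, which are the better choice in practice.

```python
import itertools
import math


def neighbor_list(frac_coords, cell_length, cutoff):
    """Return {i: [(j, distance), ...]} for atoms in a cubic periodic cell.

    Assumptions of this sketch: the cell is cubic with edge `cell_length`,
    and fractional coordinates are wrapped into [0, 1).
    """
    # Periodic images needed along each axis so that the cutoff sphere is covered.
    n_images = math.ceil(cutoff / cell_length)
    shifts = list(itertools.product(range(-n_images, n_images + 1), repeat=3))

    neighbors = {i: [] for i in range(len(frac_coords))}
    for i, ri in enumerate(frac_coords):
        for j, rj in enumerate(frac_coords):
            for shift in shifts:
                if i == j and shift == (0, 0, 0):
                    continue  # skip the atom itself, but keep its periodic images
                # Translational symmetry: atom j also sits at rj + any lattice vector.
                delta = [(rj[k] + shift[k] - ri[k]) * cell_length for k in range(3)]
                dist = math.sqrt(sum(d * d for d in delta))
                if dist <= cutoff:
                    neighbors[i].append((j, dist))
    for pairs in neighbors.values():
        pairs.sort(key=lambda pair: pair[1])
    return neighbors


# CsCl-type toy cell: one atom at the corner, one at the body center.
cscl = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
nl = neighbor_list(cscl, cell_length=4.0, cutoff=3.5)
print(len(nl[0]))  # coordination number of the corner atom within the cutoff
```

The brute-force loop over all pairs and images scales quadratically with the number of atoms, which is fine for a toy cell but is the reason production codes use cell lists or similar spatial binning.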

Assessment criteria

  • Attendance 0%
  • Quizzes 10%
  • HW 45%
  • Final project 35%
    • Written report 50%
    • Oral presentation 40-50%
    • Discussion of other projects 0-10%
  • Peer reviews 10%
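
As an illustration of how these weights combine, here is a small sketch. It assumes every component is scored on a 0-100 scale, and it fixes the final-project split at 50/40/10, which is only one point inside the ranges given above (oral 40-50%, discussion 0-10%). The function names and the example scores are hypothetical.

```python
# Top-level weights from the assessment criteria (fractions of the final grade).
WEIGHTS = {
    "attendance": 0.00,
    "quizzes": 0.10,
    "homework": 0.45,
    "final_project": 0.35,
    "peer_reviews": 0.10,
}

# Assumption: the criteria give ranges for the oral presentation (40-50%) and the
# discussion (0-10%); this sketch picks 40% and 10% so the parts sum to 100%.
FINAL_PROJECT_WEIGHTS = {
    "written_report": 0.50,
    "oral_presentation": 0.40,
    "discussion": 0.10,
}


def weighted_score(scores, weights):
    """Weighted sum of scores; both dicts must share the same keys."""
    return sum(weights[name] * scores[name] for name in weights)


def course_grade(component_scores, final_project_scores):
    """Combine the final-project parts first, then the top-level components."""
    scores = dict(component_scores)
    scores["final_project"] = weighted_score(final_project_scores, FINAL_PROJECT_WEIGHTS)
    return weighted_score(scores, WEIGHTS)


# Hypothetical student, all scores on a 0-100 scale.
components = {"attendance": 0, "quizzes": 80, "homework": 90, "peer_reviews": 100}
project = {"written_report": 70, "oral_presentation": 90, "discussion": 100}
print(round(course_grade(components, project), 2))
```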

Final project description

Example

The task is to carry out a small high-throughput screening of solid-state electrolytes that conduct a given ion (Li+, Na+, K+, etc.) using data-driven techniques and tools covered in the course (or beyond it).

  • Start from a given set of chemical elements.
  • Formulate selection criteria for high-throughput screening of solid-state electrolytes for all-solid-state Li-ion batteries.
  • Download the data from the Materials Project database according to the formulated criteria.
  • Predict the band gap of the selected materials (assuming that this data is not available in the Materials Project) using at least one classical ML model and one GNN model, and evaluate their performance. For the classical ML model, calculate crystal structure descriptors using your own featurizer or open-source tools. Perform a feature importance study.
  • Select one of the most promising materials and perform a diffusion simulation using your favourite universal interatomic potential.
  • Calculate the activation barrier of the mobile ion and its diffusion coefficient.
  • Compare your materials with existing alternatives.
  • Write a 3-5 page article-style report including
    • Introduction
    • Methods
    • Results
    • Discussion
    • Conclusion
    • Bibliography
  • Prepare a 7-minute oral presentation
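
The diffusion coefficient and activation barrier steps above can be sketched as follows. This is one common post-processing route, not a prescribed method: the diffusion coefficient comes from the slope of the mean squared displacement (Einstein relation, MSD ≈ 2·d·D·t in the diffusive regime), and the activation barrier comes from a linear Arrhenius fit of ln D against 1/T. All function names are hypothetical, and the numbers in the example are synthetic values generated from an assumed barrier, not simulation results.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def diffusion_coefficient(times, msd, dim=3):
    """Einstein relation: D = slope(MSD vs t) / (2 * dim).

    Units follow the inputs, e.g. MSD in Å^2 and time in ps give D in Å^2/ps.
    Only the linear (diffusive) part of the MSD curve should be passed in.
    """
    slope, _ = linear_fit(times, msd)
    return slope / (2 * dim)


def arrhenius_fit(temperatures, diff_coeffs):
    """Fit ln D = ln D0 - Ea / (kB * T); return (Ea in eV, D0)."""
    inv_t = [1.0 / t for t in temperatures]
    ln_d = [math.log(d) for d in diff_coeffs]
    slope, intercept = linear_fit(inv_t, ln_d)
    return -slope * K_B, math.exp(intercept)


# Synthetic check: generate D(T) from an assumed barrier and recover it.
assumed_ea, assumed_d0 = 0.30, 1e-3  # eV and cm^2/s, illustrative values only
temps = [600.0, 800.0, 1000.0, 1200.0]  # K
d_values = [assumed_d0 * math.exp(-assumed_ea / (K_B * t)) for t in temps]
ea, d0 = arrhenius_fit(temps, d_values)
print(f"Ea = {ea:.3f} eV, D0 = {d0:.2e} cm^2/s")
```

With real trajectories, the MSD of the mobile ion would be computed at several temperatures, each giving one D value for the Arrhenius fit; short or poorly converged runs make the slope, and hence the barrier, unreliable.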

Recommended literature

  • Books
    • Materials Informatics and Catalysts Informatics: An Introduction, Keisuke Takahashi, Lauren Takahashi, 2024, ISBN-10: 981970216X
    • Deep Learning, Ian Goodfellow and Yoshua Bengio and Aaron Courville, 2016, MIT Press, https://www.deeplearningbook.org/
  • Papers
    • Recent advances and applications of machine learning in solid-state materials science, Schmidt, J., Marques, M.R.G., Botti, S. et al., npj Comput Mater 5, 83 (2019). https://doi.org/10.1038/s41524-019-0221-0

Data

Data used for seminars and homeworks
| Name | Description | Source |
|---|---|---|
| Li-ion conductivity dataset | A dataset of experimentally measured Li-ion conductivities in crystalline (and amorphous) ceramics. The data includes crystal structure family, chemical family, chemical composition, target property, temperature of measurements, and source of the data. The data is poisoned with None values and outliers. The task for the students is to clean the dataset and perform exploratory data analysis. | Hargreaves, C.J., Gaultois, M.W., Daniels, L.M. et al. A database of experimentally measured lithium solid electrolyte conductivities evaluated with machine learning. npj Comput Mater 9, 9 (2023). https://doi.org/10.1038/s41524-022-00951-z |
| The Materials Project band gap dataset | A dataset of band gap values calculated using density functional theory for crystal structures. The task for the students is to perform exploratory data analysis and find the correlation between the band gap value and the average electronegativity of the structure. | The Materials Project API was used to retrieve the data. |
| Double perovskite oxides band gap dataset | The dataset consists of band gap targets calculated with density functional theory and the elemental and geometrical descriptors of the crystal structures. The task for the students is to perform exploratory data analysis, find the correlations between the target and descriptors, optimize hyperparameters of the regression models, and conduct the feature selection and feature importance study. | Talapatra, A., Uberuaga, B.P., Stanek, C.R. et al. Band gap predictions of double perovskite oxides using machine learning. Commun Mater 4, 46 (2023). https://doi.org/10.1038/s43246-023-00373-4 |
| Hardness dataset | A dataset of experimentally measured hardness of materials. The data is used for HW2 on supervised machine learning. | Tantardini, Christian, et al. "Material hardness descriptor derived by symbolic regression." Journal of Computational Science 82 (2024): 10240, repo |
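
The band gap task above asks for the correlation between band gap and average electronegativity. A minimal sketch of the two ingredients is given below, under stated assumptions: the average is composition-weighted over Pauling electronegativities, the formula parser handles only flat formulas such as `Li3PS4` (no parentheses), and the function names and the small lookup table are illustrative rather than taken from the course notebooks. In the seminars, pymatgen and pandas would normally do this work.

```python
import math
import re

# Pauling electronegativities for a few elements (illustrative subset only).
PAULING = {
    "H": 2.20, "Li": 0.98, "C": 2.55, "N": 3.04, "O": 3.44,
    "F": 3.98, "Na": 0.93, "P": 2.19, "S": 2.58, "Cl": 3.16,
}


def parse_formula(formula):
    """Parse a flat formula like 'Li3PS4' into {'Li': 3.0, 'P': 1.0, 'S': 4.0}."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def average_electronegativity(formula):
    """Composition-weighted mean Pauling electronegativity of a formula."""
    counts = parse_formula(formula)
    total_atoms = sum(counts.values())
    return sum(PAULING[el] * n for el, n in counts.items()) / total_atoms


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


print(average_electronegativity("Li3PS4"))
```

Given a list of formulas and the matching band gaps from the dataset, `pearson([average_electronegativity(f) for f in formulas], band_gaps)` would give the correlation the task asks about.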

List of resources related to materials informatics

Databases

  • The Materials Project database
    The most popular database of crystal structures and their properties calculated with density functional theory (DFT)

  • AFLOW
    A database of material compounds and DFT-calculated properties

  • OQMD
    A database of DFT-calculated thermodynamic and structural properties of materials

Datasets

  • A polymer dataset
    Structures, atomization energies, band gaps, and dielectric constants of 1k polymers.

  • SISSO hardness
    A dataset of experimentally measured hardness of 61 materials.

  • QM9
    DFT-calculated properties for 134k stable small organic molecules made up of CHONF.

  • Li-ion conductivities
    An experimentally measured Li-ion conductivity dataset of 2k solids.

  • Double perovskite oxides band gap dataset
    A dataset of 5k band gap energies calculated with DFT for double perovskites.

Curated lists

Software

  • ASE
    A Python library for setting up, steering, and analyzing atomistic simulations.

  • Pymatgen
    A Python library for analyzing atomic structures

  • matminer
    A Python library for data mining the properties of materials

  • DScribe
    A Python package for transforming atomic structures into fixed-size numerical fingerprints

  • TorchSISSO
    A PyTorch-Based Implementation of the Sure Independence Screening and Sparsifying Operator (SISSO) for Efficient and Interpretable Model Discovery

Tutorials

  • Pymatgen tutorials
    Various tutorials on how to use pymatgen, the Python library for atomistic materials modeling and post-processing of density functional theory calculations.

  • Matminer examples
    Tutorials on how to use matminer, the Python library for encoding atomic structures (i.e. generating atomic structure descriptors).

Universal machine learning interatomic potentials

  • SevenNet
    A graph neural network interatomic potential package supporting efficient multi-GPU parallel molecular dynamics simulations.

  • MACE_MP
    Pre-trained foundation models for materials chemistry, parameterised for 89 chemical elements.

  • CHGNet
    A pretrained universal neural network potential for charge-informed atomistic modeling.

  • M3GNet
    A universal graph deep learning interatomic potential for the periodic table. Note: this potential is trained on a smaller dataset.

Acknowledgement

We would like to thank:

  • Andrey Geondzhian for giving a talk on neural networks for materials science (Oct. 2024)
  • Innokentiy Humonen for giving a talk on equivariant graph neural networks for materials science (Oct. 2024)

References, materials, inspiration

Typos, mistakes, suggestions, comments

If you have any ideas or comments on how to improve the content of the course, or have found any typos or mistakes, don't hesitate to create a GitHub issue.