This repository contains the code related to the publication "Leveraging composition-based energy material descriptors for machine learning models" (https://www.sciencedirect.com/science/article/pii/S2352492823012709).
In particular:
- Folder **Classification** contains the code for training, validating, and testing all the classifiers used in this work (ETCs, QEGs, naive Bayes) and for assessing their performance (`Classifiers_performances.ipynb`). It also includes the file listing, for each material in the test set, the predicted probability of belonging to class 1 (`classifiers_comparison.xlsx`).
- Folder **Mixed features optimization** contains all the `.m` files for finding optimized mixed features, built as linear or product combinations of the original features extracted with Matminer, via multi-objective optimization. Specifically, run `MAIN.m`, choosing how many original features to mix and how many mixed features to output (1 or 2). To find mixed features with single-objective optimization, run `MainSingle.m` instead. Folder **Pareto fronts** contains the precomputed Pareto fronts for the examples shown in this work.
- Folder **Regression & invariance** contains `ETR&SHAP.ipynb`, for training, validating, and testing the ETR model and for running the SHAP analysis that ranks the input features. It also contains the resulting ranking (`SHAP_for_ETR_metallic_mean.xlsx`) and `DNN&invariant_groups.ipynb`, with the code for searching for invariant groups.
- File `Coefficients_mixed_variables.xlsx` contains the coefficients for mixing the first 30 or 52 original Matminer features in the ranking obtained with SHAP.
- File `Database_construction.ipynb` contains the code for cleaning the original SuperCon database.
- File `predictions_on_MPj_materials.xlsx` contains the probability predictions of the two best classifiers of this work (ETC-vanilla, ETC-SMOTE), of the best QEG-based classifier (QEG 2D-mixed lin), and of the GEV classifier (for $T_{\rm c} \geq 35\,\mathrm{K}$) for the about 40,000 materials that are in Materials Project but not in SuperCon. It also provides the 1/0 class prediction, obtained with the probability threshold that maximizes the F1 score ($F_{1,\max}$).