Commit
Merge pull request #18 from hachmannlab/wrapper_aatish
Updated documentation, tutorials, regression metrics
Showing 31 changed files with 110,165 additions and 272 deletions.
@@ -22,10 +22,7 @@ Please check the [ChemML website](https://hachmannlab.github.io/chemml) for more
 ChemML is developed in the Python 3 programming language and makes use of a host of data analysis and ML libraries(accessible through the Anaconda distribution), as well as domain-specific libraries.
 The development follows a strictly modular and object-oriented design to make the overall code as flexible and versatile as possible.

-The format of library is similar to the well known libraries like Scikit-learn. ChemML will be soon available
-via graphical user interface provided by [ChemEco](https://github.com/hachmannlab/chemeco).
-ChemEco is a general-purpose framework for data mining without coding. It also interfaces with many of the libraries that supply methods for the
-representation, preprocessing, analysis, mining, and modeling of large-scale chemical data sets.
+The format of library is similar to the well known libraries like Scikit-learn.


 ## Latest Version:
@@ -44,12 +41,14 @@ Here is a list of external libraries that will be installed with chemml:
 - matplotlib
 - seaborn
 - lxml
+- openpyxl
+- ipywidgets

-Since conda installation is not available for ChemML yet, we recommend installing rdkit and openbabel (please install openbabel 2.x not openbabel 3.x) in a conda virtual environment prior to installing ChemML. For doing so, you need to follow the conda installer:
+We strongly recommend you to install ChemML in an Anaconda environment. The instructions to create the environment, install ChemML’s dependencies, and subsequently install Chemml using the Python Package Index (PyPI) via pip are as follows:

-conda create --name my_chemml_env python=3.6
-source activate my_chemml_env
-conda install -c conda-forge rdkit openbabel
+conda create --name chemml_env python=3.8
+source activate chemml_env
+conda install -c conda-forge openbabel rdkit nb_conda_kernels python-graphviz
 pip install chemml

 ## Citation:
@@ -93,6 +92,13 @@ Please cite the use of ChemML as:
 year = {2018}
 }

+@article{vishwakarma2019towards,
+title={Towards autonomous machine learning in chemistry via evolutionary algorithms},
+author={Vishwakarma, Gaurav and Haghighatlari, Mojtaba and Hachmann, Johannes},
+journal={ChemRxiv preprint},
+year={2019}
+}
+
 ## License:
 ChemML is copyright (C) 2014-2018 Johannes Hachmann and Mojtaba Haghighatlari, all rights reserved.
 ChemML is distributed under 3-Clause BSD License (https://opensource.org/licenses/BSD-3-Clause).
@@ -102,17 +108,20 @@ ChemML is distributed under 3-Clause BSD License (https://opensource.org/license
 ### Maintainers:
 - Johannes Hachmann, [email protected]
 - Mojtaba Haghighatlari
-- Aditya Sonpal
+- Aditya Sonpal, [email protected]
+- Aatish Pradhan, [email protected]
 University at Buffalo - The State University of New York (UB)

 ### Contributors:
 - Doaa Altarawy (MolSSI): scientific advice and software mentor
 - Gaurav Vishwakarma (UB): automated model optimization
 - Ramachandran Subramanian (UB): Magpie descriptor library port
 - Bhargava Urala Kota (UB): library database
 - Aditya Sonpal (UB): graph convolution NNs
 - Srirangaraj Setlur (UB): scientific advice
 - Venugopal Govindaraju (UB): scientific advice
 - Krishna Rajan (UB): scientific advice
+- Aatish Pradhan (UB): Jupyter GUI developer

 - We encourage any contributions and feedback. Feel free to fork and make pull-request to the "development" branch.

@@ -1,6 +1,6 @@
 # __name__ = "chemml"
-__version__ = "0.8"
-__author__ = ["Mojtaba Haghighatlari ([email protected])", "Johannes Hachmann ([email protected])"]
+__version__ = "1.0"
+__author__ = ["Aditya Sonpal ([email protected])", "Garuav Vishwakarma ([email protected]) ", "Aatish Pradhan ([email protected])","Mojtaba Haghighatlari ([email protected])", "Johannes Hachmann ([email protected])"]


 # import sys
@@ -0,0 +1,11 @@
+def error_metric(y_true,y_pred):
+    y_true = np.asarray(y_true)
+    y_pred = np.asarray(y_pred)
+    ndata = len(y_true)
+    y_mean = np.mean(y_true)
+    e = y_true - y_pred
+    ae = np.absolute(e)
+    se = np.square(e)
+    var = np.mean(np.square(y_true - y_mean))
+    MAE = np.mean(ae)
+    return MAE
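As committed, `error_metric` relies on `np` being imported by surrounding code that is not shown, and it computes `ndata`, `se` and `var` without using them; only the mean absolute error is returned. A minimal self-contained sketch of the same metric, assuming only NumPy, is below. It also flattens both inputs, because a plain NumPy subtraction of a shape `(n,)` array and a shape `(n, 1)` array silently broadcasts to an `(n, n)` array and yields a wrong MAE.

```python
import numpy as np


def error_metric(y_true, y_pred):
    """Mean absolute error (MAE) between observed and predicted values."""
    # Flatten so column vectors and 1-D arrays are treated alike (no (n, n) broadcast).
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")
    return float(np.mean(np.abs(y_true - y_pred)))


print(error_metric([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # → 0.6666666666666666
```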
@@ -0,0 +1,27 @@
+def ga_eval(indi):
+
+    layers = [indi[i] for i in range(2,5) if indi[i] != 0]
+    #print(np.exp(indi[0]))
+
+    #count iterations of GA
+    count=open("tmp.txt", "a")
+    count.write("GA search iteration in process... \n")
+    count.close()
+    file = open("tmp.txt","r")
+    Counter = 0
+    # Reading number of lines from file
+    Content = file.read()
+    CoList = Content.split("\n")
+    for i in CoList:
+        if i:
+            Counter += 1
+    print("GA search iteration in process... ",Counter)
+    mlp = MLPRegressor(alpha=np.exp(indi[0]), activation=indi[1], hidden_layer_sizes=tuple(layers),learning_rate='invscaling', max_iter=10,early_stopping=True)
+    ga_search = single_obj(mlp=mlp, x=X.values, y=Y.values,n_splits=n_splits)
+    #print("GA search iteration in process...")
+    f=open("GA.txt", "a")
+    f.write("%f %s %d %d %d %f \n" %(float(np.exp(indi[0])), str(indi[1]), int(indi[2]), int(indi[3]), int(indi[4]),float(ga_search)))
+    f.close()
+    #gui_return ={"ga_search": ga_search}
+    #print(gui_return)
+    return ga_search
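`ga_eval` counts iterations by appending a line to `tmp.txt` and re-reading the whole file on every call (the read handle is never closed, and the count carries over between runs unless the file is deleted), and it depends on module-level `X`, `Y`, `n_splits` and `single_obj`. The sketch below keeps the same decoding of a GA individual (log-alpha, activation name, three layer widths where 0 drops the layer) but counts iterations in memory and takes the model constructor and scoring function as arguments. `decode_individual` and `make_ga_eval` are hypothetical names, not ChemML API. With scikit-learn installed, `sklearn.neural_network.MLPRegressor` can be passed as `build_model`; note that scikit-learn only uses `learning_rate` when `solver='sgd'`, so `learning_rate='invscaling'` has no effect with the default `adam` solver.

```python
import math
from itertools import count


def decode_individual(indi):
    """Translate a GA individual into MLPRegressor-style keyword arguments.

    Assumed layout (mirroring ga_eval): indi[0] = log(alpha), indi[1] = activation,
    indi[2:5] = neurons per hidden layer, where 0 drops that layer.
    """
    layers = tuple(int(n) for n in indi[2:5] if n != 0)
    return {
        "alpha": math.exp(indi[0]),
        "activation": indi[1],
        "hidden_layer_sizes": layers,
        "learning_rate": "invscaling",
        "max_iter": 10,
        "early_stopping": True,
    }


def make_ga_eval(build_model, score, log_path=None):
    """Return a GA fitness function; build_model and score are supplied by the caller."""
    iteration = count(1)  # in-memory counter instead of re-reading tmp.txt

    def ga_eval(indi):
        n = next(iteration)
        print("GA search iteration in process... ", n)
        params = decode_individual(indi)
        fitness = score(build_model(**params))
        if log_path is not None:
            # Same column layout as GA.txt: alpha, activation, three widths, fitness.
            with open(log_path, "a") as f:
                f.write("%f %s %d %d %d %f\n" % (
                    params["alpha"], indi[1], int(indi[2]), int(indi[3]), int(indi[4]), float(fitness)))
        return fitness

    return ga_eval


# Hypothetical wiring, with X and Y being the data frames the original snippet assumes:
# from sklearn.neural_network import MLPRegressor
# ga_eval = make_ga_eval(MLPRegressor, lambda mlp: single_obj(mlp, X.values, Y.values, n_splits=5))
```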
@@ -0,0 +1,12 @@
+def single_obj(mlp, x, y, n_splits=n_splits):
+    n_splits=n_splits
+    kf = KFold(n_splits) # cross validation based on Kfold (creates 5 validation train-test sets)
+    accuracy_kfold = []
+    for training, testing in kf.split(x):
+        mlp.fit(x[training], y[training])
+        y_pred = mlp.predict(x[testing])
+        y_pred, y_act =y_pred.reshape(-1,1), y[testing].reshape(-1,1)
+        model_accuracy=mae(y_act,y_pred) # evaluation metric: mae
+        accuracy_kfold.append(model_accuracy) # creates list of accuracies for each fold
+    #print("def single_obj - completed")
+    return np.mean(accuracy_kfold)
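`single_obj` scores a model by its K-fold cross-validated MAE. As written, the default `n_splits=n_splits` is evaluated when the function is defined, so a global `n_splits` must already exist; `mae` is not defined in the snippet; and the inline comment's "5 validation train-test sets" only holds when `n_splits` is 5. A NumPy-only sketch of the same procedure follows. It reproduces the contiguous, unshuffled folds of scikit-learn's `KFold(n_splits)` (the first `n % n_splits` folds get one extra sample) with `np.array_split`, and accepts any object with `fit`/`predict` methods; `MeanRegressor` is a hypothetical toy model for illustration only.

```python
import numpy as np


def single_obj(model, x, y, n_splits=5):
    """Mean of the per-fold MAE over unshuffled, contiguous K folds."""
    x = np.asarray(x)
    y = np.asarray(y, dtype=float).ravel()
    indices = np.arange(len(x))
    fold_mae = []
    for test_idx in np.array_split(indices, n_splits):
        train_idx = np.setdiff1d(indices, test_idx)  # every sample not in this fold
        model.fit(x[train_idx], y[train_idx])
        y_pred = np.asarray(model.predict(x[test_idx]), dtype=float).ravel()
        fold_mae.append(np.mean(np.abs(y[test_idx] - y_pred)))
    return float(np.mean(fold_mae))


class MeanRegressor:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, x, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, x):
        return np.full(len(x), self.mean_)


x = np.arange(6).reshape(-1, 1)
y = [0, 0, 0, 6, 6, 6]
print(single_obj(MeanRegressor(), x, y, n_splits=2))  # → 6.0
```

Passing `n_splits` explicitly avoids the definition-time lookup of a global, and flattening `y` makes the column reshape in the original unnecessary.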
@@ -0,0 +1 @@
+space = ({'alpha': {'uniform': [np.log(0.0001), np.log(0.1)], 'mutation': [0, 1]}},{'activation': {'choice': ['identity', 'logistic', 'tanh', 'relu']}},{'neurons1': {'choice': range(0,220,20)}},{'neurons2': {'choice': range(0,220,20)}},{'neurons3': {'choice': range(0,220,20)}})
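`space` lists the genes of one GA individual in the order `ga_eval` reads them: `alpha` drawn uniformly in log space between 1e-4 and 0.1, an activation name, and three layer widths from `range(0, 220, 20)`, i.e. 0 to 200 in steps of 20. The hypothetical sampler below draws one random individual from such a space to make that layout concrete. It is not ChemML's genetic-algorithm implementation, it uses `math.log` in place of `np.log`, and it ignores the `'mutation'` entry because its meaning is not shown here.

```python
import math
import random

space = (
    {'alpha': {'uniform': [math.log(0.0001), math.log(0.1)], 'mutation': [0, 1]}},
    {'activation': {'choice': ['identity', 'logistic', 'tanh', 'relu']}},
    {'neurons1': {'choice': range(0, 220, 20)}},
    {'neurons2': {'choice': range(0, 220, 20)}},
    {'neurons3': {'choice': range(0, 220, 20)}},
)


def sample_individual(space, rng=random):
    """Draw one value per gene; only 'uniform' and 'choice' genes are handled."""
    individual = []
    for gene in space:
        (name, spec), = gene.items()  # each gene is a one-entry dict
        if 'uniform' in spec:
            low, high = spec['uniform']
            individual.append(rng.uniform(low, high))
        elif 'choice' in spec:
            individual.append(rng.choice(list(spec['choice'])))
        else:
            raise ValueError("unsupported gene type for %r" % name)
    return individual


indi = sample_individual(space, random.Random(0))
print(indi, "alpha =", math.exp(indi[0]))
```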
@@ -0,0 +1,6 @@
+def test_hyp(mlp, x, y, xtest, ytest):
+    mlp.fit(x, y)
+    ypred = mlp.predict(xtest)
+    acc=mae(ytest,ypred)
+    # print(" test_hyp completed ")
+    return np.mean(acc)
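`test_hyp` refits a model on the full training data and returns its MAE on a held-out set; since that MAE is already a scalar, the final `np.mean` does not change it. Presumably the model passed in is built from the best configuration found by the search. Assuming the whitespace-separated `GA.txt` layout written by `ga_eval` (alpha, activation, three layer widths, cross-validated MAE; lower is better), a hypothetical helper could recover that configuration as below. The two log rows are made-up illustrations, not results. Note that `%f` keeps only six decimal places, so the logged alpha is rounded.

```python
def best_logged_config(lines):
    """Return (MLPRegressor-style kwargs, MAE) for the lowest-MAE row of a GA.txt-style log."""
    best = None
    for line in lines:
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed rows
        alpha, activation = float(fields[0]), fields[1]
        widths = [int(w) for w in fields[2:5]]
        score = float(fields[5])
        if best is None or score < best[1]:
            params = {
                "alpha": alpha,
                "activation": activation,
                "hidden_layer_sizes": tuple(w for w in widths if w != 0),
            }
            best = (params, score)
    return best


# Illustrative rows only; in practice: with open("GA.txt") as f: best_logged_config(f)
log = [
    "0.001000 relu 20 0 40 0.512000",
    "0.050000 tanh 100 60 0 0.431000",
]
params, score = best_logged_config(log)
print(params, score)
```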