
Implementing Contrastive Self-Supervised Learning with Radiation Augmentations, SimCLR, PyTorch Lightning, and Hyperparameter Optimization #48

Closed
wants to merge 55 commits into from
Changes from all commits
Commits
55 commits
e10e632
adding hyperopt functions
Apr 22, 2022
bd0ab96
add supervised logistic regression model function
Apr 22, 2022
1afbcd6
adding cotraining model function
Apr 22, 2022
e3a5e62
adding code for Label Prop model function
Apr 22, 2022
12c46de
adding shadow fully connected NN model function
Apr 22, 2022
3cc5e95
adding shadow eaat cnn function model
Apr 22, 2022
15fede0
abstracting MINOS to Spectra
Apr 22, 2022
a9410da
removing duplicate device in eaat-cnn
Apr 22, 2022
d3e5068
revamping design of ssl models, starting with logreg
Jul 29, 2022
3126ebe
adding save function to logreg class and renaming hyperopt.py
Aug 4, 2022
edcc56e
commenting logistic regression class and methods
Aug 12, 2022
bf630f4
scripts/utils.py pep8 changes
Aug 12, 2022
fd824dd
implementing LabelProp with hyperopt functionality
Aug 12, 2022
0c3ae2a
implementing co-training with hyperopt functionality
Aug 12, 2022
42f19f4
implementing Shadow fully-connected NN with hyperopt
Aug 12, 2022
a629bb3
implementing Shadow EAAT CNN with hyperopt
Aug 12, 2022
ebe247a
adding functions for pca analysis
Aug 12, 2022
7ae4671
rearranging model files
Aug 15, 2022
6997a6d
adding unit test for LogReg
Aug 15, 2022
73ce1f1
updating dependencies
Aug 15, 2022
98e33e8
correcting pytorch package name
Aug 15, 2022
12982ca
adding unit test for CoTraining
Aug 15, 2022
1365e30
adding unit test for LabelProp
Aug 15, 2022
c97136d
adding unit test for ShadowNN
Aug 15, 2022
554eb05
including utils scripts in unit tests coverage
Aug 15, 2022
20f768e
error: training NNs takes too long for a unit test, let alone hyperopt
Aug 15, 2022
5d17d8c
error: these cnns are so bad that they can't even make predictions
Aug 15, 2022
80d1e9b
correcting cnn parameter calculation to include max_pool1d
Aug 16, 2022
95ee61b
adding tests for more coverage
Aug 16, 2022
49ed669
adding a test for util plots
Aug 16, 2022
3cb9b44
adding seed test to co-training
Aug 16, 2022
c131dcf
removing old commented line
Aug 22, 2022
4c53820
changing fresh_start methods of models to use class train method instead
Sep 29, 2022
f0bccf1
adding an EarlyStopper class for managing that functionality
Oct 7, 2022
a094a25
adding cross validation implementation
Oct 10, 2022
be77146
investigating ray.tune for better hyperparameter optimization
Nov 1, 2022
6f98c99
refactoring hyperopt->raytune; todo: update test_models.py
Nov 2, 2022
269ecb1
fixing errors in unit tests for hyperopt->raytune
Nov 3, 2022
cbe5510
unifying .gitignore
Aug 8, 2023
0fc7e6e
parent be771462d0188b9f98fdf470907270929662b958
Oct 19, 2022
752f0ba
rearranging folders for relative importation
Aug 8, 2023
50c6942
functional implementation with extra args and unfinished checkpointing
Aug 10, 2023
d0fcf48
attempting to debug parallelized ray tune
Aug 18, 2023
fe970b3
HyperOpt working in serial
Aug 19, 2023
d84d9ab
abandoning ray in favor of hyperopt; checkpointing for refactor
Aug 21, 2023
33f4434
functioning hyperopt implementation
Aug 21, 2023
02e1da8
adding arg for storing trial results
Aug 21, 2023
a15d847
adding functionality for storing and restoring pre-existing trials
Aug 23, 2023
f77e0ca
correcting for output of SSLHyperOpt.py
Aug 29, 2023
c022d28
adding AdamW parameters to dry run
Aug 29, 2023
cc0a7b8
adding projection head hyperparameter optimization script
Sep 5, 2023
ca81247
chtc bugfixes
Sep 5, 2023
60fad56
removing extranneous -p
Sep 5, 2023
bab9621
adjusting other hyperparameter inputs
Sep 5, 2023
270fcdf
correcting import statements
Dec 21, 2023
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -41,7 +41,7 @@ jobs:
- name: Test with pytest
run: |
python3 -m pytest
python3 -m coverage run --source=./RadClass/ -m pytest
python3 -m coverage run --source=./RadClass/,./models/,./scripts/ -m pytest
python3 -m coverage report
python3 -m coverage html
COVERALLS_REPO_TOKEN=${{ secrets.COVERALLS_REPO_TOKEN }} python3 -m coveralls --service=github
7 changes: 7 additions & 0 deletions .gitignore
@@ -4,3 +4,10 @@ __pycache__
*.h5
*.ipynb
*.csv
*.joblib
*.log
*.png
*.pyc
results/
data/
checkpoint/
7 changes: 7 additions & 0 deletions README.md
@@ -25,7 +25,14 @@ Versions 3.6-3.9 are currently supported by tests. The following Python packages
* h5py
* numpy
* progressbar2
* matplotlib
* seaborn
* scipy
* sklearn
* hyperopt
* ray[tune]
* torch
* shadow-ssml

Modules can be imported from the repository directory (e.g. `from RadClass.H0 import H0`) or `RadClass` can be installed using pip:

170 changes: 170 additions & 0 deletions RadClass/models/LogReg.py
@@ -0,0 +1,170 @@
# For hyperopt (parameter optimization)
from hyperopt import STATUS_OK
# sklearn models
from sklearn import linear_model
# diagnostics
from sklearn.metrics import balanced_accuracy_score, precision_score, recall_score
from scripts.utils import run_hyperopt
import joblib


class LogReg:
'''
Methods for deploying sklearn's logistic regression
implementation with hyperparameter optimization.
Data agnostic (i.e. user supplied data inputs).
TODO: Currently only supports binary classification.
Add multinomial functions and unit tests.
Add functionality for regression(?)
Inputs:
params: dictionary of logistic regression input parameters.
keys max_iter, tol, and C supported.
alpha: float; weight for encouraging high recall
beta: float; weight for encouraging high precision
NOTE: if alpha=beta=0, default to favoring balanced accuracy.
random_state: int/float for reproducible initialization.
'''

# only binary so far
def __init__(self, params=None, alpha=0, beta=0, random_state=0):
# defaults to a fixed value for reproducibility
self.random_state = random_state
# dictionary of parameters for logistic regression model
self.alpha, self.beta = alpha, beta
self.params = params
if self.params is None:
self.model = linear_model.LogisticRegression(
random_state=self.random_state
)
else:
self.model = linear_model.LogisticRegression(
random_state=self.random_state,
max_iter=params['max_iter'],
tol=params['tol'],
C=params['C']
)

def fresh_start(self, params, data_dict):
'''
Required method for hyperopt optimization.
Trains and tests a fresh logistic regression model
with given input parameters.
This method does not overwrite self.model (self.optimize() does).
Inputs:
params: dictionary of logistic regression input parameters.
keys max_iter, tol, and C supported.
data_dict: compact data representation with the four requisite
data structures used for training and testing a model.
keys trainx, trainy, testx, and testy required.
'''

# unpack data
trainx = data_dict['trainx']
trainy = data_dict['trainy']
testx = data_dict['testx']
testy = data_dict['testy']

# supervised logistic regression
clf = LogReg(params=params, random_state=self.random_state)
# train and test model
clf.train(trainx, trainy)
# balanced accuracy accounts for class-imbalanced data
clf_pred, acc = clf.predict(testx, testy)
rec = recall_score(testy, clf_pred)
prec = precision_score(testy, clf_pred)

# loss function minimizes misclassification
# by maximizing metrics
# include 'status': STATUS_OK, which hyperopt requires when the
# objective returns a dictionary (STATUS_OK was imported but unused)
return {'score': acc+(self.alpha*rec)+(self.beta*prec),
'loss': (1-acc) + self.alpha*(1-rec) + self.beta*(1-prec),
'status': STATUS_OK,
'model': clf,
'params': params,
'accuracy': acc,
'precision': prec,
'recall': rec}

def optimize(self, space, data_dict, max_evals=50, njobs=4, verbose=True):
'''
Wrapper method for using hyperopt (see utils.run_hyperopt
for more details). After hyperparameter optimization, results
are stored, the best model -overwrites- self.model, and the
best params -overwrite- self.params.
Inputs:
space: a hyperopt-compliant dictionary with defined optimization
spaces. For example:
space = {'max_iter': hp.quniform('max_iter', 10, 10000, 10),
'tol': hp.loguniform('tol', np.log(1e-5), np.log(1e-1)),
'C': hp.uniform('C', 0.001, 1000.0)
}
See hyperopt docs for more information.
data_dict: compact data representation with the four requisite
data structures used for training and testing a model.
keys trainx, trainy, testx, testy required.
max_evals: the number of evaluations for hyperparameter optimization.
Each evaluation trains and tests one set of hyperparameters
on a fresh model. Convergence for simpler models like
logistic regression typically happens well before 50
evaluations, but more may be needed as more complex models,
more hyperparameters, and larger hyperparameter spaces are tested.
njobs: (int) number of hyperparameter training iterations to complete
in parallel. Default is 4, but personal computing resources may
require fewer or allow more.
verbose: boolean. If true, print results of hyperopt.
If false, print only the progress bar for optimization.
'''

best, worst = run_hyperopt(space=space,
model=self.fresh_start,
data_dict=data_dict,
max_evals=max_evals,
njobs=njobs,
verbose=verbose)

# save the results of hyperparameter optimization
self.best = best
self.model = best['model']
self.params = best['params']
self.worst = worst

def train(self, trainx, trainy):
'''
Wrapper method for sklearn's logistic regression training method.
Inputs:
trainx: nxm feature vector/matrix for training model.
trainy: nxk class label vector/matrix for training model.
'''

# supervised logistic regression
self.model.fit(trainx, trainy)

def predict(self, testx, testy=None):
'''
Wrapper method for sklearn's logistic regression predict method.
Inputs:
testx: nxm feature vector/matrix for testing model.
testy: nxk class label vector/matrix for testing model.
optional: if included, the predicted classes -and-
the resulting classification accuracy will be returned.
'''

pred = self.model.predict(testx)

acc = None
if testy is not None:
# uses balanced_accuracy_score to account for class imbalance
acc = balanced_accuracy_score(testy, pred)

return pred, acc

def save(self, filename):
'''
Save class instance to file using joblib.
Inputs:
filename: string filename to save object to file under.
The file must be saved with extension .joblib.
Added to filename if not included as input.
'''

if filename[-7:] != '.joblib':
filename += '.joblib'
joblib.dump(self, filename)
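A usage sketch (not part of this PR) of the pattern the `LogReg` wrapper encapsulates: train a scikit-learn logistic regression, score it with balanced accuracy, and compute the same composite `(1-acc) + alpha*(1-rec) + beta*(1-prec)` loss that `fresh_start` minimizes. The synthetic data and the `alpha`/`beta` values are illustrative assumptions.

```python
# Hypothetical sketch: the train/predict/loss pattern of LogReg on
# synthetic, class-imbalanced data (weights=[0.8, 0.2]).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (balanced_accuracy_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
trainx, testx, trainy, testy = train_test_split(X, y, random_state=0)

# mirrors LogReg.__init__ with a params dict of max_iter, tol, and C
model = LogisticRegression(random_state=0, max_iter=1000, tol=1e-4, C=1.0)
model.fit(trainx, trainy)        # LogReg.train
pred = model.predict(testx)      # LogReg.predict
acc = balanced_accuracy_score(testy, pred)

# the composite loss minimized during hyperparameter optimization;
# alpha weights recall, beta weights precision (illustrative values)
alpha, beta = 0.5, 0.5
loss = ((1 - acc) + alpha * (1 - recall_score(testy, pred))
        + beta * (1 - precision_score(testy, pred)))
print(f"balanced accuracy={acc:.3f}, loss={loss:.3f}")
```

With `alpha = beta = 0` the loss reduces to `1 - acc`, i.e. the default of favoring balanced accuracy noted in the class docstring.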