Predicting new particle formation events with Machine Learning

The course this project was done for:

University of Helsinki
Introduction to Machine Learning
Fall 2020 Term Project

Project members:

Bernardo Williams (GitHub: williwilliams3)
Julia Sanders (GitHub: julia-sand)
Mikko Saukkoriipi (GitHub: Saukkoriipi)

Where to find the full project report:

The full project report, with all the details, can be found at: Project_report_and_presentation/Project_report.pdf

Objective:

To build the best possible machine learning model for predicting new particle formation (NPF) events from the 100 daily features.

Abstract of project:

Several machine learning classification models were fitted to the dataset npf_train.csv, which was divided into training, validation and test sets, in order to predict a binary and a multi-class label. The objective was to generalize the models' predictions to unseen data and to estimate the accuracy the models would achieve on that data.
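A minimal sketch of this setup, assuming the multi-class label is a string column in which non-event days are labelled `"nonevent"` (the class names below are hypothetical stand-ins, and random numbers replace npf_train.csv): the binary label is derived from the multi-class one, and the data is split in a stratified way so that every set keeps the class proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def to_binary(multiclass_labels, negative="nonevent"):
    """Map every label to 'event' unless it equals the (assumed) negative class."""
    labels = np.asarray(multiclass_labels)
    return np.where(labels == negative, "nonevent", "event")


# Synthetic stand-in for npf_train.csv: 200 days x 100 features, hypothetical class names.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
y_multi = rng.choice(["Ia", "Ib", "II", "nonevent"], size=200)
y_bin = to_binary(y_multi)

# First hold out an untouched test set (20 %), stratified on the multi-class label.
X_tmp, X_test, ym_tmp, ym_test, yb_tmp, yb_test = train_test_split(
    X, y_multi, y_bin, test_size=0.2, stratify=y_multi, random_state=0)

# Then carve a validation set (25 % of the remainder) out of what is left.
X_train, X_val, ym_train, ym_val, yb_train, yb_val = train_test_split(
    X_tmp, ym_tmp, yb_tmp, test_size=0.25, stratify=ym_tmp, random_state=0)

print(len(X_train), len(X_val), len(X_test))
```

Splitting both labels in the same call keeps each binary label aligned with its multi-class label, so the same test set can be used to report accuracy for both tasks.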

To fit the models we used two data-reduction techniques, PCA and best-k feature selection, and two normalization methods, min-max scaling and standardization. We fitted algorithmic, generative and discriminative methods for both the binary and the multi-class classifiers, measuring accuracy with either a validation set or cross-validation, and determined which ones performed best in terms of accuracy on an unbiased test set. Finally, we found that averaging the predictions of the best algorithmic, discriminative and generative methods gave higher accuracy, and accuracy that was more consistent across the training, validation and test sets.
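The four preprocessing combinations described above (min-max scaling or standardization, followed by PCA or best-k feature selection) can be compared with scikit-learn pipelines and stratified k-fold cross-validation. This is an illustrative sketch on synthetic data, not the project's notebook code; the choice of logistic regression as the classifier, `n_components=20` and `k=20` are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in with 100 features per day (the real data is npf_train.csv).
X, y = make_classification(n_samples=300, n_features=100, n_informative=15, random_state=0)

scalers = {"min-max": MinMaxScaler(), "standardize": StandardScaler()}
reducers = {"PCA": PCA(n_components=20), "best-k": SelectKBest(f_classif, k=20)}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for s_name, scaler in scalers.items():
    for r_name, reducer in reducers.items():
        # cross_val_score clones the pipeline, so scaling and reduction are refitted
        # on each training fold and never see the held-out fold.
        pipe = Pipeline([
            ("scale", scaler),
            ("reduce", reducer),
            ("clf", LogisticRegression(max_iter=1000)),
        ])
        scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
        results[(s_name, r_name)] = scores.mean()

# Print the combinations from best to worst mean cross-validated accuracy.
for (s_name, r_name), acc in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{s_name:12s} + {r_name:7s}: {acc:.3f}")
```

Putting the scaler and the reducer inside the pipeline matters: fitting them on the full dataset before cross-validation would leak information from the validation folds and inflate the accuracy estimate.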

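Final accuracies for the binary classification models

Abbreviations: DT = decision tree, RF = random forest, XGB = XGBoost, KNN = k-nearest neighbours, Log = logistic regression, NB PCA = naive Bayes on PCA-reduced features, SVM = support vector machine, Ensemble = averaged predictions of the best algorithmic, discriminative and generative models.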

Accuracy	DT	RF	XGB	KNN	Log	NB PCA	SVM	Ensemble
Training	88%	100%	100%	85%	86%	84%	98%	96%
Validation	84%	87%	90%	78%	85%	87%	90%	96%
Test	88%	88%	87%	80%	85%	93%	83%	92%

Final accuracies for the multi-class classification models

Accuracy	DT	RF	XGB	KNN	NB PCA	SVM	Ensemble
Training	66%	100%	100%	66%	69%	83%	94%
Validation	64%	66%	70%	57.7%	62%	69%	98%
Test	67%	72%	70%	57.7%	65%	68%	70%
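The Ensemble columns combine the best algorithmic, discriminative and generative models by averaging their predictions. One way to sketch this, assuming the averaging is done over predicted class probabilities (soft voting) and using a random forest, logistic regression and Gaussian naive Bayes as stand-ins for the three families, is scikit-learn's `VotingClassifier` on synthetic four-class data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 4-class stand-in for the multi-class task (not the project's data).
X, y = make_classification(n_samples=400, n_features=100, n_informative=20,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        # Algorithmic stand-in.
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        # Discriminative stand-in (scaled inside its own pipeline).
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        # Generative stand-in.
        ("nb", GaussianNB()),
    ],
    voting="soft",  # average the predicted class probabilities, then take the argmax
)
ensemble.fit(X_train, y_train)
print(f"test accuracy: {ensemble.score(X_test, y_test):.3f}")
```

Soft voting lets one confident model outweigh two uncertain ones; `voting="hard"` would instead take a majority vote over the predicted labels. This section does not state which of the two variants the project used.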
