# FYS-STK3155/4155 Applied Data Analysis and Machine Learning

Course page: http://www.uio.no/studier/emner/matnat/fys/FYS-STK4155/index-eng.html


This site contains all material relevant to the course FYS-STK3155/4155 Applied Data Analysis and Machine Learning.

## Course content

Probability theory and statistical methods play a central role in science. Nowadays we are
surrounded by huge amounts of data. For example, there are about one trillion web pages; more than one
hour of video is uploaded to YouTube every second, amounting to 10 years of content every
day; and the genomes of thousands of people, each of which has a length of more than a billion base pairs,
have been sequenced by various labs.
This deluge of data calls for automated methods of data analysis,
which is exactly what machine learning provides. In this course we define machine learning as a set of
methods that can automatically detect patterns in data and then use the uncovered patterns to predict future
data or to perform other kinds of decision making under uncertainty. Since many of these problems can be
studied using the tools of probability theory, the aim of this course is to expose you to central methods in
probability theory linked with machine learning.

The course thus covers topics such as Monte Carlo methods and Markov chains, Bayesian statistics, error estimates, various regression methods, data optimization and error analysis, and central algorithms in machine learning.
The course has several numerical projects and numerical exercises that are meant to illustrate the theory.


## Learning outcomes

The course introduces a variety of central algorithms and methods
essential for studies of data analysis and machine learning. The course is project based, and through
the various projects, normally three, students are exposed to fundamental research problems in these
fields, with the aim of reproducing state-of-the-art scientific results. Both supervised and unsupervised
methods are covered. You will learn to develop and structure large codes for studying these systems, get
acquainted with computing facilities and learn to handle large scientific projects. Good scientific and
ethical conduct is emphasized throughout the course. More specifically, after this course you will

- Know the basics of data analysis, statistical analysis, Monte Carlo sampling, data optimization and machine learning;
- Be capable of extending the acquired knowledge to other systems and cases;
- Have an understanding of central algorithms used in data analysis and machine learning;
- Have knowledge of central aspects of Monte Carlo methods, Markov chains, Gibbs samplers and their possible applications;
- Understand linear methods for regression and classification, from ordinary least squares, via Lasso and Ridge, to logistic regression;
- Be familiar with various neural networks and deep learning methods for supervised and unsupervised learning;
- Be familiar with decision trees and random forests;
- Be familiar with support vector machines and kernel transformations;
- Be able to reduce data sets with methods ranging from PCA to clustering, both supervised and unsupervised;
- Have worked on numerical projects that illustrate the theory. The projects play a central role and students are expected to know modern programming languages like Python or C++.

## Prerequisites

Basic knowledge of programming and mathematics, with an emphasis on linear algebra. Knowledge of Python and/or C++ as programming languages is required, and experience with Jupyter notebooks is recommended. Required courses are the equivalents of the University of Oslo mathematics courses MAT1100, MAT1110 and MAT1120, and at least one of the corresponding computing and programming courses INF1000/INF1110 or MAT-INF1100/MAT-INF1100L/BIOS1100/KJM-INF1100.


## The course has two central parts

1. Statistical analysis and optimization of data
2. Machine learning

### Statistical analysis and optimization of data

The following topics will be covered:
- Basic concepts: expectation values, variance, covariance, correlation functions and errors;
- Simple models: the binomial distribution, the Poisson distribution, univariate and multivariate normal distributions;
- Central elements of Bayesian statistics and modeling;
- Central elements from linear algebra;
- Gradient methods for data optimization;
- Monte Carlo methods, Markov chains and the Metropolis-Hastings algorithm;
- Linear methods for regression and classification;
- Estimation of errors using cross-validation, blocking, bootstrapping and jackknife methods;
- Practical optimization using singular value decomposition and least squares for parameterizing data.
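
As a flavour of the error-estimation methods listed above, here is a minimal sketch of the bootstrap in plain Python. It estimates the standard error of a sample mean by resampling the data with replacement and compares the result with the textbook formula s/sqrt(n). The synthetic data set, the function name `bootstrap_standard_error` and the choice of 2000 resamples are assumptions made for illustration; they are not taken from the course material.

```python
import math
import random
import statistics


def bootstrap_standard_error(data, n_resamples=2000, seed=0):
    """Estimate the standard error of the sample mean by bootstrapping."""
    rng = random.Random(seed)  # local generator so the result is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n points from the data with replacement and record their mean
        resample = rng.choices(data, k=n)
        means.append(statistics.fmean(resample))
    # The spread of the resampled means approximates the standard error
    return statistics.stdev(means)


# Synthetic data (an assumption for illustration): 200 draws from a normal distribution
data_rng = random.Random(42)
data = [data_rng.gauss(1.0, 2.0) for _ in range(200)]

se_bootstrap = bootstrap_standard_error(data)
se_analytic = statistics.stdev(data) / math.sqrt(len(data))
print(f"bootstrap: {se_bootstrap:.4f}  analytic: {se_analytic:.4f}")
```

For the mean, the two numbers should agree closely. The bootstrap becomes more useful for estimators, such as a median or a fitted model parameter, where no simple closed-form error estimate exists.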


### Machine learning, mainly supervised learning

The following topics will be covered:
- Linear regression and logistic regression;
- Neural networks and deep learning;
- Decision trees and nearest neighbor algorithms;
- Support vector machines.

All the above topics will be supported by examples, hands-on exercises and project work.
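
To give an idea of what such a hands-on example might look like, the following is a minimal sketch of logistic regression with one feature, fitted by plain gradient descent on the mean cross-entropy. The toy data (hours studied versus pass/fail), the function names and the learning rate and step count are all assumptions chosen for illustration. In the projects you would typically use NumPy or Scikit-Learn rather than hand-written loops.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, learning_rate=0.1, n_steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by gradient descent on the mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # Derivative of the cross-entropy with respect to the logit w*x + b
            error = sigmoid(w * x + b) - y
            grad_w += error * x / n
            grad_b += error / n
        # Move against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data (an assumption for illustration): hours studied vs. pass (1) or fail (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in hours]
accuracy = sum(p == y for p, y in zip(predictions, passed)) / len(passed)
print(f"w = {w:.3f}, b = {b:.3f}, boundary = {-b / w:.2f}, accuracy = {accuracy:.2f}")
```

The classes overlap on purpose, so the fitted weights stay finite and no threshold classifies every point correctly. The decision boundary is the value of x where w*x + b = 0.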

Computational aspects play a central role and the students are
expected to work on numerical examples and projects which illustrate
the theory and methods. Some of the projects can be coordinated with the high-performance programming course IN4200. 


## Practicalities

1. Four lectures per week, Fall semester, 10 ECTS;
2. Four hours of laboratory sessions for work on computational projects;
3. Three graded projects, each counting 1/3 of the final grade;
4. A selected number of weekly assignments;
5. The course is part of the CS Master of Science program, but is open to other bachelor and Master of Science students at the University of Oslo;
6. Grading scale: Grades are awarded on a scale from A to F, where A is the best grade and F is a fail;
7. The course is offered as FYS-STK4155 (Master of Science level) and FYS-STK3155 (senior undergraduate level).

## Possible textbooks

_Recommended textbooks_:
- Trevor Hastie, Robert Tibshirani, Jerome H. Friedman, The Elements of Statistical Learning, Springer
- Aurélien Géron, Hands-On Machine Learning with Scikit-Learn and TensorFlow, O'Reilly

_General books on statistical analysis_:
- Christian Robert and George Casella, Monte Carlo Statistical Methods, Springer
- Peter Hoff, A First Course in Bayesian Statistical Methods, Springer

_General Machine Learning Books_:
- Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press
- Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer
- David J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press
- David Barber, Bayesian Reasoning and Machine Learning, Cambridge University Press 

## Links to relevant courses at the University of Oslo

The page https://www.mn.uio.no/english/research/about/centre-focus/innovation/data-science/studies/ gives an excellent overview of courses on machine learning at UiO.

- _STK2100 Machine learning and statistical methods for prediction and classification_ http://www.uio.no/studier/emner/matnat/math/STK2100/index-eng.html. 
- _IN3050 Introduction to Artificial Intelligence and Machine Learning_ https://www.uio.no/studier/emner/matnat/ifi/IN3050/index-eng.html. Introductory course in machine learning and AI with an algorithmic approach. 
- _STK-INF3000/4000 Selected Topics in Data Science_ http://www.uio.no/studier/emner/matnat/math/STK-INF3000/index-eng.html. The course provides insight into selected contemporary relevant topics within Data Science. 
- _IN4080 Natural Language Processing_ https://www.uio.no/studier/emner/matnat/ifi/IN4080/index.html. Probabilistic and machine learning techniques applied to natural language processing. 
- _STK-IN4300 Statistical learning methods in Data Science_ https://www.uio.no/studier/emner/matnat/math/STK-IN4300/index-eng.html. An advanced introduction to statistical and machine learning. For students with a good mathematics and statistics background.
- _INF4490 Biologically Inspired Computing_ http://www.uio.no/studier/emner/matnat/ifi/INF4490/. An introduction to self-adapting methods, also called artificial intelligence or machine learning.
- _IN-STK5000  Adaptive Methods for Data-Based Decision Making_ https://www.uio.no/studier/emner/matnat/ifi/IN-STK5000/index-eng.html. Methods for adaptive collection and processing of data based on machine learning techniques. 
- _IN5400/INF5860 Machine Learning for Image Analysis_ https://www.uio.no/studier/emner/matnat/ifi/IN5400/. An introduction to deep learning with particular emphasis on applications within image analysis, but useful for other application areas too.
- _TEK5040 Deep learning for autonomous systems_ https://www.uio.no/studier/emner/matnat/its/TEK5040/. The course addresses advanced algorithms and architectures for deep learning with neural networks. The course provides an introduction to how deep-learning techniques can be used in the construction of key parts of advanced autonomous systems that exist in physical environments and cyber environments.
- _STK4051 Computational Statistics_ https://www.uio.no/studier/emner/matnat/math/STK4051/index-eng.html
- _STK4021 Applied Bayesian Analysis and Numerical Methods_ https://www.uio.no/studier/emner/matnat/math/STK4021/index-eng.html