SSA SCV Tutorial: Visualising High-dimensional Data with R

Website: https://StatSocAus.github.io/tutorial_highd_vis

This is for scientists and data science practitioners who regularly work with high-dimensional data and models and are interested in learning how to better visualise them. You will learn about recognising structure in high-dimensional data, including clusters, outliers, non-linear relationships, and how this can be used with methods such as supervised classification, cluster analysis and non-linear dimension reduction.

Background: Participants should have a good working knowledge of R, and some background in multivariate statistical methods and/or data mining techniques.

Presenter: Dianne Cook is Professor of Statistics at Monash University in Melbourne, Australia. She is a world leader in data visualisation, especially the visualisation of high-dimensional data using tours with low-dimensional projections, and projection pursuit. She also works on bridging the gap between exploratory graphics and statistical inference. Di is a Fellow of the American Statistical Association, past editor of the Journal of Computational and Graphical Statistics, and the R Journal, elected Ordinary Member of the R Foundation, and elected member of the International Statistical Institute.

Structure of tutorial

time	topic
1:00-1:20	Introduction: What is high-dimensional data, why visualise it, and an overview of methods
1:20-1:45	Basics of linear projections, and recognising high-d structure
1:45-2:30	Effectively reducing your data dimension, in association with non-linear dimension reduction
2:30-3:00	BREAK
3:00-3:45	Understanding clusters in data using visualisation
3:45-4:30	Building better classification models with visual input
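
As a taste of the linear projections covered in the second session, here is a minimal base R sketch of projecting data onto a random 2D plane. The simulated data, the seed and the variable names are illustrative assumptions, not part of the tutorial materials. The tutorial itself uses packages such as tourr for this.

```r
set.seed(2023)  # arbitrary seed, only so the sketch is reproducible

# Simulated 5-dimensional data with 100 observations (illustrative only)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Random p x 2 orthonormal basis, made by orthonormalising
# a random matrix with a QR decomposition
basis <- qr.Q(qr(matrix(rnorm(p * 2), nrow = p, ncol = 2)))

# Check: columns have unit length and are orthogonal,
# so this is (numerically) the 2 x 2 identity matrix
round(crossprod(basis), 10)

# Project the data into 2D and plot it
proj <- x %*% basis
plot(proj, xlab = "Projection 1", ylab = "Projection 2", asp = 1)
```

Each different basis gives a different 2D view of the same 5D data. Looking at many such views is the idea behind the tours mentioned in the presenter's biography.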

Session 1 Slides

Session 2 Slides

Zip file of materials

Getting started

You should have a reasonably up-to-date version of R and RStudio, e.g. RStudio 2023.06.2+561 and R version 4.3.1 (2023-06-16). Install the following packages and their dependencies.

install.packages(c("readr", "tidyr", "dplyr", "ggplot2", "tourr", "mulgar", "geozoo", "detourr", "palmerpenguins", "GGally", "MASS", "randomForest", "mclust", "crosstalk", "plotly", "viridis", "conflicted"), dependencies=c("Depends", "Imports"))

Ideally, install the detourr package from GitHub (this needs the remotes package to be installed):

remotes::install_github("casperhart/detourr")
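
After installing, it can help to confirm that everything is available before the tutorial starts. The following is a sketch of one way to check. It is not part of the tutorial materials, and the variable names `pkgs`, `installed` and `missing_pkgs` are illustrative.

```r
# Packages named in the install.packages() call above
pkgs <- c("readr", "tidyr", "dplyr", "ggplot2", "tourr", "mulgar",
          "geozoo", "detourr", "palmerpenguins", "GGally", "MASS",
          "randomForest", "mclust", "crosstalk", "plotly", "viridis",
          "conflicted")

# requireNamespace() returns TRUE if a package can be loaded;
# it loads the namespace but does not attach the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) == 0) {
  message("All tutorial packages are installed.")
} else {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package names reported can be passed to `install.packages()` again.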

Download the Zip file of materials to your laptop, and unzip it.
Alternatively, download just the R scripts, slides1.R and slides2.R.
Open RStudio by clicking on tutorial.Rproj.

GitHub repo with all materials is https://statsocaus.github.io/tutorial_highd_vis/.

