Steven Moran and Alena Witzlack-Makarevich (24 July, 2024)
- Overview
- Getting started
- R vs RStudio
- R vs R packages (aka libraries)
- RMarkdown
- Getting up to speed
- Cheat sheets
- For the self tutoring session
Here we provide course materials for the Salos 2024 summer school, “Linguistic data from fieldwork to R”:
The curriculum has three parts:
1. Linguistic Fieldwork. Lecturer: dr. Maria Khachaturyan, University of Helsinki
2. Data annotation. Lecturer: dr. Alena Witzlack-Makarevich, the Hebrew University of Jerusalem
3. Quantitative methods. Lecturer: dr. Steven Moran, Université de Neuchâtel
This repository provides materials for (3).
To get started with (3), please refer to:
and install the software (R, RStudio, spreadsheet application) on your personal computer.
We have shared with you some courses on DataCamp to get you up to speed with the basics of using spreadsheets (tabular data), visualizations, and R/RStudio.
R is the programming language and RStudio is an “integrated development environment” (IDE). In other words, RStudio is a separate application that works on top of R and provides a “user friendly” graphical interface to it.
[Figure: R vs RStudio. The basic layout of RStudio.]
Once you have R and the RStudio graphical user interface (GUI) installed, you can begin to explore the functionality of R!
So-called “Base R” contains the “core packages” of the R programming language. These let you do things such as:
- Using R as a calculator
- Using basic core functions like `read.csv()`, `summary()`, `table()`, `head()`
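As a minimal sketch of the two points above, the following uses only Base R. The small word list and the names `csv_path`, `words`, and `pos_counts` are invented for illustration; the data is written to a temporary CSV file so that the example runs on its own.

```r
# R as a calculator
2 + 3 * 4   # arithmetic follows the usual operator precedence
sqrt(16)    # square root

# A tiny invented data set, written to a temporary CSV file
# so that this example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "word,pos,length",
  "dog,noun,3",
  "run,verb,3",
  "house,noun,5",
  "sing,verb,4"
), csv_path)

words <- read.csv(csv_path)      # load the file into a data frame
head(words, 2)                   # show the first two rows
summary(words$length)            # minimum, quartiles, mean, maximum
pos_counts <- table(words$pos)   # count rows per part of speech
pos_counts
```

None of these calls needs `install.packages()` or `library()`, because they all come with a standard R installation.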
There are many more functions available from “CRAN” – the Comprehensive R Archive Network.
Here is a visualization of core R vs R packages:
Why do the packages not in Core R need to be installed (once) individually and separately?
**Because there are over 23,000 packages in R.**
You don’t want all that on your computer!!!
Any introduction to R should explain how to install and load R libraries.
The basics are:
- Use the function `install.packages('your-package-name')` to install the R library (aka package) once.
- Each time you create a report (or script) that has code from that R package/library, you have to load it like this: `library(your-package-name)`
Yes, it is confusing when you need to use quotes (’ ’) and when you don’t. As with any programming language, or “syntax”, some things are simply idiosyncratic (imagine the engineers or developers arguing about one way or another to do something).
For any programming language idiosyncrasies, the user simply has to either:
- Remember them.
- Look them up (e.g., try googling it).
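To make the point about quotes concrete, here is a small sketch. It uses `stats`, a package that ships with R, purely as a stand-in so that nothing needs to be downloaded; the variable name `pkg` is our own choice for illustration.

```r
# install.packages() needs the package name as a quoted string, e.g.:
# install.packages("knitr")

# library() accepts the name with or without quotes
library(stats)
library("stats")

# If the name is stored in a variable, character.only = TRUE tells
# library() to use the variable's value rather than its name
pkg <- "stats"
library(pkg, character.only = TRUE)
```

In everyday scripts the unquoted form `library(stats)` is the one you will see most often.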
Now for an example. If we want to install a package / library, we can use something like this (without the “#” – this is a special character to tell R that this line of code is commented out, i.e., it should not be run):
#install.packages('tidyverse')
#install.packages('knitr')
Note that we have prepended each line of code with “#”, which tells R: do not run this code! Why? Because we don’t want to reinstall the packages every time we run this file. Install packages once! You can do this by uncommenting the code above (removing the “#”) and running the code chunk.
Once installed, you can always load the library (aka package) from any script, within RStudio, etc.
library(tidyverse)
library(knitr)
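If you would rather not comment and uncomment the install lines by hand, one possible pattern is to install only the packages that are missing. The sketch below is a suggestion, not a course requirement. The names `wanted`, `installed`, and `missing_pkgs` are made up for illustration, and `stats` and `utils` (which ship with R) are used as stand-ins so that the example runs without downloading anything.

```r
# Packages we want available; for this course you would use
# c("tidyverse", "knitr") instead of these built-in stand-ins
wanted <- c("stats", "utils")

# requireNamespace() returns TRUE if a package is installed
installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- wanted[!installed]

# Install only what is not there yet
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}
```

Because the check runs before `install.packages()`, this code can stay in a script and be run repeatedly without reinstalling anything.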
We will be using RMarkdown, which is an authoring framework for data science.
It combines R code with Markdown, a “language” for creating formatted text, in the same document.
Here is an example:
[Figure: R and RMarkdown example.]
Why do we take this approach? Because we want to produce reproducible research. And one way of doing that is to have the document and the code in the same scientific report.
In RMarkdown, the file extension is `.Rmd`. The `knitr` package (aka library), together with RStudio, “compiles” the R Markdown file (`.Rmd`) behind the report that you’re reading into a Markdown file (`.md`) that displays nicely on GitHub.
You can also compile to PDF. Or to HTML. Or to lots of other file types! And you can do so easily, by changing the header at the top of this file.
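To illustrate what such a file and its header look like, the sketch below writes a minimal `.Rmd` file from R and compiles it. The file name `minimal-report.Rmd` and its contents are invented for illustration and are not the header of this report. The rendering step assumes that the `rmarkdown` package and pandoc are available (RStudio bundles pandoc) and is skipped otherwise.

```r
fence <- strrep("`", 3)   # three backticks, which open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"A minimal report\"",
  "output: github_document",   # try html_document or pdf_document instead
  "---",
  "",
  "Some *formatted* text, followed by an R code chunk:",
  "",
  paste0(fence, "{r}"),
  "summary(c(1, 2, 3, 4))",
  fence
)

# Write the file to a temporary folder
rmd_path <- file.path(tempdir(), "minimal-report.Rmd")
writeLines(rmd_lines, rmd_path)

# Compile only if the rmarkdown package and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_path)   # path of the compiled file
}
```

The part between the two `---` lines is the header. Changing its `output:` line is what switches the compiled result between Markdown, HTML, and PDF (PDF output additionally needs a LaTeX installation).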
Getting up to speed (including the DataCamp exercises) involves understanding:
- Table data
- How to load, access, and manipulate data
- Data types (in programming / R vs statistics)
Data types are important because they help you decide which data visualizations and statistical tests you can use (cheat sheets below).
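As a rough sketch of these three points in Base R, the example below builds a small table, accesses and manipulates it, and inspects its data types. All values are invented, the languages are simply labelled A, B, and C, and the object names are our own.

```r
# Invented example data: one row per (hypothetical) speaker
speakers <- data.frame(
  language  = c("A", "B", "A", "C"),
  age       = c(34, 51, 27, 45),
  bilingual = c(TRUE, FALSE, TRUE, TRUE)
)

# Access: one column with $, selected rows with [rows, columns]
speakers$age
older <- speakers[speakers$age > 40, ]

# Manipulate: add a new column derived from an existing one
speakers$age_group <- ifelse(speakers$age > 40, "over 40", "40 or under")

# Data types as R sees them (one class per column)
sapply(speakers, class)

# Categorical variable: store it as a factor and count the categories
speakers$language <- factor(speakers$language)
language_counts <- table(speakers$language)

# Numeric variable: a mean makes sense here
mean_age <- mean(speakers$age)
```

Counting suits the categorical column `language`, while a mean only makes sense for the numeric column `age`. The same distinction guides which plot or statistical test is appropriate.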
To get up to speed, please refer to:
Here are some RMarkdown cheat sheets:
And here is some more advanced explanation on writing scientific reports in RMarkdown:
Here is a cheat sheet for choosing the right visualization:
And some more advanced explanation on which types of plots to use:
Here are some cheat sheets on how to choose the right statistical test:
And some more advanced explanation on the process:
Understanding Data Visualization
Introduction to the Tidyverse
Introduction to Data Visualization with ggplot2
Intermediate Data Visualization with ggplot2
Categorical Data in the Tidyverse
Foundations of Inference in R