Salos 2024 summer school: Linguistic data from fieldwork to R

Steven Moran and Alena Witzlack-Makarevich (24 July, 2024)

Overview
Getting started
R vs RStudio
R vs R packages (aka libraries)
RMarkdown
Getting up to speed
Cheat sheets
For the self tutoring session
- Beginners
- Advanced

Overview

Here we provide course materials for the Salos 2024 summer school, “Linguistic data from fieldwork to R”:

https://www.academiasalensis.org/en/conference-and-summer-school/2024-summer-school/

The curriculum has three parts:

1. Linguistic fieldwork. Lecturer: Dr. Maria Khachaturyan, University of Helsinki
2. Data annotation. Lecturer: Dr. Alena Witzlack-Makarevich, the Hebrew University of Jerusalem
3. Quantitative methods. Lecturer: Dr. Steven Moran, Université de Neuchâtel

This repository provides materials for (3).

Getting started

To get started with (3), please refer to:

Getting started

and install the software (R, RStudio, spreadsheet application) on your personal computer.

We have shared with you some courses on DataCamp to get you up to speed with the basics of using spreadsheets (tabular data), visualizations, and R/RStudio.

R vs RStudio

R is the programming language, and RStudio is an “integrated development environment” (IDE). In other words, RStudio is a separate application that sits on top of R and provides a “user friendly” graphical interface to it.

https://moderndive.com/1-getting-started.html

[Figure: R vs RStudio.]

[Figure: The basic layout of RStudio.]

Once you have the software R and the RStudio graphical user interface (GUI) installed, you can begin to explore the functionalities of R!

R vs R packages (aka libraries)

So-called “Base R” contains the “core packages” of the R programming language. With these you can, for example:

- Use R as a calculator
- Use basic core functions like read.csv(), summary(), table(), head()
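
As a minimal sketch of what Base R can do without installing anything, the chunk below uses R as a calculator and then applies the functions named above to a tiny table. The data frame `langs` and all its values are made up for illustration. read.csv() normally reads a file from disk; here its `text` argument is used so that the example is self-contained.

```r
# Using R as a calculator
2 + 3 * 4   # 14: multiplication is done before addition
sqrt(16)    # 4

# A tiny, made-up table (hypothetical values, for illustration only).
# `text =` lets read.csv() read from a string instead of a file.
csv_text <- "language,family,n_phonemes
A,Family1,20
B,Family1,35
C,Family2,28
D,Family2,41"
langs <- read.csv(text = csv_text)

head(langs, 2)              # the first two rows
summary(langs$n_phonemes)   # minimum, quartiles, mean, maximum
table(langs$family)         # how many rows per family
```

With your own data you would pass a file path instead, for example read.csv('my-file.csv'), where the file name is a placeholder.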

There are many more functions available from “CRAN” – the Comprehensive R Archive Network.

Here is a visualization of core R vs R packages:

[Figure: core R vs R packages.]

Why do the packages not in Core R need to be installed (once) individually and separately?

Because there are over 23,000 packages in R.

You don’t want all that on your computer!!!

Any introduction to R should explain how to install and load R libraries.

The basics are:

- Use the function install.packages('your-package-name') to install the R library (aka package) once.
- Each time you create a report (or script) that uses code from that R package/library, you have to load it like this: library(your-package-name)

Yes, it is confusing when you need to use quotes (’ ’) and when you don’t. As with any programming language, or “syntax”, some things are simply idiosyncratic (imagine the engineers or developers arguing about one way or another to do something).

For any programming language idiosyncrasies, the user simply has to either:

- Remember them.
- Look them up (e.g., try googling it).
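
To make the quoting difference concrete, here is a small sketch. It uses the stats package only because it ships with R, so nothing has to be installed; the variable name `pkg` is made up for illustration. Lines starting with “#” are comments and are not run.

```r
# library() accepts the package name with or without quotes.
library(stats)
library("stats")

# If the name is stored in a variable, tell library() to read it as a string.
pkg <- "stats"
library(pkg, character.only = TRUE)

# install.packages() needs the quotes. Without them, R looks for an
# object called tidyverse and stops with an error.
# install.packages(tidyverse)     # error: object 'tidyverse' not found
# install.packages('tidyverse')   # works (not run here)
```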

Now for an example. If we want to install a package / library, we can use something like this (without the “#” – this is a special character to tell R that this line of code is commented out, i.e., it should not be run):

#install.packages('tidyverse')
#install.packages('knitr')

Note that each line of code above starts with “#”, which tells R: do not run this code! Why? Because we don’t want to reinstall the packages every time we run this file. Install packages once! To do so, uncomment the code above (remove the “#”) and run the code chunk.

Once installed, you can always load the library (aka package) from any script, within RStudio, etc.

library(tidyverse)
library(knitr)
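
If you are not sure whether a package is already installed, one possible approach is to check first and install only what is missing. This is a sketch, not the method used in the course: the helper `missing_packages()` is a made-up name, and it relies on the base R function requireNamespace(), which returns FALSE when a package cannot be found.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "knitr"))
to_install   # empty if both packages are already installed

# Uncomment to install whatever is missing (needs an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```

The install line stays commented out for the same reason as above: installing is something you do once, not every time the file is run.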

RMarkdown

We will be using RMarkdown, which is an authoring framework for data science.

It combines R code with Markdown, a “language” for creating formatted text, in the same document.

Here is an example:

[Figure: R and RMarkdown example.]

Why do we take this approach? Because we want to produce reproducible research. And one way of doing that is to have the document and the code in the same scientific report.

In RMarkdown, the file extension is .Rmd. The report that you’re reading was written as an R Markdown file (.Rmd). Together, the knitr package (aka library) and RStudio “compile” it into a Markdown file (.md) that displays nicely on GitHub.

You can also compile to PDF. Or to HTML. Or to lots of other file types! And you can do so easily by changing the header at the top of the .Rmd file.
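
To show what such a header looks like, the sketch below writes a minimal, made-up .Rmd file to a temporary location. The title, the text, and the choice of github_document as output are assumptions for illustration, not a copy of this repository’s files. Rendering needs the rmarkdown package and pandoc, so those lines are left commented out.

```r
# Three backticks open and close a code chunk inside an .Rmd file.
fence <- strrep("`", 3)

# A minimal .Rmd file: a header between the two "---" lines,
# some Markdown text, and one R code chunk.
rmd_lines <- c(
  "---",
  "title: \"My first report\"",
  "output: github_document",
  "---",
  "",
  "Some *formatted* text.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Not run here (needs the rmarkdown package and pandoc):
# rmarkdown::render(rmd_file)                                   # follows the header
# rmarkdown::render(rmd_file, output_format = "html_document")  # HTML instead
```

Changing the `output:` line in the header, for example to html_document or pdf_document, changes the file type that is produced.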

Getting up to speed

Getting up to speed (including the DataCamp exercises) involves understanding:

- Tabular data
- How to load, access, and manipulate data
- Data types (in programming / R vs statistics)

Data types are important because they help you decide which data visualizations and statistical tests you can use (cheat sheets below).
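
As a small illustration of the difference between data types in R and in statistics, consider the sketch below. The vectors and their values are made up. In R, text is stored as “character” and numbers as “numeric”; a categorical variable in the statistical sense is usually represented as a factor.

```r
# Made-up example data (hypothetical values, for illustration only)
word_order <- c("SOV", "SVO", "SOV", "VSO")   # text
n_speakers <- c(1200, 35000, 800, 15000)      # numbers

class(word_order)   # "character"
class(n_speakers)   # "numeric"

# For statistics, word order is a categorical (nominal) variable.
# In R, a factor represents that.
word_order_f <- factor(word_order)
class(word_order_f)    # "factor"
levels(word_order_f)   # the distinct categories

# The type decides what makes sense:
table(word_order_f)    # counts suit categorical data
mean(n_speakers)       # a mean suits numeric data
```

Counting categories points towards bar plots and tests for categorical data, whereas numeric data allows means, histograms, and so on.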

To get up to speed, please refer to:

Getting up to speed

Cheat sheets

RMarkdown

Here are some RMarkdown cheat sheets:

And here is some more advanced explanation on writing scientific reports in RMarkdown:

https://github.com/bambooforest/APY313/tree/main/2_writing_scientific_reports

Data visualizations

Here is a cheat sheet for choosing the right visualization:

And some more advanced explanation on which types of plots to use:

https://github.com/bambooforest/IntroDataScience/tree/main/6_data_visualization#which-plots-to-use

Choosing the right statistic

Here are some cheat sheets on how to choose the right statistical test:

And some more advanced explanation on the process:

https://github.com/bambooforest/APY313/tree/main/7_data_modeling

For the self tutoring session

Beginners

Understanding Data Visualization

https://app.datacamp.com/learn/courses/understanding-data-visualization

Introduction to the Tidyverse

https://app.datacamp.com/learn/courses/introduction-to-the-tidyverse

Introduction to Data Visualization with ggplot2

https://app.datacamp.com/learn/courses/introduction-to-data-visualization-with-ggplot2

Advanced

Intermediate Data Visualization with ggplot2

https://app.datacamp.com/learn/courses/intermediate-data-visualization-with-ggplot2

Categorical Data in the Tidyverse

https://app.datacamp.com/learn/courses/categorical-data-in-the-tidyverse

Foundations of Inference in R

https://app.datacamp.com/learn/courses/foundations-of-inference-in-r
