Salos 2024 summer school: Linguistic data from fieldwork to R

Steven Moran and Alena Witzlack-Makarevich (24 July, 2024)


Overview

Here we provide course materials for the Salos 2024 summer school, “Linguistic data from fieldwork to R”.

The curriculum has three parts:

  1. Linguistic Fieldwork. Lecturer: dr. Maria Khachaturyan, University of Helsinki
  2. Data annotation. Lecturer: dr. Alena Witzlack-Makarevich, the Hebrew University of Jerusalem
  3. Quantitative methods. Lecturer: dr. Steven Moran, Université de Neuchâtel

This repository provides materials for (3).

Getting started

To get started with (3), please refer to:

and install the software (R, RStudio, spreadsheet application) on your personal computer.

We have shared with you some courses on DataCamp to get you up to speed with the basics of using spreadsheets (tabular data), visualizations, and R/RStudio.

R vs RStudio

R is the programming language and RStudio is an “integrated development environment” (IDE). In other words, RStudio is an add-on to R that provides a “user-friendly” graphical interface to it.

R vs RStudio.

The basic layout of RStudio.

Once you have the software R and the RStudio graphical user interface (GUI) installed, you can begin to explore the functionalities of R!

R vs R packages (aka libraries)

So-called “Base R” contains the “core packages” of the R programming language. These include, for example:

  • Using R as a calculator
  • Using basic core functions like read.csv(), summary(), table(), head()
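The two points above can be tried out with a short sketch like the following. It uses only Base R. The table is supplied as text so that no file is needed, and its contents (placeholder languages A, B, C with invented speaker counts) are made up purely for illustration.

```r
# Using R as a calculator
2 + 3 * 4        # arithmetic follows the usual operator precedence
sqrt(16)

# A tiny CSV table supplied as text so the example is self-contained.
# The languages, families, and speaker counts are made up for illustration.
csv_text <- "language,family,speakers
A,X,1200
B,X,300
C,Y,4500"

# read.csv() normally takes a file path; the text argument reads from a string
langs <- read.csv(text = csv_text)

head(langs)              # first rows of the table
summary(langs$speakers)  # minimum, quartiles, mean, maximum
table(langs$family)      # counts per category
```

With your own data you would typically pass a file path instead, for example read.csv('my-data.csv'), where the file name is a placeholder.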

There are many more functions available from “CRAN” – the Comprehensive R Archive Network.

Here is a visualization of core R vs R packages:

R vs RStudio.


Why do the packages not in Core R need to be installed (once) individually and separately?

Because there are over 23,000 packages in R.

You don’t want all that on your computer!!!


Any introduction to R should explain how to install and load R libraries.

The basics are:

  1. Use the function install.packages('your-package-name') to install the R library (aka package) once

  2. Each time you create a report (or script) that uses code from that R package/library, you have to load it like this: library(your-package-name)

Yes, it is confusing when you need to use quotes (' ') and when you don’t. As with any programming language, or “syntax”, there are simply some things that are idiosyncratic (imagine the engineers or developers arguing about one way or another to do something).

For any programming language idiosyncrasies, the user simply has to either:

  1. Remember them.

  2. Look them up (e.g., try googling it).
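As a small illustration of the quoting idiosyncrasy, here is a sketch using the stats package, which ships with every R installation, so nothing has to be installed. The commented-out lines are shown for reference only and are not run.

```r
# library() accepts the package name with or without quotes.
library(stats)
library("stats")

# install.packages() expects a quoted name (a character string), e.g.:
# install.packages("tidyverse")

# Looking things up from inside R: either line opens the help page for library().
# ?library
# help("library")

is.character("tidyverse")  # TRUE: a quoted name is just a character string
```

When in doubt, quoting the package name works in both functions.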


Now for an example. If we want to install a package / library, we can use something like this (without the “#” – this is a special character to tell R that this line of code is commented out, i.e., it should not be run):

#install.packages('tidyverse')
#install.packages('knitr')

Here we note that we have prepended each line of code with “#”, which tells R: do not run this code! Why? Because we don’t want to run these lines every time we run this file. Install packages once! You can do this by uncommenting the code above (removing the “#”) and running the code chunk.

Once installed, you can always load the library (aka package) from any script, within RStudio, etc.

library(tidyverse)
library(knitr)
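An alternative to commenting the install lines in and out is to install a package only when it is missing. The following is a minimal sketch of that idea, not part of the course materials; the helper name install_if_missing is hypothetical, while requireNamespace() and install.packages() are standard R functions.

```r
# Hypothetical helper: install a package only if it is not already available.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection
  }
  # Report (invisibly) whether the package is now available
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is always present, so this call installs nothing
install_if_missing("stats")

# For the packages used in this course you could run (not run here):
# install_if_missing("tidyverse")
# install_if_missing("knitr")
```

Loading still has to happen in every script or report, with library(tidyverse) and library(knitr) as shown above.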

RMarkdown

We will be using RMarkdown, which is an authoring framework for data science.

It combines R code with Markdown, a “language” for creating formatted text, in the same document.

Here is an example:

R and RMarkdown example.


Why do we take this approach? Because we want to produce reproducible research. And one way of doing that is to have the document and the code in the same scientific report.

In RMarkdown, the file extension is .Rmd. The report that you’re reading was produced by “compiling” an R Markdown file (.Rmd), using the knitr package (aka library) and RStudio, into a Markdown file (.md) that displays nicely on GitHub.

You can also compile to PDF. Or to HTML. Or to lots of other file types! And you can do so easily, by changing the header at the top of the file.
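To make the role of the header concrete, here is a sketch that writes a minimal .Rmd file to a temporary folder and prints it. The title and file name are hypothetical. The output values github_document, html_document, and pdf_document are output formats of the rmarkdown package; this sketch assumes, without having seen this repository's own header, that a setting of this kind is what selects the output type. The final rendering step is commented out because it assumes that rmarkdown and pandoc are installed.

```r
# Build the three-backtick chunk delimiter programmatically so it can appear
# in this example without ending the surrounding code block.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"My first report\"",  # hypothetical title
  "output: github_document",     # swap for html_document or pdf_document
  "---",
  "",
  "Some *formatted* text written in Markdown.",
  "",
  paste0(fence, "{r}"),          # start of an R code chunk
  "summary(c(1, 2, 3, 4))",
  fence                          # end of the R code chunk
)

# Write the file to a temporary folder and show its contents
rmd_path <- file.path(tempdir(), "example.Rmd")
writeLines(rmd_lines, rmd_path)
cat(readLines(rmd_path), sep = "\n")

# To compile it (assumes the rmarkdown package and pandoc are installed):
# rmarkdown::render(rmd_path)
```

In RStudio, the same result is usually obtained by opening the .Rmd file and clicking the Knit button.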

Getting up to speed

Getting up to speed (including the DataCamp exercises) involves understanding:

  • Table data
  • How to load, access, and manipulate data
  • Data types (in programming / R vs statistics)

Data types are important because they help you decide which data visualizations and statistical tests you can use (cheat sheets below).
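The following sketch illustrates these points with a small made-up table (placeholder languages A, B, C; all values are invented for illustration). It shows how R reports the data type of each column, and how the type determines what you can sensibly do with it: numeric columns can be averaged, categorical columns can be counted.

```r
# Made-up table: three placeholder languages, for illustration only.
langs <- data.frame(
  language = c("A", "B", "C"),
  family   = c("X", "X", "Y"),
  speakers = c(1200, 300, 4500),
  tonal    = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep text columns as character in all R versions
)

str(langs)              # column names and their R data types

class(langs$speakers)   # "numeric"
class(langs$language)   # "character"
class(langs$tonal)      # "logical"

# Categorical variables are usually stored as factors
langs$family <- factor(langs$family)
levels(langs$family)    # "X" "Y"

# Access and manipulate: keep only rows with more than 1000 speakers
big <- langs[langs$speakers > 1000, ]
nrow(big)               # 2

# Numeric data can be averaged; categorical data can be counted
mean(langs$speakers)    # 2000
table(langs$family)
```

The same operations can also be written with tidyverse functions, which are covered in the DataCamp courses listed below.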

To get up to speed, please refer to:

Cheat sheets

RMarkdown

Here are some RMarkdown cheat sheets:

And here is some more advanced explanation on writing scientific reports in RMarkdown:

Data visualizations

Here is a cheat sheet for choosing the right visualization:

And some more advanced explanation on which types of plots to use:

Choosing the right statistic

Here are some cheat sheets on how to choose the right statistical test:

And some more advanced explanation on the process:

For the self tutoring session

Beginners

Understanding Data Visualization

Introduction to the Tidyverse

Introduction to Data Visualization with ggplot2

Advanced

Intermediate Data Visualization with ggplot2

Categorical Data in the Tidyverse

Foundations of Inference in R
