rspark-tutorial provides content illustrating the use of rspark, a Docker-based computing environment. rspark runs R, RStudio, PostgreSQL, Hadoop, Hive, and Spark in containers on your local computer (or optionally on AWS). The content covers a range of topics, including many aspects of the tidyverse and machine learning using Spark.
This tutorial is meant to run in conjunction with rspark-docker, which contains images of the rspark components. The steps for installing and launching the rspark-docker containers are given here:
https://github.com/jharner/rspark-docker
To get access to the tutorial content within rspark-docker, do the following:
- Make sure git is installed and up to date. This step should have been done when you cloned rspark-docker, but from the command line you can run:
git version
Compare the returned version with the current version. See the directions in the rspark-docker README for installing or updating, if necessary.
- Issue the following command in a terminal to clone rspark-tutorial:
git clone https://github.com/jharner/rspark-tutorial.git
This creates a directory (folder) named rspark-tutorial, containing the local git repo, in the directory where you run the command (your home directory by default in a new terminal). If you prefer installing rspark-tutorial in another directory, cd there first.
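The two steps above can be combined into a small shell sketch. It assumes a POSIX shell; the function names check_git and clone_tutorial and the optional parent-directory argument are illustrative and not part of rspark-tutorial. Only the repository URL comes from the instructions above.

```shell
# Illustrative helpers (hypothetical names, not part of rspark-tutorial).

# Print the installed git version, or a hint if git is missing.
check_git() {
  if command -v git >/dev/null 2>&1; then
    git version
  else
    echo "git not found; see the rspark-docker README for installation directions"
    return 1
  fi
}

# Clone rspark-tutorial into a parent directory (assumed default: $HOME).
# Skips the clone if the target directory already exists.
clone_tutorial() {
  parent="${1:-$HOME}"
  target="$parent/rspark-tutorial"
  if [ -d "$target" ]; then
    echo "already present: $target"
    return 0
  fi
  check_git || return 1
  mkdir -p "$parent" || return 1
  git clone https://github.com/jharner/rspark-tutorial.git "$target"
}
```

For example, `clone_tutorial ~/projects` would place the repo in ~/projects/rspark-tutorial, while calling it with no argument uses your home directory.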
Assuming that rspark-docker has been started and that you have logged into the RStudio container:
- Zip the rspark-tutorial folder on your local computer.
- Click on the Files tab in the RStudio container and then click the Upload menu item. Navigate to the zipped version of rspark-tutorial and upload.
- Execute the .Rmd files in the various modules/sections to generate R notebooks or R markdown output files.
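The zipping step in the list above can also be done from a terminal. This is a minimal sketch, assuming the zip command-line tool is installed and that the clone lives in your home directory; the function name zip_tutorial is hypothetical.

```shell
# Create rspark-tutorial.zip next to the cloned folder, ready for upload.
# (zip_tutorial is an illustrative name; the default path is an assumption.)
zip_tutorial() {
  src="${1:-$HOME/rspark-tutorial}"
  if [ ! -d "$src" ]; then
    echo "directory not found: $src" >&2
    return 1
  fi
  if ! command -v zip >/dev/null 2>&1; then
    echo "zip is not installed" >&2
    return 2
  fi
  # Run from the parent directory so the archive holds a single top-level
  # folder; the subshell leaves the caller's working directory unchanged.
  (cd "$(dirname "$src")" && zip -r -q "$(basename "$src").zip" "$(basename "$src")")
}
```

Calling `zip_tutorial` with no argument should leave rspark-tutorial.zip in your home directory, which you can then select in the RStudio Upload dialog.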
Once you have cloned rspark-tutorial, you will be able to pull updates by changing into the rspark-tutorial directory (folder) and issuing a pull:
cd rspark-tutorial
git pull origin master
The git pulls will keep your tutorial up to date.
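If you update often, the pull can be wrapped in a small helper. This is a sketch only: the function name update_tutorial and the default location in your home directory are assumptions, while the remote and branch (origin, master) are taken from the command above.

```shell
# Pull the latest tutorial content into an existing clone.
# (update_tutorial is an illustrative name; the default path is an assumption.)
update_tutorial() {
  repo="${1:-$HOME/rspark-tutorial}"
  if [ ! -d "$repo/.git" ]; then
    echo "not a git clone: $repo" >&2
    return 1
  fi
  # -C makes git run as if started inside $repo, so no cd is needed.
  git -C "$repo" pull origin master
}
```

Because the helper uses `git -C`, it can be run from any directory, for example `update_tutorial ~/projects/rspark-tutorial`.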
If you want an excellent Git GUI, download and install Sourcetree.