
Chapter 2 Getting Started

In Chapter 3 we dive into analysis, followed by modeling; both present examples using a single-machine cluster: your personal computer. Subsequent chapters introduce cluster computing and the concepts and techniques that you’ll need to successfully run code across multiple machines.

2.1 Overview

From R, getting started with Spark using sparklyr and a local cluster is as easy as installing and loading the sparklyr package followed by installing Spark using sparklyr. However, we assume you are starting with a brand new computer running Windows, macOS, or Linux, so we’ll walk you through the prerequisites before connecting to a local Spark cluster.
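
To make this concrete, here is a minimal sketch of that installation step; it assumes you are happy to let sparklyr pick a Spark version for you (you can also pass a specific version to spark_install()):

# install sparklyr from CRAN
install.packages("sparklyr")

# load sparklyr and download a local copy of Spark
library(sparklyr)
spark_install()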

Although this chapter is designed to help you get ready to use Spark on your personal computer, it’s also likely that some readers will already have a Spark cluster available or might prefer to get started with an online Spark cluster. For instance, Databricks hosts a free community edition of Spark that you can easily access from your web browser. If you end up choosing this path, skip to Prerequisites, but make sure you consult the proper resources for your existing or online Spark cluster.

Either way, after you are done with the prerequisites, you will first learn how to connect to Spark. We then present the most important tools and operations that you’ll use throughout the rest of this book. Less emphasis is placed on teaching concepts or how to use them—we can’t possibly explain modeling or streaming in a single chapter. However, going through this chapter should give you a brief glimpse of what to expect and give you the confidence that you have the tools correctly configured to tackle more challenging problems later on.
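
As a brief preview of the connection step, and assuming the local installation from the previous sketch, connecting usually boils down to a single call:

library(sparklyr)

# connect to Spark running locally on your personal computer;
# the returned connection object is used by all subsequent operations
sc <- spark_connect(master = "local")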

The tools you’ll use are mostly divided into R code and the Spark web interface. All Spark operations are run from R; however, monitoring the execution of distributed operations is performed from Spark’s web interface, which you can load from any web browser. We then disconnect from this local cluster, a step that is easy to forget but highly recommended when working with local clusters, and with shared Spark clusters as well!
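
As a sketch of those last two pieces, and assuming sc is the connection object returned by spark_connect(), opening the web interface and disconnecting look like this:

# open Spark's web interface in your default browser
spark_web(sc)

# disconnect from the local cluster once you are done
spark_disconnect(sc)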