Update starting.html #99

Open · wants to merge 1 commit into master
2 changes: 1 addition & 1 deletion docs/starting.html
@@ -418,7 +418,7 @@ <h1><span class="header-section-number">Chapter 2</span> Getting Started</h1>
<p>In <a href="analysis.html#analysis">Chapter 3</a> we dive into analysis followed by modeling, which presents examples using a single-cluster machine: your personal computer. Subsequent chapters introduce cluster computing and the concepts and techniques that you’ll need to successfully run code across multiple machines.</p>
<div id="overview" class="section level2">
<h2><span class="header-section-number">2.1</span> Overview</h2>
- <p>From<!--((("getting started", "overview of")))--> R, getting started with Spark using <code>sparklyr</code> and a local cluster is as easy as installing and loading the <code>sparklyr</code> package followed by installing Spark using <code>sparklyr</code> however, we assume you are starting with a brand new computer running Windows, macOS, or Linux, so we’ll walk you through the prerequisites before connecting to a local Spark cluster.</p>
+ <p>From<!--((("getting started", "overview of")))--> R, getting started with Spark using <code>sparklyr</code> and a local cluster is as easy as installing and loading the <code>sparklyr</code> package followed by installing Spark using <code>sparklyr</code>. However, we assume you are starting with a brand new computer running Windows, macOS, or Linux, so we’ll walk you through the prerequisites before connecting to a local Spark cluster.</p>
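
For readers following along, the flow this sentence describes is only a few lines of R. A minimal sketch, assuming a current CRAN release of sparklyr (`spark_install()` downloads a compatible Spark version by default):

```r
# Install sparklyr from CRAN, then load it into the session
install.packages("sparklyr")
library(sparklyr)

# Have sparklyr download and install a local copy of Spark
spark_install()
```
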
<p>Although this chapter is designed to help you get ready to use Spark on your personal computer, it’s also likely that some readers will already have a Spark cluster<!--((("Apache Spark", "online clusters")))((("clusters", "connecting to online")))--> available or might prefer to get started with an online Spark cluster. For instance, Databricks<!--((("Databricks")))((("cloud computing", "Databricks")))--> hosts a <a href="http://bit.ly/31MfKuV">free community edition</a> of Spark that you can easily access from your web browser. If you end up choosing this path, skip to <a href="#prerequistes">Prerequisites</a>, but make sure you consult the proper resources for your existing or online Spark cluster.</p>
<p>Either way, after you are done with the prerequisites, you will first learn how to connect to Spark. We then present the most important tools and operations that you’ll use throughout the rest of this book. Less emphasis is placed on teaching concepts or how to use them—we can’t possibly explain modeling or streaming in a single chapter. However, going through this chapter should give you a brief glimpse of what to expect and give you the confidence that you have the tools correctly configured to tackle more challenging problems later on.</p>
<p>The tools you’ll use are mostly divided into R code and the Spark web interface. All Spark operations are run from R; however, monitoring execution of distributed operations is performed from Spark’s web interface, which you can load from any web browser. We then disconnect from this local cluster, which is easy to forget to do but highly recommended while working with local clusters—and in shared Spark clusters as well!</p>
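
A minimal sketch of that connect, monitor, and disconnect cycle, assuming the local Spark installation from the sketch above:

```r
library(sparklyr)

# Connect to a local, single-node Spark cluster
sc <- spark_connect(master = "local")

# Open Spark's web interface in a browser to monitor execution
spark_web(sc)

# Disconnect when done -- easy to forget, but recommended
spark_disconnect(sc)
```
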