ray-project · harsha-simhadri · Jul 10, 2016
diff --git a/doc/basic-usage.md b/doc/basic-usage.md
@@ -1,5 +1,16 @@
 ## Basic Usage
 
+The Ray system consists of a driver instance that takes commands from
+a Python prompt, and one or more worker nodes. Each worker node has
+its own object store and multiple worker processes (typically one per
+core). The Python prompt takes regular Python commands and executes
+them on the driver instance. The Ray system augments the prompt with
+an interface for calling **remote functions**, which are executed on
+worker nodes. The remote functions can in turn execute local Python
+functions. To pass data between the driver instance and remote
+functions (or between remote functions), the system uses
+**remote objects**.
+
 To use Ray, you need to understand the following:
 
 - How Ray uses object references to represent immutable remote objects.
@@ -31,40 +42,39 @@ objects and object references, as shown in the example below.
 <ray.ObjRef at 0x1031baef0>
 ```
 
-The command `ray.put(x)` would be run by a worker process or by the driver
-process (the driver process is the one running your script). It takes a Python
-object and copies it to the local object store (here *local* means *on the same
-node*). Once the object has been stored in the object store, its value cannot be
-changed.
-
-In addition, `ray.put(x)` returns an object reference, which is essentially an
-ID that can be used to refer to the newly created remote object. If we save the
-object reference in a variable with `ref = ray.put(x)`, then we can pass `ref`
-into remote functions, and those remote functions will operate on the
-corresponding remote object.
-
-The command `ray.get(ref)` takes an object reference and creates a Python object
-from the corresponding remote object. For some objects like arrays, we can use
-shared memory and avoid copying the object. For other objects, this currently
-copies the object from the object store into the memory of the worker process.
-If the remote object corresponding to the object reference `ref` does not live
-on the same node as the worker that calls `ray.get(ref)`, then the remote object
-will first be copied from an object store that has it to the object store that
-needs it.
-```python
->>> ref = ray.put([1, 2, 3])
->>> ray.get(ref)
-[1, 2, 3]
-```
+The command `ray.put(x)` can be invoked by a worker or a driver
+process to "put" the Python object `x` into the local object store
+(here *local* means *on the same node*). Once the object has been
+stored in the object store, its value cannot be changed.
+
+In addition, `ray.put(x)` returns a globally unique object reference,
+which is essentially an ID that can be used to refer to the newly
+created remote object. If we save the object reference in a variable
+with `ref = ray.put(x)`, then we can pass `ref` into remote functions,
+and those remote functions will operate on the corresponding remote
+object.
+
+The command `ray.get(ref)` takes an object reference and creates a
+Python object from the corresponding remote object. If the remote
+object corresponding to the object reference `ref` does not live on
+the same node as the worker that calls `ray.get(ref)`, then the remote
+object will first be copied from an object store that has it to the
+object store that needs it. For some objects like arrays, we can use
+the object directly from the object store and avoid copying it into
+the Python heap. For other objects, the object is copied into the
+memory of the worker process.
+
+```python
+>>> ref = ray.put([1, 2, 3])
+>>> ray.get(ref)
+[1, 2, 3]
+```
 
 If the remote object corresponding to the object reference `ref` has not been
 created yet, the command `ray.get(ref)` will wait until the remote object has
 been created.
 
-### Computation graphs in Ray
+### Computation graph in Ray
 
-Ray represents computation with a directed acyclic graph of tasks. Tasks are
-added to this graph by calling **remote functions**.
+A computation is defined by a directed acyclic graph of objects and
+their dependencies. Objects are generated by **tasks**, which are
+added to the system by calling **remote functions**.
 
 For example, a normal Python function looks like this.
 ```python
@@ -91,25 +101,29 @@ actually executed and produced any outputs).
 
 #### Remote functions
 
-Whereas in regular Python, calling `add(1, 2)` would return `3`, in Ray, calling
-`add(1, 2)` does not actually execute the task. Instead, it adds a task to the
-computation graph and immediately returns an object reference to the output of
-the computation.
+Whereas in regular Python, calling `add(1, 2)` would return `3`, in
+Ray, calling `add(1, 2)` does not force any execution right away.
+Instead, it adds tasks to the computation graph and immediately
+returns an object reference to the output of the remote
+function.
 
 ```python
 >>> ref = add(1, 2)
 >>> ray.get(ref)
 3
 ```
 
-There is a sharp distinction between *submitting a task* and *executing the
-task*. When a remote function is called, the task of executing that function is
-submitted to the scheduler, and the scheduler immediately returns object
-references for the outputs of the task. However, the task will not be executed
-until the scheduler actually schedules the task on a worker.
+There is a sharp distinction between *submitting a remote function*
+and *executing the task* generated by the remote function. When a
+remote function is called, the tasks required to execute the function
+are submitted to the scheduler, and the scheduler immediately returns
+object references for the outputs of the remote function. However, the
+task that produces each referenced object will not be executed until
+the scheduler actually schedules the task on a worker.
 
-When a task is submitted, each argument may be passed in by value or by object
-reference. For example, these lines have the same behavior.
+When a remote function is submitted, each argument may be passed in by
+value or by object reference. For example, these lines have the same
+behavior.
 
 ```python
 >>> add(1, 2)
@@ -136,14 +150,16 @@ for i in range(10):
   result.append(np.zeros(shape=[100, 100]))
 ```
 
-At the core of the above script, there are 10 separate tasks, each of which
-generates a 100x100 matrix of zeros. These tasks do not depend on each other, so
-in principle, they could be executed in parallel. However, in the above
-implementation, they will be executed serially.
+At the core of the above script, there are 10 separate function
+calls, each generating a 100x100 matrix of zeros. These calls do not
+depend on each other, so in principle, they could be executed in
+parallel. However, in the above implementation, they will be executed
+serially.
 
-Ray gets around this by representing computation as a graph of tasks, where some
-tasks depend on the outputs of other tasks and where tasks can be executed once
-their dependencies have been executed.
+Ray gets around this by allowing the user to define a graph of
+dependencies, where some remote functions depend on the outputs of
+other remote functions. The tasks of a remote function can be
+executed once their dependencies have been executed.
 
 For example, suppose we define the remote function `zeros` to be a wrapper
 around `np.zeros`.