Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

edited basic usage tutorial #244

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
108 changes: 62 additions & 46 deletions doc/basic-usage.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,16 @@
## Basic Usage

The Ray system consists of a driver instance that takes commands from
a Python prompt, and one or more worker nodes. Each worker node has an
unique object store and multiple worker processers (typically one for
each core). The Python prompt takes regular Python commands and
executes them on the driver instance. The Ray system augments the
prompt with an interface that takes **remote functions** which are
executed on worker nodes. The remote functions can in turn execute
local Python functions. To pass data between driver instance and
remote functions (or between remote functions), the system uses
**remote objects**.

To use Ray, you need to understand the following:

- How Ray uses object references to represent immutable remote objects.
Expand Down Expand Up @@ -31,40 +42,39 @@ objects and object references, as shown in the example below.
<ray.ObjRef at 0x1031baef0>
```

The command `ray.put(x)` would be run by a worker process or by the driver
process (the driver process is the one running your script). It takes a Python
object and copies it to the local object store (here *local* means *on the same
node*). Once the object has been stored in the object store, its value cannot be
changed.

In addition, `ray.put(x)` returns an object reference, which is essentially an
ID that can be used to refer to the newly created remote object. If we save the
object reference in a variable with `ref = ray.put(x)`, then we can pass `ref`
into remote functions, and those remote functions will operate on the
corresponding remote object.

The command `ray.get(ref)` takes an object reference and creates a Python object
from the corresponding remote object. For some objects like arrays, we can use
shared memory and avoid copying the object. For other objects, this currently
copies the object from the object store into the memory of the worker process.
If the remote object corresponding to the object reference `ref` does not live
on the same node as the worker that calls `ray.get(ref)`, then the remote object
will first be copied from an object store that has it to the object store that
needs it.
```python
>>> ref = ray.put([1, 2, 3])
>>> ray.get(ref)
[1, 2, 3]
```
The command `ray.put(x)` is an operation that can be invoked by a
worker or a driver process to "put" the Python object "x" to the local
object store (here *local* means *on the same node*). Once the object
has been stored in the object store, its value cannot be changed.

In addition, `ray.put(x)` returns a globally unique object reference,
which is essentially an ID that can be used to refer to the newly
created remote object. If we save the object reference in a variable
with `ref = ray.put(x)`, then we can pass `ref` into remote functions,
and those remote functions will operate on the corresponding remote
object.

The command `ray.get(ref)` takes an object reference and creates a
Python object from the corresponding remote object. If the remote
object corresponding to the object reference `ref` does not live on
the same node as the worker that calls `ray.get(ref)`, then the remote
object will first be copied from an object store that has it to the
object store that needs it. For some objects like arrays, we can use
the object store and avoid copying the object into Python memory
heap. For other objects, the object is copied into the memory of the
worker process.

```python >>> ref = ray.put([1, 2, 3]) >>> ray.get(ref) [1, 2, 3] ```

If the remote object corresponding to the object reference `ref` has not been
created yet, the command `ray.get(ref)` will wait until the remote object has
been created.

### Computation graphs in Ray
### Computation graph in Ray

Ray represents computation with a directed acyclic graph of tasks. Tasks are
added to this graph by calling **remote functions**.
A computation is defined by a directed acyclic graph of objects and
their dependencies. Objects are generated by **tasks**, which are
added to the system by calling **remote functions**.

For example, a normal Python function looks like this.
```python
Expand All @@ -91,25 +101,29 @@ actually executed and produced any outputs).

#### Remote functions

Whereas in regular Python, calling `add(1, 2)` would return `3`, in Ray, calling
`add(1, 2)` does not actually execute the task. Instead, it adds a task to the
computation graph and immediately returns an object reference to the output of
the computation.
Whereas in regular Python, calling `add(1, 2)` would return `3`, in
Ray, calling `add(1, 2)` does not immediately force any
execution. Instead, it adds tasks to the computation graph and
immediately returns an object reference to the output of the remote
function.

```python
>>> ref = add(1, 2)
>>> ray.get(ref)
3
```

There is a sharp distinction between *submitting a task* and *executing the
task*. When a remote function is called, the task of executing that function is
submitted to the scheduler, and the scheduler immediately returns object
references for the outputs of the task. However, the task will not be executed
until the scheduler actually schedules the task on a worker.
There is a sharp distinction between *submitting a remote function*
and *executing the task* generated by the remote function. When a
remote function is called, the tasks required to execute the function
are submitted to the scheduler, and the scheduler immediately returns
object references for the outputs of the remote function. However, the
task generating the object reference will not be executed until the
scheduler actually schedules the task on a worker.

When a task is submitted, each argument may be passed in by value or by object
reference. For example, these lines have the same behavior.
When a remote function is submitted, each argument may be passed in by
value or by object reference. For example, these lines have the same
behavior.

```python
>>> add(1, 2)
Expand All @@ -136,14 +150,16 @@ for i in range(10):
result.append(np.zeros(size=[100, 100]))
```

At the core of the above script, there are 10 separate tasks, each of which
generates a 100x100 matrix of zeros. These tasks do not depend on each other, so
in principle, they could be executed in parallel. However, in the above
implementation, they will be executed serially.
At the core of the above script, there are 10 separate functions, each
tasked with generating a 100x100 matrix of zeros. These do not depend
on each other, so in principle, they could be executed in
parallel. However, in the above implementation, they will be executed
serially.

Ray gets around this by representing computation as a graph of tasks, where some
tasks depend on the outputs of other tasks and where tasks can be executed once
their dependencies have been executed.
Ray gets around this by allowing the user to define a graph of
dependencies, where some remote functions depend on the outputs of
other remote functions. The tasks of a remote functions can be
executed once their dependencies have been executed.

For example, suppose we define the remote function `zeros` to be a wrapper
around `np.zeros`.
Expand Down