diff --git a/ideasForNextTime b/ideasForNextTime
new file mode 100644
index 0000000..08f0227
--- /dev/null
+++ b/ideasForNextTime
@@ -0,0 +1 @@
+A participant suggested we warn folks in advance that basic understanding of the shell is useful and point them to training materials.
diff --git a/intro.html b/intro.html
index f7452a9..d531127 100644
--- a/intro.html
+++ b/intro.html
@@ -54,7 +54,7 @@

Kunal Mishra and Chris Paciorek

Introduction

We'll do this mostly as a demonstration. We encourage you to log in to your account and try out the various examples yourself as we go through them.

Much of this material is based on the extensive Savio documentation we have prepared and continue to prepare, available at http://research-it.berkeley.edu/services/high-performance-computing.

-The materials for this tutorial are available using git at https://github.com/ucberkeley/savio-training-intro-2018 or simply as a zip file.

+The materials for this tutorial are available using git at the short URL http://bit.do/F18Savio, the GitHub URL https://github.com/ucberkeley/savio-training-intro-2018, or simply as a zip file.

Outline

This training session will cover the following topics:

-Monitoring jobs and the job queue

+Monitoring jobs, the job queue, and overall usage

The basic command for seeing what is running on the system is squeue:

squeue                    # show all jobs currently in the queue
squeue -u SAVIO_USERNAME  # show just your own jobs
@@ -362,6 +362,8 @@ 

Monitoring jobs and the job queue

For more information on cores, QoS, and additional (e.g., GPU) resources, here's some syntax:

# fields of note: %.5C = cores, %.20q = QoS, %b = generic resources (e.g., GPUs)
squeue -o "%.7i %.12P %.20j %.8u %.2t %.9M %.5C %.8r %.3D %.20R %.8p %.20q %b"

We provide some tips about monitoring your jobs. (Scroll down to the "Monitoring jobs" section.)

+If you'd like to see how much of a Faculty Computing Allowance (FCA) has been used:

+check_usage.sh -a fc_popgen  # summarize usage for the fc_popgen FCA account

Example use of standard software: IPython and R notebooks through JupyterHub

Savio allows one to run Jupyter-based notebooks via a browser-based service called JupyterHub.

Let's see a brief demo of an IPython notebook:
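As a generic illustration (not the actual demo notebook), a first cell in a Python notebook on Savio might look something like this:

import platform
import numpy as np

print(platform.node())              # confirm which node the kernel is running on
print(np.random.rand(3, 3).mean())  # a quick numerical sanity check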

@@ -502,6 +504,15 @@

Example use of standard software: R

results

+Alternative Python Parallelization: Dask

+In addition to iPyParallel, one of the newer tools in the Python space is Dask, which provides out-of-the-box parallelization with little setup or additional code. As a Python package, Dask extends the existing NumPy/Pandas syntax for arrays and dataframes, adding native parallelization to these data structures, which speeds up analyses. Since Dask dataframes/arrays are descendants of the Pandas dataframe and NumPy array, they are compatible with existing code and can serve as a drop-in replacement, with performance enhancements for multiple cores/nodes. It's also worth noting that while Dask is useful for scaling up to large clusters like Savio, it can also speed up analyses on your local computer. We're including some articles and documentation that may be helpful in getting started:

+
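As a quick taste of the drop-in syntax described above, here's a minimal sketch (ours, not taken from the Dask documentation); the file pattern and column names are hypothetical:

import dask.dataframe as dd

# Read a potentially larger-than-memory set of CSVs; Dask splits the data into partitions.
df = dd.read_csv('flights-*.csv')

# Same syntax as Pandas, but this builds a lazy task graph rather than computing immediately.
mean_delay = df.groupby('carrier')['dep_delay'].mean()

# The work only runs, in parallel across cores, when .compute() is called.
print(mean_delay.compute())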

How to get additional help

-
-

Monitoring jobs and the job queue

+
+Monitoring jobs, the job queue, and overall usage

The basic command for seeing what is running on the system is squeue:

squeue                    # show all jobs currently in the queue
squeue -u SAVIO_USERNAME  # show just your own jobs
@@ -415,6 +415,8 @@ 

Monitoring jobs and the job queue

For more information on cores, QoS, and additional (e.g., GPU) resources, here's some syntax:

# fields of note: %.5C = cores, %.20q = QoS, %b = generic resources (e.g., GPUs)
squeue -o "%.7i %.12P %.20j %.8u %.2t %.9M %.5C %.8r %.3D %.20R %.8p %.20q %b"

We provide some tips about monitoring your jobs. (Scroll down to the "Monitoring jobs" section.)

+If you'd like to see how much of a Faculty Computing Allowance (FCA) has been used:

+check_usage.sh -a fc_popgen  # summarize usage for the fc_popgen FCA account

Example use of standard software: IPython and R notebooks through JupyterHub

@@ -562,6 +564,17 @@

Example use of standard software: R

results
+
+Alternative Python Parallelization: Dask

+In addition to iPyParallel, one of the newer tools in the Python space is Dask, which provides out-of-the-box parallelization with little setup or additional code. As a Python package, Dask extends the existing NumPy/Pandas syntax for arrays and dataframes, adding native parallelization to these data structures, which speeds up analyses. Since Dask dataframes/arrays are descendants of the Pandas dataframe and NumPy array, they are compatible with existing code and can serve as a drop-in replacement, with performance enhancements for multiple cores/nodes. It's also worth noting that while Dask is useful for scaling up to large clusters like Savio, it can also speed up analyses on your local computer. We're including some articles and documentation that may be helpful in getting started:

+
+
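Similarly, on the array side, here's a minimal sketch (ours, not from the Dask documentation) of dask.array standing in for NumPy; the array size and chunking are arbitrary choices:

import dask.array as da

# A 20,000 x 20,000 array of random values, split into 1,000 x 1,000 chunks.
x = da.random.random((20000, 20000), chunks=(1000, 1000))

# Familiar NumPy-style syntax; nothing is computed yet.
result = (x + x.T).mean(axis=0)

# The chunks are processed in parallel across cores when .compute() is called.
print(result.compute())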

How to get additional help