Update GettingStarted.md #257

36 changes: 23 additions & 13 deletions GettingStarted/GettingStarted.md
@@ -3,14 +3,19 @@
_[Greg Madejski](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@Madejski)
and [Phil Marshall](https://github.com/LSSTScienceCollaborations/StackClub/issues/new?body=@drphilmarshall)_

We are developing tutorial notebooks on remote JupyterLab instances, to short-circuit the DM stack installation process and get used to working in the
notebook aspect of the Rubin Science Platform (RSP). In these notes we provide:
* [Notes on how to get set up on the Rubin Science Platform (RSP) JupyterLab Notebook Aspect at the LSST Data Facility at NCSA](https://github.com/LSSTScienceCollaborations/StackClub/blob/master/GettingStarted/GettingStarted.md#accessing-the-lsst-science-platform)
* [Help with getting set up to run and edit the Stack Club tutorial notebooks](https://github.com/LSSTScienceCollaborations/StackClub/blob/master/GettingStarted/GettingStarted.md#running-and-contributing-to-the-stack-club-notebooks)
First, a preamble: The instructions below are for setting up a Rubin account and connecting to the Rubin Data Facility at NCSA,
which is used by Rubin Stack Club members. This is
distinct from the Rubin Data Facility used by Data Preview 0 (DP0) delegates
(working with the Rubin DC2 dataset), which is the Interim Data Facility (IDF). For
instructions on connecting to the Rubin Science Platform as a DP0 delegate, go to
https://dp0-1.lsst.io/dp0-delegate-resources/index.html . It is possible to
access the DC2 data from the Jupyter notebook instance running at NCSA, but this requires one additional step (see below).

The goal of the Stack Club (which predates DP0) has been to develop tutorial notebooks on remote JupyterLab instances,
to short-circuit the DM stack installation process and get used to working in the notebook aspect of the Rubin Science Platform (RSP).

## Accessing the Rubin Science Platform
The [Rubin Science Platform (RSP) Notebook Aspect Documentation](https://nb.lsst.io/) provides an introduction to the system, including how to gain access and then how to use JupyterLab once you are in.
Accessing the RSP requires Rubin Observatory data rights, as described at [ls.st/rdo-013](https://ls.st/rdo-013).
The notes here provide an introduction to the system, including how to gain access and then how to use JupyterLab once you are in.
You will also need to get an NCSA account and connect through the NCSA VPN.

#### Getting a Rubin Science Platform Account
@@ -20,15 +25,11 @@ If your application is successful, you'll get an email with instructions on how
#### Accessing the LSP via its VPN
At present, unless you are on an approved network, you must use the [NCSA virtual private network (VPN)](https://wiki.ncsa.illinois.edu/display/cybersec/Virtual+Private+Network+%28VPN%29+Service).
The recommended method is to use Cisco's AnyConnect with DUO two-factor authentication (verified on Mac and Linux). Detailed instructions are available on the [NCSA VPN site](https://wiki.ncsa.illinois.edu/display/cybersec/Virtual+Private+Network+%28VPN%29+Service#VirtualPrivateNetwork(VPN)Service-UsingtheCiscoAnyConnectVPNClient(Required)).
The best documentation for getting set up with your account is on [nb.lsst.io](https://nb.lsst.io/index.html#getting-started).

1. [Install and configure the NCSA VPN](https://nb.lsst.io/getting-started/logging-in.html#vpn-setup)
2. [Log into the NCSA VPN](https://nb.lsst.io/getting-started/logging-in.html#vpn-login) (**NB:** Use the `ncsa-vpn-default` group; this may not be selected by default)
3. [Log into the Notebook Aspect](https://nb.lsst.io/getting-started/logging-in.html#step-2-log-in) (**NB:** Use "NCSA as the identity provider", not your institution)
> You can get AnyConnect by pointing your browser at https://sslvpn.ncsa.illinois.edu/ and selecting the `ncsa-vpn-default` option (this will only work if you have a Java-compatible browser, such as Firefox ESR version 52 or earlier). If you already have the AnyConnect client installed, open it and enter `sslvpn.ncsa.illinois.edu/` in its connection window. (**NB:** Make sure to use the `ncsa-vpn-default` group; this may not be the default group selected)

If you forget your password it can be reset following the instructions [here](https://developer.lsst.io/services/lsst-dev.html?highlight=reset#lsst-dev-password). If you have problems connecting to the NCSA services you can check their status and submit a help ticket [here](https://confluence.lsstcorp.org/display/DM/LSST+Service+Status+page).
> You will need to set up two-factor authentication with DUO. To set up DUO, follow the instructions at https://duo.security.ncsa.illinois.edu/portal. DUO can be configured for smartphone or tablet access (currently dumb phones are not supported). When AnyConnect asks for your "second password", it wants the 6-digit number in your Duo app. (This may need refreshing; each one can only be used once.) (**NB:** Use "NCSA as the identity provider", not your institution).
https://duo.security.ncsa.illinois.edu/portal sends me to a webpage that asks me to log in with Kerberos.


For a Linux install, you may need to pre-install [`openconnect`](http://www.infradead.org/openconnect/) from your favorite package manager. For Mac OS X, you can also use [`openconnect-gui`](https://openconnect.github.io/openconnect-gui/), which can be installed with Homebrew.

#### Starting the Rubin Science Platform JupyterLab Notebook Aspect
Once the VPN connection is established, you should be able to navigate to the JupyterLab instance at **https://lsst-lsp-stable.ncsa.illinois.edu**. Select the `Release` and `medium` options on the Spawner Options landing page, and then hit the "Spawn" button. You'll (eventually) end up on the JupyterLab launcher, where you can use the file manager in the left-hand sidebar to open your Jupyter notebooks, or start terminal or notebook editor tabs from the buttons provided. You should see the pre-installed `notebook-demo` notebooks in the file manager, for example.
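
If the page does not load, it can help to check whether the host is reachable at all before suspecting the spawner. The following is a minimal sketch, not part of the official instructions: the helper name `can_reach` and its `connect` parameter are hypothetical, and port 443 is assumed only because the URL above uses `https`.

```python
import socket

# Hostname taken from the JupyterLab URL above.
RSP_HOST = "lsst-lsp-stable.ncsa.illinois.edu"


def can_reach(host, port=443, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port can be opened.

    `connect` is injectable (hypothetical parameter) so the check can be
    exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


# Example (requires network access, so it is left commented out):
# print(can_reach(RSP_HOST))
```

A `False` result with the VPN connected suggests a network or VPN group problem rather than a problem with your RSP account.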
@@ -37,6 +38,15 @@ Once the VPN connection is established, you should be able to navigate to the th

> At the end of your JupyterLab session, please make sure you save all and log out (from the launcher menu), to free up the cluster for others.

## Accessing the DC2 data set using the Jupyter notebook instance running at NCSA
For Stack Club members using the DC2 data sets at NCSA, one additional step is needed to authorize their account to access the DC2 data. The step is explained in the file /repo/README.md on the NCSA RSP (you can read this README.md in your Linux terminal by typing ```cat /repo/README.md``` ). Specifically, you must create a small file named ```~/.pgpass``` . That file must contain one line:

```
lsst-pg-prod1.ncsa.illinois.edu:5432:lsstdb1:myusername:mysecretpasswd
```
where ```myusername``` is your NCSA username, and ```mysecretpasswd``` is your NCSA password.
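
The file can also be created from Python, for example in a notebook cell. This is a minimal sketch under stated assumptions: the function name `write_pgpass` is hypothetical, and the host, port, and database are simply the values from the line above. It relies on two documented PostgreSQL behaviors: the password file is ignored unless it is inaccessible to group and world (mode `0600`), and a literal `:` or `\` inside a field must be escaped with a backslash. Note that this sketch overwrites any existing `~/.pgpass`.

```python
import os
from pathlib import Path

# Values copied from the example line above.
PG_HOST = "lsst-pg-prod1.ncsa.illinois.edu"
PG_PORT = 5432
PG_DATABASE = "lsstdb1"


def _escape(field):
    # In .pgpass, ':' separates fields and '\' is the escape character.
    return field.replace("\\", "\\\\").replace(":", "\\:")


def write_pgpass(username, password, path=None):
    """Write a one-line .pgpass file readable only by its owner.

    Overwrites the target file if it already exists.
    """
    target = Path(path) if path is not None else Path.home() / ".pgpass"
    line = ":".join(
        [PG_HOST, str(PG_PORT), PG_DATABASE, _escape(username), _escape(password)]
    )
    # Create the file with owner-only permissions from the start.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(line + "\n")
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(target, 0o600)
    return target
```

One way to call it without leaving the password in the notebook is `write_pgpass("myusername", getpass.getpass())`, using `getpass` from the standard library.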

It is important to note that the location of the data repository accessible from the IDF (``` s3://butler-us-central1-dp01 ```) is not the same as that accessible from NCSA (``` /repo/dc2 ```). A notebook intended to run at both sites needs to select the repository location accordingly.
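
One way a notebook could account for the two locations is sketched below. The helper name `pick_dc2_repo` is hypothetical, and the detection rule (treat the presence of a local `/repo/dc2` directory as meaning "running at NCSA") is an assumption made for illustration, not a documented convention.

```python
import os

# Repository locations quoted in the text above.
NCSA_DC2_REPO = "/repo/dc2"
IDF_DC2_REPO = "s3://butler-us-central1-dp01"


def pick_dc2_repo(is_dir=os.path.isdir):
    """Return the DC2 repository location for the current site.

    Assumption: /repo/dc2 exists as a local directory only at NCSA.
    `is_dir` is injectable so the choice can be tested off-site.
    """
    if is_dir(NCSA_DC2_REPO):
        return NCSA_DC2_REPO
    return IDF_DC2_REPO


repo = pick_dc2_repo()
# `repo` is the string a notebook would then use wherever it needs the
# repository location, instead of hard-coding one site's path.
```

Keeping the choice in one function means the rest of the notebook never mentions a site-specific path.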

## Running and Contributing to the Stack Club Notebooks
From the Launcher, start a terminal, `cd` to the `notebooks` folder and `git clone` the `StackClub` repo, using either HTTP or SSH access:
@@ -62,7 +72,7 @@ of the Stack, and flag those that do not run all the way through. The `master` b
> A [template notebook](templates/template_Notebook.ipynb) that will help you maintain the above standards is available in the [templates folder](templates).

#### Available Datasets
Broadly useful, small datasets are available in `/project/shared/data` - this director is world readable, but is only writeable by members of the `lsst-users` group (i.e., Rubin Project members). The stack club has its own read/writeable directory under `/project/stack-club` - feel free to contribute public data there. You can also use your personal `/project/<username>` folder for datasets that you want to share, but may not be as generally applicable. As a rule, Stack Club notebooks should use data in `/project/shared/data` or `/project/stack-club`. If you add a shared dataset, please document it in the `README` of the associated directory.
Broadly useful, small datasets are available in `/project/shared/data` - this directory is world readable, but is only writeable by members of the `lsst-users` group (i.e., Rubin Project members). The stack club has its own read/writeable directory under `/project/stack-club` - feel free to contribute public data there. You can also use your personal `/project/<username>` folder for datasets that you want to share, but may not be as generally applicable. As a rule, Stack Club notebooks should use data in `/project/shared/data` or `/project/stack-club`. If you add a shared dataset, please document it in the `README` of the associated directory.

Larger datasets are available in `/datasets`. This is a read-only folder.
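
The rule that Stack Club notebooks should read from the shared locations can be checked mechanically. The sketch below is illustrative only: the helper name `uses_shared_data` is hypothetical, and it compares paths purely as text, so symlinks and `..` components are not resolved.

```python
from pathlib import PurePosixPath

# The two locations named in the rule above.
SHARED_ROOTS = (
    PurePosixPath("/project/shared/data"),
    PurePosixPath("/project/stack-club"),
)


def uses_shared_data(path):
    """Return True if `path` is one of the shared roots or lies beneath one."""
    candidate = PurePosixPath(path)
    return any(
        candidate == root or root in candidate.parents for root in SHARED_ROOTS
    )
```

For example, a path under `/project/stack-club` passes the check, while one under a personal `/project/<username>` folder does not.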
