Merge pull request #1861 from StatCan/feat-update-docs
update all English Documentation
wg102 authored Oct 17, 2023
2 parents 21c1d14 + 9f405ac commit 10df4fa
Showing 23 changed files with 23 additions and 92 deletions.
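
Several hunks in this commit replace site-absolute link targets (for example `/2-Publishing/Dash/`) or a stray `~` prefix with relative paths. A minimal sketch of how such targets could be flagged before merging, assuming plain Markdown input. The function name `suspicious_links` and the regular expression are illustrative assumptions, not part of this repository or of MkDocs:

```python
import re

# Capture the target of any Markdown link or image: the text right after "](".
# Matching on "](" rather than "[" also catches the outer target of a nested
# image link such as [![alt](img.png)](target).
LINK_TARGET_RE = re.compile(r"\]\(([^)\s]+)")


def suspicious_links(markdown: str) -> list[str]:
    """Return link targets that start with '/' or '~'.

    These are the two patterns this commit rewrites into relative paths.
    The check is purely textual; it does not verify that a target exists.
    """
    targets = LINK_TARGET_RE.findall(markdown)
    return [t for t in targets if t.startswith(("/", "~"))]


if __name__ == "__main__":
    sample = (
        "Use **[Datasette](~../2-Publishing/Datasette.md/)** and "
        "[![x](../images/a.PNG)](/2-Publishing/R-Shiny/) plus [ok](../Dash/)."
    )
    for target in suspicious_links(sample):
        print(target)
```

A check like this only catches the prefix problems; it would not notice a relative path that points at a page that no longer exists.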
6 changes: 3 additions & 3 deletions docs/en/1-Experiments/Jupyter.md
@@ -20,7 +20,7 @@ Jupyter comes with a number of features (and we can add more)
[![Explore your data](../images/ExploreData.PNG)](../../2-Publishing/Datasette/)
</center>

-Use **[Datasette](~../2-Publishing/Datasette.md/)** , an instant JSON API for your SQLite databases. Run SQL queries in a more interactive way!
+Use **[Datasette](../../2-Publishing/Datasette.md/)** , an instant JSON API for your SQLite databases. Run SQL queries in a more interactive way!

### IDE in the browser

@@ -78,9 +78,9 @@ You can upload and download data to/from JupyterHub directly in the menu. There

### Shareable "Bucket" storage

-There is also a mounted `buckets` folder in your home directory, which holds files in [MinIO](../Storage.md/#buckets-via-minio).
+There is also a mounted `buckets` folder in your home directory, which holds files in [Azure Blob Storage](../../5-Storage/AzureBlobStorage).

-**Refer to the [Storage](../index.md#storage) section for details.**
+**Refer to the [Storage](../../5-Storage/Overview) section for details.**

## Data Analysis

4 changes: 2 additions & 2 deletions docs/en/1-Experiments/Remote-Desktop.md
@@ -14,7 +14,7 @@ The Ubuntu Virtual Desktop is a powerful tool for data scientists and machine le

Remote Desktop provides an in-browser GUI Ubuntu desktop experience as well as
quick access to supporting tools. The operating system is
-[**Ubuntu**](https://ubuntu.com/about) **18.04** with the
+[**Ubuntu**](https://ubuntu.com/about) **22.04** with the
[**XFCE**](https://www.xfce.org/about) desktop environment.

![Remote Desktop](../images/rd_desktop.png)
@@ -32,7 +32,7 @@ _pip_, _conda_, _npm_ and _yarn_ are available to install various packages.
## Accessing the Remote Desktop

To launch the Remote Desktop or any of its supporting tools, create a Notebook
-Server in [Kubeflow](./Kubeflow.md) and select the remote desktop option.
+Server in [Kubeflow](./Kubeflow.md) and select the remote desktop option, which is the Ubuntu image.

![Remote Desktop](../images/RemoteDesktop.PNG)

6 changes: 3 additions & 3 deletions docs/en/1-Experiments/Selecting-an-Image.md
@@ -8,7 +8,7 @@ When selecting an image, you have 3 main options:

- Jupyter Notebook (CPU, TensorFlow, PyTorch)
- RStudio
-- Remote Desktop (r, geomatics)
+- Remote Desktop

## Jupyter Notebooks

@@ -48,7 +48,7 @@ experience.

## RStudio

-**[RStudio](RStudio/)** gives you an integrated development environment
+**[RStudio](../RStudio/)** gives you an integrated development environment
specifically for `R`. If you're coding in `R`, this is typically the Notebook
Server to use. Use the `rstudio` image to get an RStudio environment.

@@ -59,7 +59,7 @@ Server to use. Use the `rstudio` image to get an RStudio environment.
For a full Ubuntu desktop experience, use the remote desktop image. It comes
pre-loaded with Python, R and Geomatics tooling, but are delivered in a typical
desktop experience that also comes with Firefox, VS Code, and open office tools.
-The operating system is **[Ubuntu](https://ubuntu.com/about)** 18.04 with the
+The operating system is **[Ubuntu](https://ubuntu.com/about)** 22.04 with the
**[XFCE](https://www.xfce.org/about)** desktop environment.

![Screenshot of the Virtual Desktop](../images/rd_desktop.png)
Empty file.
4 changes: 1 addition & 3 deletions docs/en/2-Publishing/Custom.md
@@ -9,9 +9,7 @@ container. For instance, Node.js apps, Flask or Dash apps. Etc.

<!-- prettier-ignore -->
!!! info "See the source code for this app"
-We just push these kinds of applications through GitHub into the server. The
-source for the above app is
-[`StatCan/covid19`](https://github.com/StatCan/covid19)
+We just push these kinds of applications through GitHub into the server.

# Setup

2 changes: 1 addition & 1 deletion docs/en/2-Publishing/Dash.md
@@ -31,7 +31,7 @@ This is an example of a Layout With Figure and Slider from

_Publish with Canadian-made software._

-**[Plotly Dash](/2-Publishing/Dash/)** is a popular Python library that allows you to create interactive web-based visualizations and dashboards with ease. Developed by the Montreal-based company Plotly, Dash has gained a reputation for being a powerful and flexible tool for building custom data science graphics. With Dash, you can create everything from simple line charts to complex, multi-page dashboards with interactive widgets and controls. Because it's built on open source technologies like Flask, React, and Plotly.js, Dash is highly customizable and can be easily integrated with other data science tools and workflows. Whether you're a data scientist, analyst, or developer, Dash can help you create engaging and informative visualizations that bring your data to life.
+**[Plotly Dash](../Dash/)** is a popular Python library that allows you to create interactive web-based visualizations and dashboards with ease. Developed by the Montreal-based company Plotly, Dash has gained a reputation for being a powerful and flexible tool for building custom data science graphics. With Dash, you can create everything from simple line charts to complex, multi-page dashboards with interactive widgets and controls. Because it's built on open source technologies like Flask, React, and Plotly.js, Dash is highly customizable and can be easily integrated with other data science tools and workflows. Whether you're a data scientist, analyst, or developer, Dash can help you create engaging and informative visualizations that bring your data to life.

# Getting Started

3 changes: 2 additions & 1 deletion docs/en/2-Publishing/Datasette.md
@@ -26,9 +26,10 @@ You can even explore maps within the tool!

![Run SQL Queries](../images/datasette-sql.png)

+<!-- Removing until video is approved
 # Video Tutorial
-[![Click here for the video](../images/KubeflowVideo.PNG)](https://youtu.be/OPVfBKouBT8?t=214 "Advanced Analytics Workspace Kubeflow collaboration demo + tips and tricks")
+[![Click here for the video](../images/KubeflowVideo.PNG)](https://youtu.be/OPVfBKouBT8?t=214 "Advanced Analytics Workspace Kubeflow collaboration demo + tips and tricks") -->

# Getting Started

2 changes: 1 addition & 1 deletion docs/en/2-Publishing/PowerBI.md
@@ -13,7 +13,7 @@ our Storage system, and use the data as a `pandas` data frame.

1. A computer with Power BI, and Python 3.6
2. Your MinIO `ACCESS_KEY` and `SECRET_KEY` on hand. (See
-[Storage](../index.md#storage))
+[Storage](../../5-Storage/Overview))

## Set up Power BI

4 changes: 2 additions & 2 deletions docs/en/2-Publishing/R-Shiny.md
@@ -12,7 +12,7 @@ R-Shiny is an R package that makes it easy to build interactive web apps in R.

_Publish Professional Quality Graphics_

-[![InteractiveDashboard](../images/InteractiveDashboard.PNG)](/2-Publishing/R-Shiny/)
+[![InteractiveDashboard](../images/InteractiveDashboard.PNG)](../R-Shiny/)

R Shiny is an open source web application framework that allows data scientists and analysts to create interactive, web-based dashboards and data visualizations using the R programming language. One of the main advantages of R Shiny is that it offers a straightforward way to create high-quality, interactive dashboards without the need for extensive web development expertise. With R Shiny, data scientists can leverage their R coding skills to create dynamic, data-driven web applications that can be shared easily with stakeholders.

@@ -109,7 +109,7 @@ If you need extra R libraries to be installed, send your list to [the R-Shiny re

<!-- prettier-ignore -->
!!! example "See the above dashboard here"
-The above dashboard is in GitHub. Take a look at [the source](https://github.com/StatCan/R-dashboards/tree/master/bus-dashboard), and [see the dashboard live](https://shiny.covid.cloud.statcan.ca/bus-dashboard).
+The above dashboard is in GitHub. Take a look at [the source](https://github.com/StatCan/R-dashboards/tree/master/bus-dashboard)).

## Once you've got the basics ...

8 changes: 0 additions & 8 deletions docs/en/3-Pipelines/Argo.md
@@ -374,14 +374,6 @@ Couler provides a simple, unified application programming interface for defining
# Run the workflow
w.create()
```
-=== "YAML"
-``` py title="workflow.yaml" linenums="1"
-
-```
-=== "Seldon?"
-``` py
-
-```

### Additional Resources for Argo Workflows

23 changes: 3 additions & 20 deletions docs/en/4-Collaboration/Overview.md
@@ -10,7 +10,7 @@ There are many ways collaborate on the AAW. Which is best for your situation dep
| **Data** | Personal folder or bucket | Team folder or bucket, or shared namespace | Shared Bucket |
| **Compute** | Personal namespace | Shared namespace | N/A |

-Sharing code, disks, and workspaces (e.g.: two people sharing the same virtual machine) is described in more detail below. Sharing data through buckets is described in more detail in the **[MinIO](../5-Storage/AzureBlobStorage.md)** section.
+Sharing code, disks, and workspaces (e.g.: two people sharing the same virtual machine) is described in more detail below. Sharing data through buckets is described in more detail in the **[Azure Blob Storage](../5-Storage/AzureBlobStorage.md)** section.

<!-- prettier-ignore -->
??? question "What is the difference between a bucket and a folder?"
@@ -37,7 +37,7 @@ If you need to share code without publishing it on a repository,

<!-- prettier-ignore -->
!!! danger "Sharing a namespace means you share **everything** in the namespace"
-Kubeflow does not support granular sharing of one resource (one notebook, one MinIO bucket, etc.), but instead sharing of **all** resources. If you want to share a Jupyter Notebook server with someone, you must share your entire namespace and **they will have access to all other resources (MinIO buckets, etc.)**.
+Kubeflow does not support granular sharing of one resource (one notebook, one volume, etc.), but instead sharing of **all** resources. If you want to share a Jupyter Notebook server with someone, you must share your entire namespace and **they will have access to all other resources (Azure Blob Storage, etc.)**.

In Kubeflow every user has a **namespace** that contains their work (their
notebook servers, pipelines, disks, etc.). Your namespace belongs to you, but
@@ -47,7 +47,7 @@ share with a team). One option for collaboration is to share namespaces with
others.

The advantage of sharing a Kubeflow namespace is that it lets you and your
-colleagues share the compute environment and MinIO buckets associated with the
+colleagues share the compute environment and volumes associated with the
namespace. This makes it a very easy and free-form way to share.

To share your namespace, see [managing contributors](#managing-contributors)
@@ -68,30 +68,13 @@ Once you have a shared namespace, you have two shared storage approaches
To learn more about the technology behind these, check out the
[Storage overview](../5-Storage/Overview.md).

-### Sharing with StatCan
-
-In addition to private buckets, or team-shared private buckets, you can also
-place your files in _shared storage_. Within all bucket storage options
-(`minimal`, `premium`, `pachyderm`), you have a private bucket, **and** a folder
-inside of the `shared` bucket. Take a look, for instance, at the link below:
-
-- [`shared/blair-drummond/`](https://minimal-tenant1-minio.covid.cloud.statcan.ca/minio/shared/blair-drummond/)
-
-Any **logged in** user can see these files and read them freely.
-
-### Sharing with the world
-
-Ask about that one in our [Slack channel](https://statcan-aaw.slack.com). There
-are many ways to do this from the IT side, but it's important for it to go
-through proper processes, so this is not done in a "self-serve" way that the
-others are. That said, it is totally possible.

## Recommendation: Combine them all

It's a great idea to always use git, and using git along with shared workspaces
is a great way to combine ad hoc sharing (through files) while also keeping your
code organized and tracked.

## Managing contributors

You can add or remove people from a namespace you already own through the
4 changes: 2 additions & 2 deletions docs/en/5-Storage/Disks.md
@@ -6,7 +6,7 @@ you from fast solid state drives (SSDs)!
# Setup

When creating your notebook server, you request disks by adding Data Volumes to
-your notebook server (pictured below, with `Type = New`). They are automatically
+your notebook server (pictured below, with go to `Advanced Options`). They are automatically
mounted at the directory (`Mount Point`) you choose, and serve as a simple and
reliable way to preserve data attached to a Notebook Server.

@@ -27,7 +27,7 @@ to reuse). If you're done with the disk and it's contents,
## Deleting Disk Storage

To see your disks, check the Notebook Volumes section of the Notebook Server
-page (shown below). You can delete any unattached disk (orange icon on the left)
+page (shown below). You can delete any unattached disk (icon on the left)
by clicking the trash can icon.

![Delete an unattached volume from the Notebook Server screen](../images/kubeflow_delete_disk.png)
11 changes: 1 addition & 10 deletions docs/en/7-MLOps/Machine-Learning-Model-Cloud-Storage.md
@@ -29,25 +29,16 @@ Overall, cloud storage is a reliable and convenient solution for storing and man
The AAW platform provides several types of storage:

- Disks (also called Volumes on the Kubeflow Notebook Server creation screen)
-- Buckets ("Blob" or S3 storage, provided through MinIO)
- Data Lakes (coming soon)

Depending on your use case, either disk or bucket may be most suitable. Our [storage overview](../5-Storage/Overview.md) will help you compare them.

### Disks

-[![Disks](../images/Disks.PNG)](Storage.md/)
+[![Disks](../images/Disks.PNG)](../5-Storage/Disks.md)

**[Disks](../5-Storage/Disks.md)** are added to your notebook server by adding Data Volumes.

-### Buckets
-
-MinIO is an S3-API compatible object storage system that provides an open source alternative to proprietary cloud storage services. While we currently use MinIO as our cloud storage solution, we plan on replacing it with s3-proxy in the near future. S3-proxy is a lightweight, open source reverse proxy server that allows you to access Amazon S3-compatible storage services with your existing applications. By switching to s3-proxy, we will be able to improve our cloud storage performance, security, and scalability, while maintaining compatibility with the S3 API.
-
-[![MinIO](../images/Buckets.PNG)](AzureBlobStorage.md/)
-
-**[MinIO](../5-Storage/AzureBlobStorage.md)** is a cloud-native scalable object store. We use it for buckets (blob or S3 storage).
-
### Data Lakes (Coming Soon)

A data lake is a central repository that allows you to store all your structured and unstructured data at any scale. It's a cost-effective way to store and manage all types of data, from raw data to processed data, and it's an essential tool for data scientists.
34 changes: 0 additions & 34 deletions docs/en/7-MLOps/Machine-Learning-Training-Pipelines.md
@@ -388,44 +388,10 @@ With the data split, you can now define and train your machine learning model us

After training the model, you need to evaluate its performance on the testing set. This will give you an idea of how well the model will perform on new, unseen data.

-=== "Python"
-``` py title="evaluate.py" linenums="1"
-
-```
-=== "R"
-``` r title="evaluate.R" linenums="1"
-
-```
-=== "SASPy"
-``` py title="evaluate.py" linenums="1"
-
-```
-=== "SAS"
-``` sas title="evaluate.sas" linenums="1"
-
-```
-
#### 6. Deploy the model

Finally, you can deploy the trained machine learning model in a production environment.

-=== "Python"
-``` py title="deploy.py" linenums="1"
-
-```
-=== "R"
-``` r title="deploy.R" linenums="1"
-
-```
-=== "SASPy"
-``` py title="deploy.py" linenums="1"
-
-```
-=== "SAS"
-``` sas title="deploy.sas" linenums="1"
-
-```
-
### Using Argo Workflows

![Argo Workflows](../images/argo-workflows-assembly-line.jpg)
4 changes: 2 additions & 2 deletions docs/en/Help.md
@@ -10,13 +10,13 @@ channel. You can ask questions and provide feedback there.

We will also post notices there if there are updates or downtime.

-# Video tutorials
+<!-- # Video tutorials
After you have joined our Slack community, go and check out the following
tutorials:
- [Platform official](https://www.youtube.com/playlist?list=PL1zlA2D7AHugkDdiyeUHWOKGKUd3MB_nD)
-- [Community driven content](https://www.youtube.com/playlist?list=PL1zlA2D7AHuhP0lKbcaD_0KEYUqs1Qrgj)
+- [Community driven content](https://www.youtube.com/playlist?list=PL1zlA2D7AHuhP0lKbcaD_0KEYUqs1Qrgj) -->

# GitHub

Binary file modified docs/en/images/RStudioOption.PNG
Binary file modified docs/en/images/RemoteDesktop.PNG
Binary file modified docs/en/images/kubeflow_contributors.png
Binary file modified docs/en/images/kubeflow_delete_disk.png
Binary file modified docs/en/images/kubeflow_existing_volume.png
Binary file added docs/en/images/kubeflow_volumes_disk.png
Binary file modified docs/fr/images/kubeflow_contributors.png
Binary file modified docs/fr/images/kubeflow_delete_disk.png
