Merge pull request #1088 from run-ai/config-hierarchy
Config hierarchy
yarongol authored Sep 16, 2024
2 parents 3335335 + 40738b9 commit 4de199a
Showing 66 changed files with 118 additions and 109 deletions.
2 changes: 1 addition & 1 deletion docs/Researcher/Walkthroughs/quickstart-vscode.md
@@ -13,7 +13,7 @@ There are various ways to submit a Workspace:

## Prerequisites

-To complete this Quickstart, the [Infrastructure Administrator](../../admin/overview-administrator.md) will need to configure a _wildcard_ certificate to Run:ai as described [here](../../admin/runai-setup/config/allow-external-access-to-containers.md#workspaces-configuration).
+To complete this Quickstart, the [Infrastructure Administrator](../../admin/overview-administrator.md) will need to configure a _wildcard_ certificate for Run:ai as described [here](../../admin/config/allow-external-access-to-containers.md#workspaces-configuration).

To complete this Quickstart, the [Platform Administrator](../../platform-admin/overview.md) will need to provide you with:

2 changes: 1 addition & 1 deletion docs/Researcher/Walkthroughs/walkthrough-build-ports.md
@@ -8,7 +8,7 @@

## Exposing a Container Port

-There are three ways to expose ports in Kubernetes: _Port Forwarding_, _NodePort_, and _LoadBalancer_. The first two will always work. The other requires a special setup by your administrator. The four methods are explained [here](../../admin/runai-setup/config/allow-external-access-to-containers.md).
+There are three ways to expose ports in Kubernetes: _Port Forwarding_, _NodePort_, and _LoadBalancer_. The first two always work. The third requires special setup by your administrator. These methods are explained [here](../../admin/config/allow-external-access-to-containers.md).

The document below provides an example based on Port Forwarding.
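As a brief, hedged illustration of the Port Forwarding approach referenced above (the pod name, namespace, and port are hypothetical placeholders, not values from this repository):

```shell
# Forward a local port to a port inside a running container.
# Pod name, namespace, and ports below are illustrative assumptions.
kubectl port-forward pod/build-job-0-0 8080:8080 --namespace runai-team-a

# While the forward is active, the container port is reachable locally:
#   curl http://localhost:8080
```

Port forwarding requires no administrator setup, which is why the quickstart uses it; the tunnel lasts only as long as the `kubectl` process runs.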

2 changes: 1 addition & 1 deletion docs/Researcher/best-practices/researcher-notifications.md
@@ -11,7 +11,7 @@ date: 2024-Jul-4

Managing numerous data science workloads requires monitoring various stages, including submission, scheduling, initialization, execution, and completion. Additionally, handling suspensions and failures is crucial for ensuring timely workload completion. Email Notifications address this need by sending alerts for critical workload life cycle changes. This empowers data scientists to take necessary actions and prevent delays.

-Once the system administrator [configures the email notifications](../../admin/runai-setup/notifications/notifications.md), users will receive notifications about their jobs that transition from one status to another. In addition, the user will get warning notifications before workload termination due to project-defined timeouts. Details included in the email are:
+Once the system administrator [configures the email notifications](../../admin/config/notifications.md), users will receive notifications about their jobs that transition from one status to another. In addition, the user will get warning notifications before workload termination due to project-defined timeouts. Details included in the email are:

* Workload type
* Project and cluster information
4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-TF.md
@@ -339,12 +339,12 @@ runai submit-dist tf --name distributed-job --workers=2 -g 1 \

#### --node-pools `<string>`

-> Instructs the scheduler to run this workload using specific set of nodes which are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md#). You can specify one or more node pools to form a prioritized list of node pools that the scheduler will use to find one node pool that can provide the workload's specification. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md) or use existing node labels, then create a node-pool and assign the label to the node-pool.
+> Instructs the scheduler to run this workload using a specific set of nodes that are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md). You can specify one or more node pools to form a prioritized list; the scheduler uses this list to find one node pool that can satisfy the workload's specification. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md), or use existing node labels, then create a node-pool and assign the label to it.
> This flag can be used in conjunction with node-type and Project-based affinity. In this case, the flag is used to refine the list of allowable node groups set from a node-pool. For more information see: [Working with Projects](../../platform-admin/aiinitiatives/org/projects.md).
#### --node-type `<string>`

-> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
+> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md).
#### --toleration `<string>`

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-mpi.md
@@ -340,12 +340,12 @@ You can start an unattended mpi training Job of name dist1, based on Project *te

#### --node-pools `<string>`

-> Instructs the scheduler to run this workload using specific set of nodes which are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md#). You can specify one or more node pools to form a prioritized list of node pools that the scheduler will use to find one node pool that can provide the workload's specification. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md) or use existing node labels, then create a node-pool and assign the label to the node-pool.
+> Instructs the scheduler to run this workload using a specific set of nodes that are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md). You can specify one or more node pools to form a prioritized list; the scheduler uses this list to find one node pool that can satisfy the workload's specification. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md), or use existing node labels, then create a node-pool and assign the label to it.
> This flag can be used in conjunction with node-type and Project-based affinity. In this case, the flag is used to refine the list of allowable node groups set from a node-pool. For more information see: [Working with Projects](../../platform-admin/aiinitiatives/org/projects.md).
#### --node-type `<string>`

-> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
+> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md).
#### --toleration `<string>`

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-pytorch.md
@@ -346,12 +346,12 @@ runai submit-dist pytorch --name distributed-job --workers=2 -g 1 \

#### --node-pools `<string>`

-> Instructs the scheduler to run this workload using specific set of nodes which are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md#). You can specify one or more node pools to form a prioritized list of node pools that the scheduler will use to find one node pool that can provide the workload's specification. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md) or use existing node labels, then create a node-pool and assign the label to the node-pool.
+> Instructs the scheduler to run this workload using a specific set of nodes that are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md). You can specify one or more node pools to form a prioritized list; the scheduler uses this list to find one node pool that can satisfy the workload's specification. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md), or use existing node labels, then create a node-pool and assign the label to it.
> This flag can be used in conjunction with node-type and Project-based affinity. In this case, the flag is used to refine the list of allowable node groups set from a node-pool. For more information see: [Working with Projects](../../platform-admin/aiinitiatives/org/projects.md).
#### --node-type `<string>`

-> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
+> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md).
#### --toleration `<string>`

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit-dist-xgboost.md
@@ -332,12 +332,12 @@ runai submit-dist xgboost --name distributed-job --workers=2 -g 1 \

#### --node-pools `<string>`

-> Instructs the scheduler to run this workload using specific set of nodes which are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md#). You can specify one or more node pools to form a prioritized list of node pools that the scheduler will use to find one node pool that can provide the workload's specification. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md) or use existing node labels, then create a node-pool and assign the label to the node-pool.
+> Instructs the scheduler to run this workload using a specific set of nodes that are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md). You can specify one or more node pools to form a prioritized list; the scheduler uses this list to find one node pool that can satisfy the workload's specification. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md), or use existing node labels, then create a node-pool and assign the label to it.
> This flag can be used in conjunction with node-type and Project-based affinity. In this case, the flag is used to refine the list of allowable node groups set from a node-pool. For more information see: [Working with Projects](../../platform-admin/aiinitiatives/org/projects.md).
#### --node-type `<string>`

-> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
+> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md).
#### --toleration `<string>`

4 changes: 2 additions & 2 deletions docs/Researcher/cli-reference/runai-submit.md
@@ -406,12 +406,12 @@ runai submit --job-name-prefix -i runai.jfrog.io/demo/quickstart -g 1

#### --node-pools `<string>`

-> Instructs the scheduler to run this workload using specific set of nodes which are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md#). You can specify one or more node pools to form a prioritized list of node pools that the scheduler will use to find one node pool that can provide the workload's specification. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md) or use existing node labels, then create a node-pool and assign the label to the node-pool.
+> Instructs the scheduler to run this workload using a specific set of nodes that are part of a [Node Pool](../../Researcher/scheduling/the-runai-scheduler.md). You can specify one or more node pools to form a prioritized list; the scheduler uses this list to find one node pool that can satisfy the workload's specification. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md), or use existing node labels, then create a node-pool and assign the label to it.
> This flag can be used in conjunction with node-type and Project-based affinity. In this case, the flag is used to refine the list of allowable node groups set from a node-pool. For more information see: [Working with Projects](../../platform-admin/aiinitiatives/org/projects.md).
#### --node-type `<string>`

-> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
+> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature, your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/config/limit-to-node-group.md).
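As a hedged sketch, the two flags above might be combined on a submit command like the following (the pool name, node-type label, and workload name are illustrative assumptions; check `runai submit --help` on your CLI version for the exact list syntax of `--node-pools`):

```shell
# Prefer nodes from a hypothetical node pool "a100-pool", further
# restricted to nodes labeled with the hypothetical node-type "dgx".
runai submit --name node-pool-example \
  -i runai.jfrog.io/demo/quickstart -g 1 \
  --node-pools "a100-pool" \
  --node-type dgx
```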
#### --toleration `<string>`

@@ -20,9 +20,9 @@ The file `cluster-all.yaml` can then be reviewed. You can use the internal fi
| Folder | File | Purpose |
|-------------|-------|----------|
| `clusterroles` | `base.yaml` | Mandatory Kubernetes _Cluster Roles_ and _Cluster Role Bindings_ |
-| `clusterroles` |`project-controller-ns-creation.yaml` | Automatic Project Creation and Maintenance. Provides Run:ai with the ability to create Kubernetes namespaces when the Run:ai administrator creates new Projects. Can be turned on/off via [flag](../cluster-setup/customize-cluster-install.md) |
-| `clusterroles` |`project-controller-rb-creation.yaml` | Automatically assign Users to Projects. Can be turned on/off via [flag](../cluster-setup/customize-cluster-install.md) |
-| `clusterroles` | `project-controller-cluster-wide-secrets.yaml` | Allow the propagation of Secrets. See [Secrets in Jobs](../../../platform-admin/workloads/assets/secrets.md). Can be turned on/off via [flag](../cluster-setup/customize-cluster-install.md) |
-| `clusterroles` | `project-controller-limit-range.yaml` | Disables the usage of the Kubernetes [Limit Range](https://kubernetes.io/docs/concepts/policy/limit-range/#:~:text=A%20LimitRange%20is%20a%20policy,per%20PersistentVolumeClaim%20in%20a%20namespace){target=_blank} feature. Can be turned on/off via [flag](../cluster-setup/customize-cluster-install.md) |
+| `clusterroles` |`project-controller-ns-creation.yaml` | Automatic Project Creation and Maintenance. Provides Run:ai with the ability to create Kubernetes namespaces when the Run:ai administrator creates new Projects. Can be turned on/off via [flag](../runai-setup/cluster-setup/customize-cluster-install.md) |
+| `clusterroles` |`project-controller-rb-creation.yaml` | Automatically assign Users to Projects. Can be turned on/off via [flag](../runai-setup/cluster-setup/customize-cluster-install.md) |
+| `clusterroles` | `project-controller-cluster-wide-secrets.yaml` | Allow the propagation of Secrets. See [Secrets in Jobs](../../platform-admin/workloads/assets/secrets.md). Can be turned on/off via [flag](../runai-setup/cluster-setup/customize-cluster-install.md) |
+| `clusterroles` | `project-controller-limit-range.yaml` | Disables the usage of the Kubernetes [Limit Range](https://kubernetes.io/docs/concepts/policy/limit-range/#:~:text=A%20LimitRange%20is%20a%20policy,per%20PersistentVolumeClaim%20in%20a%20namespace){target=_blank} feature. Can be turned on/off via [flag](../runai-setup/cluster-setup/customize-cluster-install.md) |
| `ocp` | `scc.yaml`| OpenShift-specific Security Contexts |
| `priorityclasses` | 4 files | Folder contains a list of _Priority Classes_ used by Run:ai |
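As a hedged aside, the objects these manifests create can be inspected after installation with standard kubectl commands (the `runai` name filter is an assumption about how the objects are named):

```shell
# List the cluster roles installed from the clusterroles/ folder.
kubectl get clusterroles | grep -i runai

# List the priority classes installed from the priorityclasses/ folder.
kubectl get priorityclasses
```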
File renamed without changes.
@@ -23,19 +23,19 @@ See [https://kubernetes.io/docs/concepts/services-networking/service](https://ku

## Workspaces configuration

-[Workspaces](../../../Researcher/workloads/workspaces/overview.md) allow the Researcher to build AI models interactively.
+[Workspaces](../../Researcher/workloads/workspaces/overview.md) allow the Researcher to build AI models interactively.

Workspaces allow the Researcher to launch tools such as Visual Studio Code, TensorFlow, TensorBoard, etc. These tools require access to the container. Access is provided via URLs.

-Run:ai uses the [Cluster URL](../cluster-setup/cluster-prerequisites.md#domain-name-requirement) provided to dynamically create SSL-secured URLs for researchers’ workspaces in the format of `https://<CLUSTER_URL>/project-name/workspace-name`.
+Run:ai uses the [Cluster URL](../runai-setup/cluster-setup/cluster-prerequisites.md#domain-name-requirement) provided to dynamically create SSL-secured URLs for researchers’ workspaces in the format of `https://<CLUSTER_URL>/project-name/workspace-name`.

While this form of path-based routing works conveniently with applications like Jupyter Notebooks, it is often incompatible with other applications. Such applications assume they are running at the root path, so hardcoded file paths and settings within the container may become invalid when running at any other path. For instance, if the container expects to find a file at `/etc/config.json` but is running at `/project-name/workspace-name`, the file will not be found. This can cause the container to fail or not function as intended.

To address this issue, Run:ai provides support for __host-based routing__. When enabled, Run:ai creates workspace URLs in a subdomain format (`https://project-name-workspace-name.<CLUSTER_URL>/`), which allows all workspaces to run at the root path and function properly.

To enable host-based routing you must perform the following steps:

-1. Create a second DNS entry (A record) for `*.<CLUSTER_URL>`, pointing to the same IP as the cluster [Fully Qualified Domain Name (FQDN)](../cluster-setup/cluster-prerequisites.md#fully-qualified-domain-name-fqdn)
+1. Create a second DNS entry (A record) for `*.<CLUSTER_URL>`, pointing to the same IP as the cluster [Fully Qualified Domain Name (FQDN)](../runai-setup/cluster-setup/cluster-prerequisites.md#fully-qualified-domain-name-fqdn)
2. Obtain a __wildcard__ SSL certificate for this DNS.
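Once both steps are done, the DNS side of the setup can be sanity-checked with standard tooling; a minimal sketch, assuming a hypothetical cluster URL of `runai.example.com`:

```shell
# IP of the cluster FQDN.
dig +short runai.example.com

# The wildcard record should resolve any subdomain to the same IP.
dig +short anything.runai.example.com
```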


@@ -77,5 +77,5 @@ Once these requirements have been met, all workspaces will automatically be assi

## See Also

-* To learn how to use port forwarding see the Quickstart document: [Launch an Interactive Build Workload with Connected Ports](../../../Researcher/Walkthroughs/walkthrough-build-ports.md).
-* See CLI command [runai submit](../../../Researcher/cli-reference/runai-submit.md).
+* To learn how to use port forwarding, see the Quickstart document: [Launch an Interactive Build Workload with Connected Ports](../../Researcher/Walkthroughs/walkthrough-build-ports.md).
+* See CLI command [runai submit](../../Researcher/cli-reference/runai-submit.md).
File renamed without changes.
File renamed without changes.