Merge pull request #27 from opendatahub-io/v1.7-branch
Sync downstream master with upstream default v1.7-branch
harshad16 authored Oct 20, 2023
2 parents 6178a82 + 853eb61 commit 69c9877
Showing 13 changed files with 285 additions and 14 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/auto-add-issue-to-project.yaml
@@ -0,0 +1,28 @@
name: Auto Add Issues to Tracking boards
on:
  issues:
    types:
      - opened
jobs:
  add-to-project:
    name: Add issue to projects
    runs-on: ubuntu-latest
    steps:
      - name: Generate github-app token
        id: app-token
        uses: getsentry/action-github-app-token@v2
        with:
          app_id: ${{ secrets.DEVOPS_APP_ID }}
          private_key: ${{ secrets.DEVOPS_APP_PRIVATE_KEY }}
      - uses: actions/[email protected]
        with:
          project-url: https://github.com/orgs/opendatahub-io/projects/39
          github-token: ${{ steps.app-token.outputs.token }}
      - uses: actions/[email protected]
        with:
          project-url: https://github.com/orgs/opendatahub-io/projects/40
          github-token: ${{ steps.app-token.outputs.token }}
      - uses: actions/[email protected]
        with:
          project-url: https://github.com/orgs/opendatahub-io/projects/45
          github-token: ${{ steps.app-token.outputs.token }}
198 changes: 198 additions & 0 deletions components/base/README.md
@@ -0,0 +1,198 @@
# ODH Notebook Controller

The ODH Notebook Controller will watch the **Kubeflow Notebook** custom resource
events to extend the Kubeflow notebook controller behavior with the following
capabilities:

- OpenShift ingress controller integration.
- OpenShift OAuth sidecar injection.

![ODH Notebook Controller OAuth injection
diagram](../odh-notebook-controller/assets/odh-notebook-controller-oauth-diagram.png)


## Directory Base

This base directory acts as a bridge that includes both the
kubeflow-notebook-controller and the odh-notebook-controller deployments.

## Deployment

Add the following configuration to your `KfDef` object to install the
`odh-notebook-controller` from odh-manifests:

```yaml
...
- kustomizeConfig:
    repoRef:
      name: manifests
      path: odh-notebook-controller
  name: odh-notebook-controller
```

## Creating Notebooks

Create a notebook object with the image and other parameters, such as
environment variables, resource limits, and tolerations:

```shell
notebook_namespace=$(oc config view --minify -o jsonpath='{..namespace}')
cat <<EOF | oc apply -f -
---
apiVersion: kubeflow.org/v1
kind: Notebook
metadata:
  name: minimal-notebook
  annotations:
    notebooks.opendatahub.io/inject-oauth: "true"
spec:
  template:
    spec:
      containers:
        - name: minimal-notebook
          image: quay.io/thoth-station/s2i-minimal-notebook:v0.3.0
          imagePullPolicy: Always
          workingDir: /opt/app-root/src
          env:
            - name: NOTEBOOK_ARGS
              value: |
                --ServerApp.port=8888
                --ServerApp.token=''
                --ServerApp.password=''
                --ServerApp.base_url=/notebook/${notebook_namespace}/minimal-notebook
          ports:
            - name: notebook-port
              containerPort: 8888
              protocol: TCP
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi
          livenessProbe:
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
            httpGet:
              scheme: HTTP
              path: /notebook/${notebook_namespace}/minimal-notebook/api
              port: notebook-port
EOF
```

Open the notebook URL in your browser:

```shell
firefox "$(oc get route minimal-notebook -o jsonpath='{.spec.host}')/notebook/${notebook_namespace}/minimal-notebook"
```

Find more examples in the [notebook tests folder](../tests/resources/notebook-controller/).

## Notebook Culling

The notebook controller scales to zero every notebook whose last activity is
older than the configured idle time. The controller sets the
`notebooks.kubeflow.org/last-activity` annotation whenever it detects kernel
activity.

To enable this feature, create a ConfigMap with the culling configuration:

- **ENABLE_CULLING**: Enable the culling feature (`false` by default).
- **IDLENESS_CHECK_PERIOD**: How often, in minutes, the controller polls notebooks to update their last activity.
- **CULL_IDLE_TIME**: Idle time, in minutes, after which a notebook with no activity is scaled to zero.

When the controller scales down the notebook pods, it will add the
`kubeflow-resource-stopped` annotation. Remove this annotation to start the
notebook server again.

For example, poll notebook activity every 5 minutes and shut down notebooks
that have been idle for more than 60 minutes:

```shell
cat <<EOF | oc apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: notebook-controller-culler-config
data:
  ENABLE_CULLING: "true"
  CULL_IDLE_TIME: "60" # In minutes (1 hour)
  IDLENESS_CHECK_PERIOD: "5" # In minutes
EOF
```

Restart the notebook controller deployment to refresh the configuration:

```shell
oc rollout restart deploy/notebook-controller-deployment
```

### Culling endpoint

The notebook controller [polls the
activity](https://github.com/kubeflow/kubeflow/blob/100657e8d1072136adf0a39315498b3d510c7c49/components/notebook-controller/pkg/culler/culler.go#L153-L155)
from a specific path:

```go
url := fmt.Sprintf(
	"http://%s.%s.svc.%s/notebook/%s/%s/api/kernels",
	nm, ns, domain, ns, nm)
```

Make sure the notebook serves the kernels API at this path by setting the
`base_url` parameter:

```shell
jupyter lab ... \
--ServerApp.base_url=/notebook/${nb_namespace}/${nb_name}
```
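To check that the two paths line up, the Go sketch below rebuilds the culler's kernels URL with the same format string as the snippet above and verifies that its path falls under the `base_url` passed to Jupyter. The helper names and the `cluster.local` domain are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// kernelsURL mirrors the format string used by the culler snippet above.
func kernelsURL(name, namespace, domain string) string {
	return fmt.Sprintf(
		"http://%s.%s.svc.%s/notebook/%s/%s/api/kernels",
		name, namespace, domain, namespace, name)
}

// baseURL is the value to pass as --ServerApp.base_url for the notebook.
func baseURL(namespace, name string) string {
	return fmt.Sprintf("/notebook/%s/%s", namespace, name)
}

// servedUnderBaseURL reports whether the request path of url is handled by a
// Jupyter server configured with the given base_url.
func servedUnderBaseURL(url, base string) bool {
	// Drop the scheme and host, keeping only the request path.
	rest := strings.TrimPrefix(url, "http://")
	slash := strings.Index(rest, "/")
	if slash < 0 {
		return false
	}
	return strings.HasPrefix(rest[slash:], base+"/")
}

func main() {
	u := kernelsURL("minimal-notebook", "opendatahub", "cluster.local")
	fmt.Println(u)
	// http://minimal-notebook.opendatahub.svc.cluster.local/notebook/opendatahub/minimal-notebook/api/kernels
	fmt.Println(servedUnderBaseURL(u, baseURL("opendatahub", "minimal-notebook"))) // true
}
```

If `base_url` is left at Jupyter's default (`/`), the kernels endpoint lives at `/api/kernels` instead, and the culler's request would not reach it.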

## Notebook GPU

Install the [NVIDIA GPU Operator](https://github.com/NVIDIA/gpu-operator) in
your cluster.

When the operator is installed, make sure it has labeled the nodes in your
cluster with the number of available GPUs, for example:

```shell
$ oc get node ${GPU_NODE_NAME} -o yaml | grep "nvidia.com/gpu.count"
nvidia.com/gpu.count: "1"
```

In the notebook object, add the number of GPUs to use in the
`notebook.spec.template.spec.containers.resources` field:

```yaml
resources:
  requests:
    nvidia.com/gpu: "1"
  limits:
    nvidia.com/gpu: "1"
```

Allow the notebook to be scheduled on a GPU node by adding the following
toleration to the `notebook.spec.template.spec.tolerations` field:

```yaml
tolerations:
  - effect: NoSchedule
    key: nvidia.com/gpu
    operator: Exists
```

Finally, create the notebook and wait until it is scheduled in a GPU node.
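The toleration above works because, with `operator: Exists`, Kubernetes matches a taint on its key and effect and ignores the taint's value. The Go sketch below models that matching rule with simplified local types (not the `k8s.io/api` structs). It assumes the GPU nodes carry an `nvidia.com/gpu` taint with the `NoSchedule` effect; whether they do depends on how your cluster was set up.

```go
package main

import "fmt"

// Taint and Toleration are simplified stand-ins for the Kubernetes types.
type Taint struct {
	Key, Value, Effect string
}

type Toleration struct {
	Key, Operator, Value, Effect string
}

// tolerates applies the Kubernetes matching rules: an empty toleration effect
// matches every effect, an empty key matches every key (validation only
// allows that with Exists), Exists ignores the value, and Equal (the
// default operator) requires identical values.
func tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	default:
		return false
	}
}

func main() {
	gpuToleration := Toleration{Key: "nvidia.com/gpu", Operator: "Exists", Effect: "NoSchedule"}

	// Hypothetical taint on a GPU node; its value is irrelevant for Exists.
	gpuTaint := Taint{Key: "nvidia.com/gpu", Value: "present", Effect: "NoSchedule"}
	fmt.Println(tolerates(gpuToleration, gpuTaint)) // true

	// A taint with a different effect is not tolerated.
	fmt.Println(tolerates(gpuToleration, Taint{Key: "nvidia.com/gpu", Effect: "NoExecute"})) // false
}
```

The toleration only permits scheduling on tainted nodes; the `nvidia.com/gpu` resource request is what actually steers the pod to a node that advertises GPUs.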

## Requirements

To update the notebook controller manifests, your environment must have the
following:

- [yq](https://github.com/mikefarah/yq#install) version 4.21.1+.
- [kustomize](https://sigs.k8s.io/kustomize/docs/INSTALL.md) version 3.2.0+.

6 changes: 6 additions & 0 deletions components/base/kustomization.yaml
@@ -0,0 +1,6 @@
---
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../notebook-controller/config/overlays/openshift
- ../odh-notebook-controller/config/base
@@ -4,6 +4,5 @@ kind: CustomResourceDefinition
metadata:
name: notebooks.kubeflow.org
spec:
preserveUnknownFields: false # TODO: Remove in Kubeflow 1.7 release
conversion:
strategy: None
@@ -6,3 +6,5 @@ configMapGenerator:
- name: config
envs:
- params.env
generatorOptions:
disableNameSuffixHash: true
5 changes: 0 additions & 5 deletions components/notebook-controller/config/manager/manager.yaml
@@ -36,11 +36,6 @@ spec:
configMapKeyRef:
name: config
key: ISTIO_GATEWAY
- name: CLUSTER_DOMAIN
valueFrom:
configMapKeyRef:
name: config
key: CLUSTER_DOMAIN
- name: ENABLE_CULLING
valueFrom:
configMapKeyRef:
@@ -3,15 +3,15 @@ apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- ../../base
namespace: notebook-controller-system
namespace: opendatahub
commonLabels:
app.kubernetes.io/part-of: odh-notebook-controller
component.opendatahub.io/name: kf-notebook-controller
opendatahub.io/component: "true"
images:
- name: public.ecr.aws/j1r0q0g6/notebooks/notebook-controller
- name: docker.io/kubeflownotebookswg/notebook-controller
newName: quay.io/opendatahub/kubeflow-notebook-controller
newTag: latest
newTag: 1.7-9f0db5d
configMapGenerator:
- name: config
behavior: merge
8 changes: 6 additions & 2 deletions components/notebook-controller/config/rbac/kustomization.yaml
@@ -1,9 +1,13 @@
resources:
- role.yaml
- role_binding.yaml
- leader_election_role.yaml
- leader_election_role_binding.yaml
- user_cluster_roles.yaml

# Uncomment the following lines if we want to enable
# leader election for the controller manager.
# - leader_election_role.yaml
# - leader_election_role_binding.yaml

# Comment the following 3 lines if you want to disable
# the auth proxy (https://github.com/brancz/kube-rbac-proxy)
# which protects your /metrics endpoint.
1 change: 1 addition & 0 deletions components/notebook-controller/config/rbac/role.yaml
@@ -29,6 +29,7 @@ rules:
- get
- list
- watch
- delete
- apiGroups:
- ""
resources:
38 changes: 38 additions & 0 deletions components/notebook-controller/controllers/notebook_controller.go
@@ -51,6 +51,7 @@ const DefaultContainerPort = 8888
const DefaultServingPort = 80
const AnnotationRewriteURI = "notebooks.kubeflow.org/http-rewrite-uri"
const AnnotationHeadersRequestSet = "notebooks.kubeflow.org/http-headers-request-set"
const AnnotationNotebookRestart = "notebooks.opendatahub.io/notebook-restart"

const PrefixEnvVar = "NB_PREFIX"

@@ -221,6 +222,43 @@ func (r *NotebookReconciler) Reconcile(ctx context.Context, req ctrl.Request) (c
		return ctrl.Result{}, err
	}

	// Check if annotations for the Notebook instance imply a requirement for notebook restart.
	// This annotation is supposed to be added once a ConfigMap used in the notebook is added/updated.

	annotations := instance.GetAnnotations()
	notebookRestart, ok := annotations[AnnotationNotebookRestart]

	if ok && notebookRestart == "true" {

		log.Info("Annotation restart-pod is set, working on restarting the pod")

		// find the pod associated with the notebook instance
		foundPod := &corev1.Pod{}
		err = r.Get(ctx, types.NamespacedName{Name: instance.Name + "-0", Namespace: instance.Namespace}, foundPod)
		if err != nil && apierrs.IsNotFound(err) {
			log.Info(fmt.Sprintf("No Pods are currently running for Notebook Server: %s in namespace: %s.", instance.Name, instance.Namespace))
		} else if err != nil {
			return ctrl.Result{}, err
		}

		// delete the pod associated with the notebook instance so it can be restarted
		err = r.Delete(context.TODO(), foundPod)
		if err != nil {
			return ctrl.Result{}, err
		} else {
			// remove the "notebook-restart" annotation so that no restart is triggered for persisting notebook-restart annotations
			delete(annotations, AnnotationNotebookRestart)

			log.Info("Pod has been restarted, working on resetting annotations of the Notebook")
			instance.SetAnnotations(annotations)

			err = r.Update(ctx, instance)
			if err != nil {
				return ctrl.Result{}, err
			}
		}
	}

	return ctrl.Result{}, nil
}
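In the hunk above, a missing pod is only logged; execution then falls through to `r.Delete` with an empty `Pod`, which will most likely return an error on every reconcile while the annotation stays set. One way to avoid that is to decide the actions before calling the API. The Go sketch below isolates that decision as a pure function; `restartPlan` and its return values are hypothetical, and clearing the annotation when no pod exists is a design choice of this sketch, not the behavior of the committed code.

```go
package main

import "fmt"

const annotationNotebookRestart = "notebooks.opendatahub.io/notebook-restart"

// restartPlan decides what the reconciler should do for a Notebook, given its
// annotations and whether its pod (<name>-0) currently exists. It returns
// whether to delete the pod and the annotations to write back (nil when the
// Notebook needs no update). The input map is not modified.
func restartPlan(annotations map[string]string, podExists bool) (deletePod bool, updated map[string]string) {
	if annotations[annotationNotebookRestart] != "true" {
		return false, nil // no restart requested
	}
	updated = make(map[string]string, len(annotations))
	for k, v := range annotations {
		if k != annotationNotebookRestart {
			updated[k] = v
		}
	}
	// Delete only when there is a pod to delete; clear the annotation either
	// way so the restart is not retriggered on the next reconcile.
	return podExists, updated
}

func main() {
	ann := map[string]string{
		annotationNotebookRestart:               "true",
		"notebooks.opendatahub.io/inject-oauth": "true",
	}
	del, upd := restartPlan(ann, false)
	fmt.Println(del, len(upd)) // false 1
	del, upd = restartPlan(ann, true)
	fmt.Println(del, len(upd)) // true 1
}
```

Keeping the decision free of client calls makes it easy to unit-test; the reconciler would then call `r.Delete` only when `deletePod` is true and `r.Update` only when `updated` is non-nil.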

2 changes: 1 addition & 1 deletion components/odh-notebook-controller/README.md
@@ -32,7 +32,7 @@ metadata:
A [mutating webhook](./controllers/notebook_webhook.go) is part of the ODH
notebook controller, it will add the sidecar to the notebook deployment. The
controller will create all the objects needed by the proxy as explained in the
follow diagram:
following diagram:
![ODH Notebook Controller OAuth injection
diagram](./assets/odh-notebook-controller-oauth-diagram.png)
@@ -6,4 +6,4 @@ resources:
images:
- name: quay.io/opendatahub/odh-notebook-controller
newName: quay.io/opendatahub/odh-notebook-controller
newTag: latest
newTag: 1.7-9f0db5d
@@ -8,7 +8,7 @@ bases:
- ../webhook

# Adds namespace to all resources.
namespace: odh-notebook-controller-system
namespace: opendatahub

# Value of this field is prepended to the names of all resources, e.g. a
# deployment named "wordpress" becomes "alices-wordpress". Note that it should