Added GPU enabled sandbox image. (v2?) #4340

Open · wants to merge 16 commits into master

Conversation

@danpf commented Nov 1, 2023

Preface: This combines work done by @ahlgol and @Future-Outlier with some extra testing, evaluation, and a bunch of NVIDIA-headache fixes to get it working fully on Ubuntu Server. #3256

If @ahlgol merges this into the previous PR, this one will close; otherwise we can just use this one (I kept the previous PR's commits).

Setup / testing

0. Prerequisites

Ensure you have these installed and that you can run them all.

My environment (may or may not be necessary; a quick sanity check for these configs is sketched after the list):

  • /etc/docker/daemon.json
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
  • docker context list
NAME        DESCRIPTION                               DOCKER ENDPOINT               ERROR
default *   Current DOCKER_HOST based configuration   unix:///var/run/docker.sock
  • /etc/containerd/config.toml
version = 2

[plugins]

  [plugins."io.containerd.grpc.v1.cri"]

    [plugins."io.containerd.grpc.v1.cri".containerd]

      [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

        [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
          privileged_without_host_devices = false
          runtime_engine = ""
          runtime_root = ""
          runtime_type = "io.containerd.runc.v2"

          [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
            BinaryName = "/usr/bin/nvidia-container-runtime"
            SystemdCgroup = true
  • lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy
  • nvidia-smi
Wed Nov  1 03:52:14 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.23.06              Driver Version: 545.23.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       On  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla T4                       On  | 00000000:00:05.0 Off |                    0 |
| N/A   32C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
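If helpful, a quick sanity check that the configs above took effect (a sketch, assuming the NVIDIA Container Toolkit is installed; not part of the PR's setup steps):

# restart the daemons so the configs above are picked up
sudo systemctl restart docker containerd
# should print "nvidia" given the daemon.json above
docker info --format '{{.DefaultRuntime}}'
# should show the same nvidia-smi table as on the host
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi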

1. Get branch

Download the branch, build the image from the Dockerfile, tag it, and push it:

git clone https://github.com/danpf/flyte
cd flyte
git checkout danpf-sandbox-gpu
cd docker/sandbox-bundled
make build-gpu
docker tag flyte-sandbox-gpu:latest dancyrusbio/flyte-sandbox-gpu:latest
docker login
docker push dancyrusbio/flyte-sandbox-gpu:latest

2. Start the cluster

flytectl demo start --image dancyrusbio/flyte-sandbox-gpu:latest --disable-agent --force

3. See if you can use the GPU

$ kubectl describe node | grep -i gpu
  nvidia.com/gpu:     2
  nvidia.com/gpu:     2
  nvidia.com/gpu     0           0
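If the nvidia.com/gpu resource does not show up, a hypothetical troubleshooting step (not part of this PR's instructions) is to check that the NVIDIA device plugin pods came up and that the node actually advertises GPUs:

# check the device plugin / GPU operator pods
kubectl get pods -A | grep -i nvidia
# check the node's capacity/allocatable entries
kubectl get nodes -o json | grep -i 'nvidia.com/gpu'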

4. Run the final job

Create the runme.py script shown below, and then run:

pyflyte run --remote runme.py check_if_gpu_available

Testing scripts

# create_envd_context.sh
envd context create --name flyte-sandbox --builder tcp --builder-address localhost:30003 --use

Quickly rebuild and push your Docker image (change the image name to your own, obviously):

# rebuild.sh
make build-gpu && docker tag flyte-sandbox-gpu dancyrusbio/flyte-sandbox-gpu && docker push dancyrusbio/flyte-sandbox-gpu

Start a new Flyte sandbox cluster:

# start_new_flyte_cluster.sh
flytectl demo start --image dancyrusbio/flyte-sandbox-gpu:latest --disable-agent --force

This is the final Flyte script to check whether your GPU is working:

# runme.py
from flytekit import ImageSpec, Resources, task

gpu = "1"

@task(
    retries=2,
    cache=True,
    cache_version="1.0",
    requests=Resources(gpu=gpu),
    environment={"PYTHONPATH": "/root"},
    container_image=ImageSpec(
            cuda="11.8.0",
            python_version="3.9.13",
            packages=["flytekit", "torch"],
            apt_packages=["git"],
            registry="localhost:30000",
    )
)
def check_if_gpu_available() -> bool:
    import torch
    return torch.cuda.is_available()
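As a quick sanity check of the task definition itself, you can also run it locally first (a sketch, assuming flytekit and torch are installed locally; this executes on your machine, so it reports your local GPU, not the sandbox's):

# local run (no --remote): executes check_if_gpu_available in your local environment
pyflyte run runme.py check_if_gpu_available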

Proof!

$ kubectl describe node | grep -i gpu
  nvidia.com/gpu:     2
  nvidia.com/gpu:     2
  nvidia.com/gpu     0           0

Previous PR

A new Dockerfile and build target "build-gpu" in docker/sandbox-bundled that builds a CUDA-enabled image named flyte-sandbox-gpu.
Describe your changes

Build target added in Makefile for "build-gpu" that builds Dockerfile.gpu
Build target added in Makefile for "manifests-gpu" that adds gpu-operator.yaml to the manifests
Dockerfile.gpu is based on the existing Dockerfile, but uses a base image from NVIDIA, installs k3s and crictl, and adds a containerd config template for the NVIDIA container runtime
Adds bin/k3d-entrypoint-gpu-check.sh, which checks whether the container was started in an NVIDIA-enabled image and exits otherwise (a rough sketch of such a check is shown after this list)
bin/k3d-entrypoint.sh has been modified to let stderr pass through to the output, so warnings from the other entrypoint scripts can be seen (they will now be missing from the logfile, however)
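For readers who haven't opened the diff, a rough, hypothetical sketch of what an entrypoint GPU check of this kind might look like (not the PR's exact script; see docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh for the real one):

#!/bin/sh
# Hypothetical sketch only: fail fast if the container was not started with the
# NVIDIA runtime, i.e. the driver utilities were never injected into the container.
if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "gpu-check: nvidia-smi not found; is the NVIDIA container runtime configured?" >&2
    exit 1
fi
if ! nvidia-smi >/dev/null 2>&1; then
    echo "gpu-check: nvidia-smi failed; no usable GPU visible in this container" >&2
    exit 1
fi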

Check all the applicable boxes

I updated the documentation accordingly.
All new and existing tests passed.

All commits are signed-off.

Note to reviewers

Changes have been added following info from these sources (plus some trial and error):
https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://k3d.io/v5.4.6/usage/advanced/cuda/

Future Outlier and others added 5 commits October 15, 2023 09:57
Signed-off-by: Future Outlier <[email protected]>
… sandbox-enabled-gpu

Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: Future Outlier <[email protected]>
Signed-off-by: Danny Farrell <[email protected]>

welcome bot commented Nov 1, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@Future-Outlier (Member)

Thanks a lot for your help; you and the author of the first PR have really made a significant contribution to Flyte.

@danpf marked this pull request as ready for review November 1, 2023 03:53
@Future-Outlier (Member)

Hi, thanks a lot for your contributions.
These are really amazing.

@Future-Outlier (Member)

Here are some questions!
I believe that if you can answer them, you will help lots of Flyte users use the sandbox GPU image, and also help reviewers review it more easily.

  1. Do we need to taint the GPU node? Why or why not?

  2. Do we need to set the config in the flyte sandbox-config? Why or why not?

  3. Do we need to change the k3d-entrypoint-gpu-check permissions? Why or why not?

  4. Does the CUDA version need to be the same as your GPU's CUDA version? Does it have any limit?

Those questions above are related to the 1st GPU PR's discussion here.
#3256 (comment)

docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh (review thread; outdated, resolved)
docker/sandbox-bundled/kustomize/gpu-operator.yaml (review thread; outdated, resolved)
namespace: kube-system
spec:
chart: nvidia-device-plugin
repo: https://nvidia.github.io/k8s-device-plugin
Member

Suggested change
repo: https://nvidia.github.io/k8s-device-plugin
repo: https://nvidia.github.io/k8s-device-plugin

docker/sandbox-bundled/manifests/complete-agent.yaml (review thread; outdated, resolved)
# enable controllers
sed -e 's/ / +/g' -e 's/^/+/' <"/sys/fs/cgroup/cgroup.controllers" >"/sys/fs/cgroup/cgroup.subtree_control"
sed -e 's/ / +/g' -e 's/^/+/' < /sys/fs/cgroup/cgroup.controllers > /sys/fs/cgroup/cgroup.subtree_control
Member

I guess that the GPU sandbox will use this command.
xargs -rn1 < /sys/fs/cgroup/cgroup.procs > /sys/fs/cgroup/init/cgroup.procs || :
Can you explain why the GPU sandbox doesn't use busybox?


Author

busybox isn't installed on the base image (nvidia/cuda:11.8.0-base-ubuntu22.04) by default. Either we install busybox or do a check similar to this.

Member

Thanks, I am not sure whether it is necessary or not.
@jeevb Can you take a look?
Thanks a lot

Comment on lines 1 to 118

{{- if .NodeConfig.AgentConfig.PauseImage }}
sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
{{end}}

{{- if .NodeConfig.AgentConfig.Snapshotter }}
[plugins."io.containerd.grpc.v1.cri".containerd]
default_runtime_name = "nvidia"
snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
disable_snapshot_annotations = {{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}false{{else}}true{{end}}
{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
{{ if .NodeConfig.AgentConfig.ImageServiceSocket }}
[plugins."io.containerd.snapshotter.v1.stargz"]
cri_keychain_image_service_path = "{{ .NodeConfig.AgentConfig.ImageServiceSocket }}"
[plugins."io.containerd.snapshotter.v1.stargz".cri_keychain]
enable_keychain = true
{{end}}
{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors."{{$k}}"]
endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
[plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
"{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}
{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".auth]
{{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
{{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
{{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
{{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".tls]
{{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
{{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
{{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
{{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}
{{end}}
{{end}}

{{- if not .NodeConfig.NoFlannel }}
[plugins."io.containerd.grpc.v1.cri".cni]
bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
{{end}}

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
runtime_type = "io.containerd.runc.v2"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
SystemdCgroup = {{ .SystemdCgroup }}

{{ if .PrivateRegistryConfig }}
{{ if .PrivateRegistryConfig.Mirrors }}
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]{{end}}
{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."{{$k}}"]
endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
{{if $v.Rewrites}}
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."{{$k}}".rewrite]
{{range $pattern, $replace := $v.Rewrites}}
"{{$pattern}}" = "{{$replace}}"
{{end}}
{{end}}
{{end}}

{{range $k, $v := .PrivateRegistryConfig.Configs }}
{{ if $v.Auth }}
[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".auth]
{{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
{{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
{{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
{{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
{{end}}
{{ if $v.TLS }}
[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".tls]
{{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
{{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
{{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
{{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
{{end}}
{{end}}
{{end}}

{{range $k, $v := .ExtraRuntimes}}
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."{{$k}}"]
runtime_type = "{{$v.RuntimeType}}"
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."{{$k}}".options]
BinaryName = "{{$v.BinaryName}}"
{{end}}
Member

Would you like to provide the source URL?
Thanks very much.

docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh (review thread; outdated, resolved)
ENV CRI_CONFIG_FILE=/var/lib/rancher/k3s/agent/etc/crictl.yaml

ENTRYPOINT [ "/bin/k3d-entrypoint.sh" ]
CMD [ "server", "--disable=traefik", "--disable=servicelb" ]
Member

Would you like to explain the relationship between Dockerfile and Dockerfile.gpu in the same directory?

@Future-Outlier (Member)

I think after we solve the security issue and remove everything about the GPU operator file, this PR can be merged. Thanks for all your work.

@danpf (Author)

danpf commented Nov 8, 2023

I'm not sure where else to explain this, but to answer any questions about Dockerfile.gpu vs. the Dockerfile:

Here is a side-by-side diff screenshot of the two files:
[screenshot: side-by-side diff of Dockerfile and Dockerfile.gpu]

The differences between the two files are shown in red. Essentially everything added to Dockerfile.gpu is there because the base image of k3s is scratch, while the base image of our CUDA image is Ubuntu. So you need to install a few requirements, install crictl, set the kubectl alias, and set some extra volumes/paths (at least according to the various docs).
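If you'd rather inspect the exact delta than read the screenshot, a simple way (on the PR branch) is:

# compare the stock sandbox Dockerfile with the GPU variant
cd docker/sandbox-bundled
diff -u Dockerfile Dockerfile.gpu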

docker/sandbox-bundled/Makefile (two review threads; outdated, resolved)
@Future-Outlier (Member)

@danpf, it looks good to me. I think after removing these 2 changes, it's time to merge it. Thanks a lot.

danpf and others added 2 commits November 7, 2023 22:59
Co-authored-by: Future-Outlier <[email protected]>
Signed-off-by: Daniel Farrell <[email protected]>
Co-authored-by: Future-Outlier <[email protected]>
Signed-off-by: Daniel Farrell <[email protected]>
@danpf (Author)

danpf commented Nov 8, 2023

Do you think we could get anyone to try and follow/install this? Does it still work for you on WSL?

@Future-Outlier (Member)

@pingsutw will use an EC2 instance to test this.

@Future-Outlier (Member)

Future-Outlier commented Nov 9, 2023

It works on WSL, but WSL needs some additional settings, which are complicated for me. In my WSL, I saw all the GPU-related pods start, so I think it's correct.

@granthamtaylor

Hey folks. I am working on a project that would greatly benefit from tasks being able to utilize GPUs in the sandbox. What is the current status of this PR?

@Future-Outlier (Member)

> Hey folks. I am working on a project that would greatly benefit from tasks being able to utilize GPUs in the sandbox. What is the current status of this PR?

It works, but we haven't added tests yet and it hasn't been reviewed by other maintainers.

@Future-Outlier (Member)

> Hey folks. I am working on a project that would greatly benefit from tasks being able to utilize GPUs in the sandbox. What is the current status of this PR?

You can

cd flyte
gh pr checkout 4340
cd docker/sandbox-bundled
make build-gpu

to create the image, thank you!

@davidmirror-ops (Contributor)

Do we still need help testing/installing this?
If so, what are the most up-to-date instructions?

@danpf (Author)

danpf commented Feb 20, 2024

@davidmirror-ops The current instructions in the OP are up to date (to my knowledge, but it has been some time). We couldn't convince anyone to test/install this. You will need an NVIDIA GPU to do so.

@granthamtaylor

granthamtaylor commented Feb 20, 2024

I am building a PC to function as a private workstation. I will be getting a 4090 in about two weeks. I can test once it is finished.

This contribution is extremely useful for my intent, thank you for developing the feature!

@davidmirror-ops (Contributor)

Hey @granthamtaylor did you have a chance to try this one?
