Added GPU enabled sandbox image. #3256

Open · wants to merge 5 commits into base: master
Conversation

@ahlgol commented Jan 22, 2023

This adds a new Dockerfile and build target "build-gpu" in docker/sandbox-bundled that builds a CUDA-enabled image named flyte-sandbox-gpu.

Describe your changes

  • Build target "build-gpu" added in the Makefile that builds Dockerfile.gpu
  • Build target "manifests-gpu" added in the Makefile that adds gpu-operator.yaml to the manifests
  • Dockerfile.gpu is based on the existing Dockerfile, but uses an NVIDIA base image, installs k3s and crictl, and adds a containerd config template for the NVIDIA container runtime
  • Adds bin/k3d-entrypoint-gpu-check.sh, which checks that the container was started with NVIDIA GPU support and exits otherwise (see the sketch after this list)
  • bin/k3d-entrypoint.sh has been modified to let stderr pass through to the output, so warnings from the other entrypoint scripts can be seen (they will now be missing from the logfile, however)
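
For reference, here is a minimal sketch of what such a GPU check can look like (an illustration only, assuming the check is based on nvidia-smi and on the FLYTE_GPU variable set in Dockerfile.gpu; the actual script in the PR may differ):

#!/bin/sh
# Exit early when the image was built for GPUs but the container was not
# started with an NVIDIA-enabled runtime. nvidia-smi is only present inside
# the container when the NVIDIA container runtime injected it.
if [ "${FLYTE_GPU}" = "ENABLED" ] && ! nvidia-smi >/dev/null 2>&1; then
    echo "No NVIDIA runtime detected; start the container with the nvidia runtime." >&2
    exit 1
fi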

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Note to reviewers

Changes have been added following info from these sources (plus some trial and error):
https://itnext.io/enabling-nvidia-gpus-on-k3s-for-cuda-workloads-a11b96f967b0
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
https://k3d.io/v5.4.6/usage/advanced/cuda/

welcome bot commented Jan 22, 2023

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if so, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

@jeevb (Contributor) commented Jan 25, 2023

This is great. I'm wondering if there is a way to do this in a more scalable way. Namely, perhaps we can refactor our sandbox image in a way that the community can easily layer new functionality on (e.g. GPUs). That way teams can build their own sandbox images and run flytectl demo with their custom images. We don't necessarily have to build/push these images within this repo.

@ahlgol (Author) commented Jan 26, 2023

@jeevb Interesting idea! :-) I think there could be some creativity unlocked by such a solution.

On the flip side, I think there is a great benefit to having an easy way to start an "official" sandbox/demo cluster for users who don't have k8s expertise. The sandbox cluster is really perfect for that! GPU capability is (unfortunately) a must for a lot of data science use cases, however, so I think it would be a nice addition to the official images.

@Nan2018 commented Mar 15, 2023

@ahlgol I am working on deploying flyte on prem and this is really of great help.

I was able to build the flyte-sandbox-gpu:latest image with make build-gpu.

However, when I start the sandbox cluster with flytectl demo start --image flyte-sandbox-gpu:latest, the sandbox container immediately exits with code 1. The same happens with docker run flyte-sandbox-gpu:latest.

Am I missing something? What is the correct way to use the gpu image?

@ahlgol (Author) commented Mar 15, 2023

Didn't even know about the --image parameter to flytectl - nice :-). What I did for testing was to replace the local image with the GPU-enabled build and spin up a cluster with a regular flytectl demo start. Let me check whether it still works for me or if I get the same error.

@ahlgol (Author) commented Mar 15, 2023

Right; so if I understand this correctly, what is currently missing from the GPU demo image is the new bootstrapping functionality @jeevb added in February. I will update my PR with this and try it out, but in the meantime you can manually add it to Dockerfile.gpu with something like this:

@@ -10,7 +10,19 @@ WORKDIR /build
 COPY images/manifest.txt images/preload ./
 RUN --security=insecure ./preload manifest.txt
 
+FROM --platform=${BUILDPLATFORM} golang:1.19-bullseye AS bootstrap
 
+ARG TARGETARCH
+ENV CGO_ENABLED 0
+ENV GOARCH "${TARGETARCH}"
+ENV GOOS linux
+
+WORKDIR /flyteorg/build
+COPY bootstrap/go.mod bootstrap/go.sum ./
+RUN go mod download
+COPY bootstrap/ ./
+RUN --mount=type=cache,target=/root/.cache/go-build --mount=type=cache,target=/root/go/pkg/mod \
+    go build -o dist/flyte-sandbox-bootstrap cmd/bootstrap/main.go
 # syntax=docker/dockerfile:1.4-labs
 
 #Following 
@@ -57,6 +69,8 @@ COPY images/tar/${TARGETARCH}/ /var/lib/rancher/k3s/agent/images/
 COPY manifests/ /var/lib/rancher/k3s/server/manifests-staging/
 COPY bin/ /bin/
 
+COPY --from=bootstrap /flyteorg/build/dist/flyte-sandbox-bootstrap /bin/
+
 VOLUME /var/lib/kubelet
 VOLUME /var/lib/rancher/k3s
 VOLUME /var/lib/cni

I could then deploy it with flytectl demo start --image flyte-sandbox-gpu:latest

I haven't had time to test GPU workloads yet, but the cluster starts up as it should. Please let me know if it works for you.

@ahlgol (Author) commented Mar 15, 2023

@Nan2018 The PR is updated now.

@kumare3 (Contributor) commented Mar 15, 2023

This is really cool; how do we get it into the official version?
The problem is testing, as our CI infra has no GPUs.

@Nan2018 commented Mar 15, 2023

@ahlgol I was able to build the GPU image with the updated PR, but the sandbox container still immediately exits with code 1 (same with docker run).

What version of flytectl did you test with? I am on

{
  "App": "flytectl",
  "Build": "29da288",
  "Version": "0.6.34",
  "BuildTime": "2023-03-15 10:45:13.597115631 -0500 CDT m=+0.041086554"
}

@ahlgol (Author) commented Mar 15, 2023

> @ahlgol I was able to build the GPU image with the updated PR, but the sandbox container still immediately exits with code 1 (same with docker run). What version of flytectl did you test with? [...]

Yeah, I have the same one... Do you have Docker configured with the NVIDIA container runtime? https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

Note the test command: sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

I will also do a new test from a clean environment...

@ahlgol (Author) commented Mar 15, 2023

@Nan2018 Something you can try is to start the container with bash as the entrypoint:

docker run -it --entrypoint bash flyte-sandbox-gpu:latest

Then try running /bin/k3d-entrypoint.sh and check the log in /var/log/k3d-entrypoints_$(date "+%y%m%d%H%M%S").log

@ahlgol (Author) commented Mar 15, 2023

> This is really cool; how do we get it into the official version? The problem is testing, as our CI infra has no GPUs.

That would be awesome, but I don't know... :-(

@kumare3 (Contributor) commented Mar 15, 2023

ok brainstorming

@ahlgol (Author) commented Mar 16, 2023

The conversation continued on Slack, but just as a reference:

For flytectl demo start --image flyte-sandbox-gpu:latest to work with the GPU demo image, the default runtime for Docker needs to be set to nvidia.

/etc/docker/daemon.json:

{ 
   "default-runtime": "nvidia", 
   "runtimes": { 
       "nvidia": { 
           "path": "nvidia-container-runtime", 
           "runtimeArgs": [] 
       } 
   } 
}
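
To confirm that Docker picked up the change, restart the daemon and check the default runtime (assuming a systemd-based host):

sudo systemctl restart docker
docker info --format '{{.DefaultRuntime}}'   # should print: nvidia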

@davidmirror-ops (Contributor)

What's the status?

@ahlgol (Author) commented Jul 13, 2023

I'm not sure who you're asking @davidmirror-ops :-)

From my side, I can't maintain this out of tree, as I'm still not using Flyte on a regular basis. From the project's side, it seems it can't be included, since there are no GPUs available in the build/testing environment.

@gakumar49606

Hi @ahlgol,
flytectl demo start --image flyte-sandbox-gpu:latest is also crashing for me; the sandbox container still immediately exits with code 1. I think the issue is that we need to pass --gpus all to the docker run command. I verified this by doing docker run with --gpus all, and it works; the flag is needed to give the sandbox container access to the GPUs.
Can you make a change to add this flag when we trigger flytectl demo start --image flyte-sandbox-gpu:latest?
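
For reference, the manual run with the flag would look something like this (image tag as built earlier in the thread):

docker run --gpus all flyte-sandbox-gpu:latest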

@ahlgol (Author) commented Aug 8, 2023

@gakumar49606
Ah, right. I don't think we'll see any changes to flytectl until this is official functionality.

However, given the documentation and user guide, it seems that if the NVIDIA container runtime is the default runtime, adding the environment variable NVIDIA_VISIBLE_DEVICES=all to the container might work as well, and that would bypass flytectl. Do you have time to give it a try?
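
Untested, but the idea would be something like:

docker run -e NVIDIA_VISIBLE_DEVICES=all flyte-sandbox-gpu:latest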

@gakumar49606

@ahlgol I found the way out.
NVIDIA_VISIBLE_DEVICES=all needs to be set on the host machine; setting it on the container doesn't make any difference. Along with that, the default runtime has to be set for Docker as below in /etc/docker/daemon.json. Note that default-runtime is very important; missing this field will make the container exit immediately again.

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

With the above changes, flytectl demo start --image <> is working fine.
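
In other words, the host-side setup before starting the demo cluster looks roughly like this (assuming a bash shell and a systemd host, with the image name used earlier in the thread):

export NVIDIA_VISIBLE_DEVICES=all
sudo systemctl restart docker       # pick up the daemon.json change
flytectl demo start --image flyte-sandbox-gpu:latest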

@ahlgol (Author) commented Aug 9, 2023

Glad it worked, and thanks for letting me know :-)

@Future-Outlier (Member)

@ahlgol How's it going with this PR?
If you need help, I can take it over.

@ahlgol (Author) commented Oct 5, 2023

Hello @Future-Outlier!

The main problem with this PR was on the receiving end; as there was no way to test it, they couldn't accept it.

I think it still has value though, so if you have time to keep it up to date and provide support in the chat, you're more than welcome to! :-)

Let me know if you need any help.

//Björn

@Future-Outlier (Member)

> The main problem with this PR was on the receiving end; as there was no way to test it, they couldn't accept it. [...]

My laptop has a GPU; I will try to test it today.
Thanks.

@gakumar49606

My laptop has a GPU. I tested it some time back in August; it worked just fine and picked up the driver!

@Future-Outlier (Member)

[screenshot]

@Future-Outlier (Member)

The "nvidia/cuda" base image should be changed, currently, I am testing it.

@Future-Outlier (Member)

I can't use the image with flytectl demo start.
[screenshot]

@Future-Outlier (Member)

Please update the CUDA version:

FROM nvidia/cuda:12.1.1-base-ubuntu20.04

https://github.com/flyteorg/flyte/pull/3256/files#diff-1dc0d7c545a734b964c2af8d171f7aee49155926c85894eedc56740863178d90R29

@ahlgol reopened this Oct 9, 2023

@ahlgol (Author) commented Oct 9, 2023

@Future-Outlier There you go...

@Future-Outlier (Member)

Hi @ahlgol,

I hope this message finds you well. I'm currently focusing on finalizing the PR and could use your assistance with a couple of tasks.

Could you please:

1. Merge the latest master branch into the PR.
2. Ensure all commits are signed off.

Your help will allow me to dedicate more time to completing the PR effectively.

@Future-Outlier (Member)

@ahlgol Can you join Flyte's community Slack?
I'd like to pair-program with you to solve this issue within a week.
Please join and direct message me; I am Han-Ru on Slack.

@Future-Outlier (Member) commented Oct 14, 2023

The process to test it

GPU
1. Start the sandbox with the GPU image

flytectl demo start --image futureoutlier/flyte-sandbox:gpu-v2 --disable-agent --force

2. Set the config in the flyte-sandbox-config ConfigMap

kubectl edit configmap flyte-sandbox-config -n flyte

plugins:
  k8s:
    resource-tolerations:
      - nvidia.com/gpu:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"

kubectl rollout restart deployment flyte-sandbox -n flyte

Reference: https://docs.flyte.org/projects/cookbook/en/latest/auto_examples/productionizing/configure_use_gpus.html#configure-gpus

3. Run the job

pyflyte run --remote --image pingsutw/flytekit:dbPeB53UK_5Lz_mh7s4CpA..  check_gpu.py  check_if_gpu_available

check_gpu.py:

from flytekit import ImageSpec, Resources, task

gpu = "1"

@task(
    retries=2,
    cache=True,
    cache_version="1.0",
    requests=Resources(gpu=gpu),
    # container_image=ImageSpec(
            # cuda="12.2",
            # python_version="3.9.13",
            # packages=["torch"],
            # apt_packages=["git"],
            # registry="futureoutlier",)       
)
def check_if_gpu_available() -> bool:
    import torch
    return torch.cuda.is_available()

@Future-Outlier (Member) commented Oct 14, 2023

Ensure that your setup for running GPUs on Kubernetes is correct.

kubectl run nvidia-smi --restart=Never --rm -i --tty --image nvidia/cuda:12.1.1-base-ubuntu20.04 -- nvidia-smi

[screenshot]

Reference: https://jacobtomlinson.dev/posts/2022/how-to-check-your-nvidia-driver-and-cuda-version-in-kubernetes/

@Future-Outlier (Member) commented Oct 30, 2023

GPU Issue Full Steps Guide (please switch to the root user; don't use sudo)

0. Prerequisites

Ensure all of the required tools (e.g. Docker with the NVIDIA container toolkit, kubectl, and flytectl) are installed and that you can run them all.

1. Ensure that your setup for running GPUs on Kubernetes is correct

kubectl run nvidia-smi --restart=Never --rm -i --tty --image nvidia/cuda:12.1.1-base-ubuntu20.04 -- nvidia-smi

2. Build the GPU Image

1. Create a GPU Dockerfile and add the relevant changes

I will give you two options here: apply @ahlgol's diff, or use mine (already applied).

@ahlgol's PR is here: https://github.com/flyteorg/flyte/pull/3256/files
Mine: https://github.com/Future-Outlier/flyte/tree/sandbox-enabled-gpu

git clone https://github.com/Future-Outlier/flyte
cd flyte
git checkout sandbox-enabled-gpu

2. Change the k3d-entrypoint-gpu-check.sh permissions

chmod +x ./docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh

3. Build the image and push it to your Docker registry

For example, my Docker registry is futureoutlier:

cd docker/sandbox-bundled
make build-gpu
docker tag flyte-sandbox-gpu:latest futureoutlier/flyte:sandbox-gpu-enabled
docker login
docker push futureoutlier/flyte:sandbox-gpu-enabled

3. Test it

1. Start the Flyte sandbox cluster with the GPU image

flytectl demo start --image futureoutlier/flyte:sandbox-gpu-enabled --disable-agent --force

2. Check if you can use the GPU

kubectl describe node | grep -i gpu

If the GPU doesn't show up, something is probably wrong.
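
On a working setup the grep should match the GPU entry in the node's Capacity and Allocatable sections, e.g. (assuming a single GPU):

  nvidia.com/gpu:     1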

3. (Optional) Taint the GPU node (I think it is not necessary in the sandbox)

Get the node ID first:

kubectl get nodes

Taint the node

kubectl taint nodes <Node-ID> key1=value1:NoSchedule

4. Set the config in the flyte-sandbox-config ConfigMap

kubectl edit configmap flyte-sandbox-config -n flyte

plugins:
  k8s:
    resource-tolerations:
      - nvidia.com/gpu:
        - key: "key1"
          operator: "Equal"
          value: "value1"
          effect: "NoSchedule"

kubectl rollout restart deployment flyte-sandbox -n flyte

5. Test it by running this task, or any Python code that uses the torch or tensorflow package

from flytekit import ImageSpec, Resources, task

gpu = "1"

@task(
    retries=2,
    cache=True,
    cache_version="1.0",
    requests=Resources(gpu=gpu),
    container_image=ImageSpec(
        cuda="12.2",
        python_version="3.9.13",
        packages=["torch"],
        apt_packages=["git"],
        registry="your-docker-registry",
    ),
)
def check_if_gpu_available() -> bool:
    import torch
    return torch.cuda.is_available()

pip install flytekitplugins-envd
pyflyte run --remote check_gpu.py check_if_gpu_available

6. Add a screenshot of the successful run in a comment

@danpf commented Nov 1, 2023

PR updates: I had to make a lot of changes in my own branch to get this to work on Ubuntu, but it seems to work now.

Please merge my changes here: https://github.com/danpf/flyte/tree/danpf-sandbox-gpu into this branch

Some teasers: [screenshots]

diff:

diff --git a/docker/sandbox-bundled/Dockerfile.gpu b/docker/sandbox-bundled/Dockerfile.gpu
index 50fe741c..4f083918 100644
--- a/docker/sandbox-bundled/Dockerfile.gpu
+++ b/docker/sandbox-bundled/Dockerfile.gpu
@@ -1,5 +1,6 @@
 # syntax=docker/dockerfile:1.4-labs
 
+###### BUILD FLYTE
 FROM --platform=${BUILDPLATFORM} mgoltzsche/podman:minimal AS builder
 
 ARG TARGETARCH
@@ -23,59 +24,49 @@ RUN go mod download
 COPY bootstrap/ ./
 RUN --mount=type=cache,target=/root/.cache/go-build --mount=type=cache,target=/root/go/pkg/mod \
     go build -o dist/flyte-sandbox-bootstrap cmd/bootstrap/main.go
-# syntax=docker/dockerfile:1.4-labs
-
-#Following 
-FROM nvidia/cuda:12.1.1-base-ubuntu20.04
 
-RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections
+###### GET K3S
+# ARG K3S_TAG=v1.26.4-k3s1
+FROM rancher/k3s:v1.26.4-k3s1 as k3s
 
-RUN apt-get update && \
-    apt-get -y install gnupg2 curl lsb-release && \
-    apt-get clean
+FROM nvidia/cuda:11.8.0-base-ubuntu22.04
 
-# Install NVIDIA Container Runtime
-RUN curl -s -L https://nvidia.github.io/nvidia-container-runtime/gpgkey | apt-key add -
-RUN curl -s -L https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/nvidia-container-runtime.list | tee /etc/apt/sources.list.d/nvidia-container-runtime.list
-RUN apt-get update && \
-    apt-get -y install nvidia-docker2 && \
-    apt-get clean
-
-# Install crictl
 ENV CRICTL_VERSION="v1.26.0" 
-RUN curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/$CRICTL_VERSION/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz --output crictl-${CRICTL_VERSION}-linux-amd64.tar.gz
-RUN tar zxvf crictl-$CRICTL_VERSION-linux-amd64.tar.gz -C /usr/local/bin
-RUN rm -f crictl-$CRICTL_VERSION-linux-amd64.tar.gz
+ENV FLYTE_GPU "ENABLED"
+ARG TARGETARCH
 
-# Install k3s
-RUN curl -s -L https://github.com/k3s-io/k3s/releases/download/v1.24.9+k3s1/k3s > /usr/bin/k3s
-RUN chmod u+x /usr/bin/k3s
-RUN echo "alias kubectl='k3s kubectl'" >> /root/.bashrc
+RUN apt-get update \
+    && apt-get -y install gnupg2 curl nvidia-container-toolkit \
+    && chmod 1777 /tmp \
+    && mkdir -p /var/lib/rancher/k3s/agent/etc/containerd \
+    && mkdir -p /var/lib/rancher/k3s/server/manifests \
+    && curl -L https://github.com/kubernetes-sigs/cri-tools/releases/download/$CRICTL_VERSION/crictl-${CRICTL_VERSION}-linux-amd64.tar.gz --output crictl-${CRICTL_VERSION}-linux-amd64.tar.gz \
+    && tar zxvf crictl-$CRICTL_VERSION-linux-amd64.tar.gz -C /usr/local/bin \
+    && rm -f crictl-$CRICTL_VERSION-linux-amd64.tar.gz \
+    && echo "alias kubectl='k3s kubectl'" >> /root/.bashrc
 
-# Setup containerd for nvidia
-COPY config.toml.tmpl /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
-ENV CRI_CONFIG_FILE="/var/lib/rancher/k3s/agent/etc/crictl.yaml"
+COPY --from=k3s /bin /bin
+COPY --from=k3s /etc /etc
 
-# ENV that signals this container should have gpu enabled
-ENV FLYTE_GPU "ENABLED"
-
-ARG TARGETARCH
+# Provide custom containerd configuration to configure the nvidia-container-runtime
+COPY config.toml.tmpl /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
 
-ARG FLYTE_SANDBOX_VERSION
-ENV FLYTE_SANDBOX_VERSION "${FLYTE_SANDBOX_VERSION}"
+# Deploy the nvidia driver plugin on startup
+COPY device-plugin-daemonset.yaml /var/lib/rancher/k3s/server/manifests/nvidia-device-plugin-daemonset.yaml
 
 COPY --from=builder /build/images/ /var/lib/rancher/k3s/agent/images/
+COPY --from=bootstrap /flyteorg/build/dist/flyte-sandbox-bootstrap /bin/
 COPY images/tar/${TARGETARCH}/ /var/lib/rancher/k3s/agent/images/
 COPY manifests/ /var/lib/rancher/k3s/server/manifests-staging/
 COPY bin/ /bin/
 
-COPY --from=bootstrap /flyteorg/build/dist/flyte-sandbox-bootstrap /bin/
-
 VOLUME /var/lib/kubelet
 VOLUME /var/lib/rancher/k3s
 VOLUME /var/lib/cni
 VOLUME /var/log
 
+ENV PATH="$PATH:/bin/aux"
+ENV CRI_CONFIG_FILE=/var/lib/rancher/k3s/agent/etc/crictl.yaml
 
 ENTRYPOINT [ "/bin/k3d-entrypoint.sh" ]
-CMD [ "server", "--disable=traefik", "--disable=servicelb" ]
\ No newline at end of file
+CMD [ "server", "--disable=traefik", "--disable=servicelb" ]
diff --git a/docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh b/docker/sandbox-bundled/bin/k3d-entrypoint-gpu-check.sh
old mode 100644
new mode 100755
diff --git a/docker/sandbox-bundled/config.toml.tmpl b/docker/sandbox-bundled/config.toml.tmpl
index 4d5c7fa4..0208836d 100644
--- a/docker/sandbox-bundled/config.toml.tmpl
+++ b/docker/sandbox-bundled/config.toml.tmpl
@@ -1,12 +1,18 @@
-[plugins.opt]
-  path = "{{ .NodeConfig.Containerd.Opt }}"
+version = 2
 
-[plugins.cri]
+[plugins."io.containerd.internal.v1.opt"]
+  path = "{{ .NodeConfig.Containerd.Opt }}"
+[plugins."io.containerd.grpc.v1.cri"]
   stream_server_address = "127.0.0.1"
   stream_server_port = "10010"
+  enable_selinux = {{ .NodeConfig.SELinux }}
+  enable_unprivileged_ports = {{ .EnableUnprivileged }}
+  enable_unprivileged_icmp = {{ .EnableUnprivileged }}
 
-{{- if .IsRunningInUserNS }}
+{{- if .DisableCgroup}}
   disable_cgroup = true
+{{end}}
+{{- if .IsRunningInUserNS }}
   disable_apparmor = true
   restrict_oom_score_adj = true
 {{end}}
@@ -15,41 +21,98 @@
   sandbox_image = "{{ .NodeConfig.AgentConfig.PauseImage }}"
 {{end}}
 
+{{- if .NodeConfig.AgentConfig.Snapshotter }}
+[plugins."io.containerd.grpc.v1.cri".containerd]
+  default_runtime_name = "nvidia"
+  snapshotter = "{{ .NodeConfig.AgentConfig.Snapshotter }}"
+  disable_snapshot_annotations = {{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}false{{else}}true{{end}}
+{{ if eq .NodeConfig.AgentConfig.Snapshotter "stargz" }}
+{{ if .NodeConfig.AgentConfig.ImageServiceSocket }}
+[plugins."io.containerd.snapshotter.v1.stargz"]
+cri_keychain_image_service_path = "{{ .NodeConfig.AgentConfig.ImageServiceSocket }}"
+[plugins."io.containerd.snapshotter.v1.stargz".cri_keychain]
+enable_keychain = true
+{{end}}
+{{ if .PrivateRegistryConfig }}
+{{ if .PrivateRegistryConfig.Mirrors }}
+[plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors]{{end}}
+{{range $k, $v := .PrivateRegistryConfig.Mirrors }}
+[plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors."{{$k}}"]
+  endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
+{{if $v.Rewrites}}
+  [plugins."io.containerd.snapshotter.v1.stargz".registry.mirrors."{{$k}}".rewrite]
+{{range $pattern, $replace := $v.Rewrites}}
+    "{{$pattern}}" = "{{$replace}}"
+{{end}}
+{{end}}
+{{end}}
+{{range $k, $v := .PrivateRegistryConfig.Configs }}
+{{ if $v.Auth }}
+[plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".auth]
+  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
+  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
+  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
+  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
+{{end}}
+{{ if $v.TLS }}
+[plugins."io.containerd.snapshotter.v1.stargz".registry.configs."{{$k}}".tls]
+  {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
+  {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
+  {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
+  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
+{{end}}
+{{end}}
+{{end}}
+{{end}}
+{{end}}
+
 {{- if not .NodeConfig.NoFlannel }}
-[plugins.cri.cni]
+[plugins."io.containerd.grpc.v1.cri".cni]
   bin_dir = "{{ .NodeConfig.AgentConfig.CNIBinDir }}"
   conf_dir = "{{ .NodeConfig.AgentConfig.CNIConfDir }}"
 {{end}}
 
-[plugins.cri.containerd.runtimes.runc]
-  # ---- changed from 'io.containerd.runc.v2' for GPU support
-  runtime_type = "io.containerd.runtime.v1.linux"
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
+  runtime_type = "io.containerd.runc.v2"
 
-# ---- added for GPU support
-[plugins.linux]
-  runtime = "nvidia-container-runtime"
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
+  SystemdCgroup = {{ .SystemdCgroup }}
 
 {{ if .PrivateRegistryConfig }}
 {{ if .PrivateRegistryConfig.Mirrors }}
-[plugins.cri.registry.mirrors]{{end}}
+[plugins."io.containerd.grpc.v1.cri".registry.mirrors]{{end}}
 {{range $k, $v := .PrivateRegistryConfig.Mirrors }}
-[plugins.cri.registry.mirrors."{{$k}}"]
+[plugins."io.containerd.grpc.v1.cri".registry.mirrors."{{$k}}"]
   endpoint = [{{range $i, $j := $v.Endpoints}}{{if $i}}, {{end}}{{printf "%q" .}}{{end}}]
+{{if $v.Rewrites}}
+  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."{{$k}}".rewrite]
+{{range $pattern, $replace := $v.Rewrites}}
+    "{{$pattern}}" = "{{$replace}}"
+{{end}}
+{{end}}
 {{end}}
 
 {{range $k, $v := .PrivateRegistryConfig.Configs }}
 {{ if $v.Auth }}
-[plugins.cri.registry.configs."{{$k}}".auth]
-  {{ if $v.Auth.Username }}username = "{{ $v.Auth.Username }}"{{end}}
-  {{ if $v.Auth.Password }}password = "{{ $v.Auth.Password }}"{{end}}
-  {{ if $v.Auth.Auth }}auth = "{{ $v.Auth.Auth }}"{{end}}
-  {{ if $v.Auth.IdentityToken }}identitytoken = "{{ $v.Auth.IdentityToken }}"{{end}}
+[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".auth]
+  {{ if $v.Auth.Username }}username = {{ printf "%q" $v.Auth.Username }}{{end}}
+  {{ if $v.Auth.Password }}password = {{ printf "%q" $v.Auth.Password }}{{end}}
+  {{ if $v.Auth.Auth }}auth = {{ printf "%q" $v.Auth.Auth }}{{end}}
+  {{ if $v.Auth.IdentityToken }}identitytoken = {{ printf "%q" $v.Auth.IdentityToken }}{{end}}
 {{end}}
 {{ if $v.TLS }}
-[plugins.cri.registry.configs."{{$k}}".tls]
+[plugins."io.containerd.grpc.v1.cri".registry.configs."{{$k}}".tls]
   {{ if $v.TLS.CAFile }}ca_file = "{{ $v.TLS.CAFile }}"{{end}}
   {{ if $v.TLS.CertFile }}cert_file = "{{ $v.TLS.CertFile }}"{{end}}
   {{ if $v.TLS.KeyFile }}key_file = "{{ $v.TLS.KeyFile }}"{{end}}
+  {{ if $v.TLS.InsecureSkipVerify }}insecure_skip_verify = true{{end}}
 {{end}}
 {{end}}
-{{end}}
\ No newline at end of file
+{{end}}
+
+{{range $k, $v := .ExtraRuntimes}}
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."{{$k}}"]
+  runtime_type = "{{$v.RuntimeType}}"
+[plugins."io.containerd.grpc.v1.cri".containerd.runtimes."{{$k}}".options]
+  BinaryName = "{{$v.BinaryName}}"
+{{end}}

@Future-Outlier (Member) commented Nov 1, 2023

> PR updates: I had to make a lot of changes in my own branch to get this to work on Ubuntu, but it seems to work now. Please merge my changes here: https://github.com/danpf/flyte/tree/danpf-sandbox-gpu into this branch. [...]
@ahlgol Can you sign off all your previous commits and merge the diff?
