Merge pull request #109 from EPCCed/gpu_faq_branch

AG: added specific FAQ to the GPU Service
EPCCed · Oct 12, 2023 · 9889465
2 parents a1dee3b + 06c1112
commit 9889465
Show file tree

Hide file tree

Showing 2 changed files with 32 additions and 0 deletions.
diff --git a/docs/services/gpuservice/faq.md b/docs/services/gpuservice/faq.md
@@ -0,0 +1,31 @@
+# GPU Service FAQ
+
+## GPU Service Frequently Asked Questions
+
+### How do I access the GPU Service?
+
+The default route to the GPU Service is via an EIDF Data Science Cloud (DSC) VM. The DSC VM has access to all EIDF resources for your project and can be reached through the VDI (SSH or, if enabled, RDP) or via the EIDF SSH Gateway.
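As a minimal sketch of the SSH Gateway route, the function below prints an OpenSSH client configuration entry that uses `ProxyJump` to hop through the gateway to a project VM. The gateway hostname, username and VM address are hypothetical placeholders (assumptions), not values from this FAQ; the real gateway address and your account details come from the EIDF documentation and Portal.

```bash
#!/usr/bin/env bash
# Sketch: build an ~/.ssh/config entry that reaches a project DSC VM by
# jumping through the EIDF SSH Gateway (OpenSSH ProxyJump).
# ASSUMPTION: every value below is a hypothetical placeholder; override them
# with your own gateway hostname, username and VM address.
GATEWAY_HOST="${GATEWAY_HOST:-gateway.example.org}"
EIDF_USER="${EIDF_USER:-myuser}"
VM_ADDR="${VM_ADDR:-10.0.0.10}"

# Print a Host block; the optional first argument names it (default: eidf-vm).
ssh_config_entry() {
  local host_alias="${1:-eidf-vm}"
  cat <<EOF
Host ${host_alias}
    HostName ${VM_ADDR}
    User ${EIDF_USER}
    ProxyJump ${EIDF_USER}@${GATEWAY_HOST}
EOF
}
```

For example, `ssh_config_entry >> ~/.ssh/config` followed by `ssh eidf-vm` would open a session on the VM via the gateway. `ProxyJump` needs OpenSSH 7.3 or later; the one-off equivalent is `ssh -J user@gateway user@vm`.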
+
+### How do I obtain my project kubeconfig file?
+
+Project Leads and Managers can obtain the kubeconfig file from the project's page in the EIDF Portal. They can then place the file on any of the project's VMs or pass it to individuals within the project.
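Once the file is on a VM, kubectl has to be told where it is. kubectl reads the `KUBECONFIG` environment variable, so a minimal sketch is to export it for the session (`kubectl --kubeconfig=PATH` is the per-command alternative). The file location is an assumption: any path you pass in is a placeholder for wherever your Project Lead or Manager put the file.

```bash
#!/usr/bin/env bash
# Sketch: point kubectl at a project kubeconfig for the current shell session.
# The path passed in is a placeholder; use wherever the file was provided.
use_kubeconfig() {
  local cfg="$1"
  # Fail early with a clear message instead of exporting a path
  # that kubectl cannot read.
  if [ -z "$cfg" ] || [ ! -r "$cfg" ]; then
    echo "kubeconfig not readable: ${cfg:-<none given>}" >&2
    return 1
  fi
  # kubectl consults $KUBECONFIG instead of its default ~/.kube/config.
  export KUBECONFIG="$cfg"
}
```

With the function defined in your shell, `use_kubeconfig ~/myproject-kubeconfig.yml && kubectl get pods` (hypothetical filename) would list the project's pods. Because the function exports a variable, it must run in the current shell rather than in a subshell.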
+
+### I can't mount my PVC in multiple containers or pods at the same time
+
+The current PVC provisioner is based on Ceph RBD. The block devices that Ceph provides to the Kubernetes PV/PVC provisioner cannot be mounted in multiple pods at the same time: only one pod can use the PVC at a time, and once that pod has unmounted the PVC and terminated, the PVC can be reused by another pod. The service development team is working on new PVC provisioning systems to alleviate this limitation.
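Because the volume can only be used by one pod at a time, a workflow that passes data between pods has to run them one after another. The sketch below, with hypothetical pod and manifest names, deletes the pod that currently holds the PVC, waits for it to be removed, and only then creates the next pod that mounts the same claim. It assumes the new manifest references the same PVC as the old pod.

```bash
#!/usr/bin/env bash
# Sketch: hand a single-attach (Ceph RBD backed) PVC from one pod to the next.
# Pod names and manifest paths are hypothetical; the new manifest is assumed
# to mount the same PVC as the old pod.
handover_pvc() {
  local old_pod="$1" new_manifest="$2"
  if [ -z "$old_pod" ] || [ -z "$new_manifest" ]; then
    echo "usage: handover_pvc OLD_POD NEW_POD_MANIFEST" >&2
    return 2
  fi
  # --wait=true makes kubectl block until the old pod object is gone,
  # i.e. the pod has terminated and stopped using the volume.
  kubectl delete pod "$old_pod" --wait=true || return 1
  # Only now start the pod that mounts the same claim.
  kubectl create -f "$new_manifest"
}
```

For example, `handover_pvc writer-pod reader-pod.yml` would remove `writer-pod` and then create the pod described in `reader-pod.yml`.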
+
+### How many GPUs can I use in a pod?
+
+The current limit is 8 GPUs per pod. Each underlying host has 8 GPUs, and a pod always runs on a single host, so a pod cannot use more GPUs than one host provides.
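A pod requests GPUs through its container resource limits. The sketch below prints a pod manifest for a given GPU count and rejects counts outside 1 to 8, since no single host could satisfy a larger request. It assumes the GPUs are advertised as the `nvidia.com/gpu` extended resource (the name used by the NVIDIA device plugin); the pod name and image are hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch: print a pod manifest that requests a given number of GPUs.
# ASSUMPTIONS: GPUs are exposed as the "nvidia.com/gpu" extended resource
# (NVIDIA device plugin); the pod name and image are placeholders.
MAX_GPUS_PER_POD=8   # one host's worth of GPUs, per this FAQ

gpu_pod_manifest() {
  local name="$1" gpus="$2" image="${3:-registry.example.org/my-gpu-image:latest}"
  # Reject non-numeric counts and anything outside 1..MAX_GPUS_PER_POD
  # (10# forces base 10 so inputs like "08" are not read as octal).
  if ! [[ "$gpus" =~ ^[0-9]+$ ]] || (( 10#$gpus < 1 || 10#$gpus > MAX_GPUS_PER_POD )); then
    echo "GPU count must be 1-${MAX_GPUS_PER_POD}, got: '${gpus}'" >&2
    return 1
  fi
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: ${name}
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: ${image}
      resources:
        limits:
          # For extended resources the request defaults to this limit.
          nvidia.com/gpu: $((10#$gpus))
EOF
}
```

For example, `gpu_pod_manifest gpu-test 2 | kubectl create -f -` would submit a two-GPU pod. Without such a check, a pod asking for more than 8 GPUs would simply stay `Pending`, because the scheduler cannot find a host that fits it.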
+
+### Why did a validation error occur when submitting a pod or job with a valid specification file?
+
+If an error like the following occurs:
+
+```bash
+error: error validating "myjobfile.yml": error validating data: the server does not allow access to the requested resource; if you choose to ignore these errors, turn validation off with --validate=false
+```
+
+The cause may be the version of kubectl being run. This can happen when kubectl is installed in a virtual environment or from a package repository, which may supply a different version from the one the service expects.
+
+The kubectl version currently verified to work with the GPU Service is v1.24.10. kubectl and the Kubernetes API server can suffer from version skew if their versions are not within a defined number of minor releases of each other; kubectl is supported within one minor version (older or newer) of the API server. More information can be found in the [Kubernetes Version Skew Policy](https://kubernetes.io/releases/version-skew-policy/).
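Two practical checks follow from this. The sketch below (a) compares a kubectl client version with the API server version using the one-minor-version rule from the skew policy, and (b) downloads a pinned kubectl release, by default the verified v1.24.10, from the upstream Kubernetes download site (`dl.k8s.io`) and verifies its published SHA-256 checksum before installing it. The Linux platform, the default `amd64` architecture and the `~/.local/bin` install directory are assumptions; adjust them for your VM.

```bash
#!/usr/bin/env bash
# Sketch: version-skew check and pinned kubectl install.
# ASSUMPTIONS: Linux VM with curl and GNU coreutils (sha256sum, install);
# the default install directory ~/.local/bin is a placeholder choice.

# Return 0 if two versions ("v1.24.10" or "1.24.10") share a major version
# and their minor versions differ by at most one, per the kubectl skew policy.
skew_ok() {
  local c_major c_minor s_major s_minor
  IFS=. read -r c_major c_minor _ <<< "${1#v}"
  IFS=. read -r s_major s_minor _ <<< "${2#v}"
  [ "$c_major" = "$s_major" ] || return 1
  [[ "$c_minor" =~ ^[0-9]+$ && "$s_minor" =~ ^[0-9]+$ ]] || return 1
  local diff=$(( 10#$c_minor - 10#$s_minor ))
  (( diff >= -1 && diff <= 1 ))
}

# Print the upstream download URL for a kubectl release (Linux builds).
kubectl_url() {
  local version="$1" arch="${2:-amd64}"
  printf 'https://dl.k8s.io/release/%s/bin/linux/%s/kubectl\n' "$version" "$arch"
}

# Download, checksum-verify and install a pinned kubectl binary.
install_kubectl() {
  local version="${1:-v1.24.10}" dest="${2:-$HOME/.local/bin}" url tmp rc
  url="$(kubectl_url "$version")"
  tmp="$(mktemp -d)" || return 1
  # Fetch the binary and its published checksum, then verify before use.
  curl -fsSL -o "$tmp/kubectl" "$url" &&
    curl -fsSL -o "$tmp/kubectl.sha256" "${url}.sha256" &&
    (cd "$tmp" && echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check --quiet) &&
    mkdir -p "$dest" &&
    install -m 0755 "$tmp/kubectl" "$dest/kubectl"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}
```

To compare live versions, `kubectl version -o json` reports `clientVersion.gitVersion` and `serverVersion.gitVersion`, which could be extracted (for instance with `jq -r`) and passed to `skew_ok`. After running `install_kubectl`, make sure `~/.local/bin` comes before any virtual-environment or package-manager copy on `PATH`; `type -a kubectl` lists every kubectl the shell can find, in lookup order.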
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -67,6 +67,7 @@ nav:
          - "Getting Started": services/gpuservice/training/L1_getting_started.md
          - "Persistent Volumes": services/gpuservice/training/L2_requesting_persistent_volumes.md
          - "Running a Pytorch Pod": services/gpuservice/training/L3_running_a_pytorch_task.md
+      - "GPU Service FAQ": services/gpuservice/faq.md
   - "Data Management Services":
     - "Data Catalogue":
       - "Metadata information": services/datacatalogue/metadata.md