Merge pull request #109 from EPCCed/gpu_faq_branch
AG: added specific FAQ to the GPU Service
Showing 2 changed files with 32 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
# GPU Service FAQ | ||
|
||
## GPU Service Frequently Asked Questions | ||
|
||
### How do I access the GPU Service? | ||
|
||
The default access route to the GPU Service is via an EIDF DSC VM. The DSC VM has access to all EIDF resources for your project and can be reached through the VDI (SSH or, if enabled, RDP) or via the EIDF SSH Gateway. | ||
|
||
### How do I obtain my project kubeconfig file? | ||
|
||
Project Leads and Managers can access the kubeconfig file from the Project page in the Portal. Project Leads and Managers can provide the file on any of the project VMs or give it to individuals within the project. | ||
|
||
### I can't mount my PVC in multiple containers or pods at the same time | ||
|
||
The current PVC provisioner is based on Ceph RBD. The block devices that Ceph provides to the Kubernetes PV/PVC providers cannot be mounted in multiple pods at the same time; they can only be accessed by one pod at a time. Once a pod has unmounted the PVC and terminated, the PVC can be reused by another pod. The service development team is working on new PVC provider systems to alleviate this limitation. | ||
|
||
### How many GPUs can I use in a pod? | ||
|
||
The current limit is 8 GPUs per pod. Each underlying host has 8 GPUs. | ||
|
||
### Why did a validation error occur when submitting a pod or job with a valid specification file? | ||
|
||
If an error like the following occurs: | ||
|
||
```bash | ||
error: error validating "myjobfile.yml": error validating data: the server does not allow access to the requested resource; if you choose to ignore these errors, turn validation off with --validate=false | ||
``` | ||
There may be an issue with the version of kubectl being run. This can occur when kubectl is installed in a virtual environment or from a package repository. | ||
The current version verified to operate with the GPU Service is v1.24.10. kubectl and the Kubernetes API server can suffer from version skew if they are not within a defined number of releases of each other. More information can be found in the [Kubernetes Version Skew Policy](https://kubernetes.io/releases/version-skew-policy/). |
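
As a rough illustration of the skew check, the sketch below compares two version strings and reports whether they are within one minor release of each other, which is the rule the Version Skew Policy gives for kubectl relative to the API server. The helper names `minor_of` and `within_skew` are hypothetical, the server version is a placeholder rather than the real GPU Service API version, and the sketch assumes both versions share the same major number.

```bash
#!/usr/bin/env bash
# Sketch only: check whether a kubectl client version is within one minor
# release of an API server version. Assumes the major versions match.

# Extract the minor number from a version string such as "v1.24.10".
minor_of() {
  local version="${1#v}"      # drop a leading "v" if present
  local rest="${version#*.}"  # drop the major component, e.g. "24.10"
  echo "${rest%%.*}"          # keep only the minor component, e.g. "24"
}

# Succeed (exit 0) if the two versions differ by at most one minor release.
within_skew() {
  local client_minor server_minor diff
  client_minor="$(minor_of "$1")"
  server_minor="$(minor_of "$2")"
  diff=$(( client_minor - server_minor ))
  [ "${diff#-}" -le 1 ]       # "${diff#-}" strips the sign to get |diff|
}

# The client value is the version named in this FAQ; the server value is a
# hypothetical placeholder, not the actual GPU Service API version.
client="v1.24.10"
server="v1.24.0"

if within_skew "$client" "$server"; then
  echo "kubectl $client is within the supported skew of server $server"
else
  echo "kubectl $client is outside the supported skew of server $server"
fi
```

The real values to substitute can be read from `kubectl version`, which reports both the client version and, when the cluster is reachable, the server version.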