Commit
the solution for insufficient shared memory size
yuzhaouoe committed Oct 12, 2023
1 parent 9889465 commit b66904c
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions docs/services/gpuservice/faq.md
@@ -29,3 +29,23 @@ error: error validating "myjobfile.yml": error validating data: the server does
There may be an issue with the version of kubectl being run. This can occur if kubectl was installed in a virtual environment or from a package repository.
The current version verified to operate with the GPU Service is v1.24.10. kubectl and the Kubernetes API can suffer from version skew if they are not within a defined number of releases of each other. More information can be found in the [Kubernetes Version Skew Policy](https://kubernetes.io/releases/version-skew-policy/).
### Insufficient Shared Memory Size
My shared memory (SHM) is very small, which causes "OSError: [Errno 28] No space left on device" when I train a model on multiple GPUs. How can I increase the SHM size?
The default size of SHM is only 64 MB. You can mount a memory-backed emptyDir volume at /dev/shm to solve this problem:
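```yaml
spec:
  containers:
    # [NAME] and [IMAGE] are placeholders; replace them with your own values.
    - name: [NAME]
      image: [IMAGE]
      volumeMounts:
        # Mount the memory-backed volume over the default /dev/shm.
        - mountPath: /dev/shm
          name: dshm
  volumes:
    - name: dshm
      emptyDir:
        # "Memory" backs the volume with tmpfs (RAM) instead of node disk.
        medium: Memory
```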
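
For context, here is a minimal sketch of how that fragment might sit inside a complete Job manifest. The job name `shm-check`, the `ubuntu:22.04` image, the resource values and the `sizeLimit` are illustrative assumptions and not values taken from the GPU Service documentation. The optional `sizeLimit` field caps the size of the memory-backed volume. In upstream Kubernetes, data written to a memory-backed emptyDir counts against the container's memory limit, so the limit should leave room for it.

```yaml
# Hypothetical Job that prints the size of /dev/shm and exits.
# All names, images and sizes below are assumptions for illustration.
apiVersion: batch/v1
kind: Job
metadata:
  name: shm-check
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: shm-check
          image: ubuntu:22.04
          # Report the filesystem mounted at /dev/shm.
          command: ["df", "-h", "/dev/shm"]
          resources:
            limits:
              cpu: 1
              # Should cover the process memory plus anything stored in /dev/shm.
              memory: 8Gi
          volumeMounts:
            - mountPath: /dev/shm
              name: dshm
      volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            # Optional upper bound on the tmpfs volume.
            sizeLimit: 4Gi
```

After submitting a job like this, `kubectl logs job/shm-check` should show a tmpfs mounted at /dev/shm that is larger than the 64 MB default, provided the cluster accepts the manifest as written.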
