Merge pull request #110 from MarshtompCS/main
the solution for insufficient shared memory size
agngrant authored Nov 13, 2023
2 parents 288e9fd + b66904c commit 727c0bc
Showing 1 changed file with 20 additions and 0 deletions.
20 changes: 20 additions & 0 deletions docs/services/gpuservice/faq.md
@@ -29,3 +29,23 @@ error: error validating "myjobfile.yml": error validating data: the server does
There may be an issue with the version of kubectl being run. This can occur if kubectl was installed in a virtual environment or from a package repository.
The current version verified to operate with the GPU Service is v1.24.10. kubectl and the Kubernetes API server can suffer from version skew if their versions are not within a defined number of releases of each other. More information can be found in the [Kubernetes Version Skew Policy](https://kubernetes.io/releases/version-skew-policy/).
### Insufficient Shared Memory Size
My shared memory (SHM) is very small, which causes "OSError: [Errno 28] No space left on device" when I train a model on multiple GPUs. How can I increase the SHM size?
The default size of SHM is only 64 MB. To solve this problem, mount a memory-backed emptyDir volume at /dev/shm:
```yaml
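spec:
  containers:
  - name: [NAME]
    image: [IMAGE]
    volumeMounts:
    # Mount the memory-backed volume over the default /dev/shm
    - mountPath: /dev/shm
      name: dshm
  volumes:
  # emptyDir with medium: Memory is backed by tmpfs (RAM)
  - name: dshm
    emptyDir:
      medium: Memory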
```
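
If you want to cap how much memory the volume can use, a size limit can be added to the emptyDir. The fragment below is a minimal sketch, not a configuration verified on the GPU Service: the `8Gi` value is an illustrative assumption, and `[NAME]` and `[IMAGE]` are the same placeholders as above.

```yaml
spec:
  containers:
  - name: [NAME]
    image: [IMAGE]
    volumeMounts:
    - mountPath: /dev/shm
      name: dshm
  volumes:
  - name: dshm
    emptyDir:
      medium: Memory
      # Assumed example value; choose a limit that suits your job
      sizeLimit: 8Gi
```

Files written to a memory-backed emptyDir count towards the container's memory usage, so any limit chosen here should fit within the memory the pod requests.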
