Commit
Merge pull request #119 from EPCCed/omp_thread_update
AG: added OMP Thread issue on Pytorch
agngrant authored Nov 30, 2023
2 parents 2221401 + 4625dc2 commit 951b823
Showing 1 changed file with 27 additions and 0 deletions.
docs/services/gpuservice/faq.md: 27 additions & 0 deletions
@@ -49,3 +49,30 @@ The default size of SHM is only 64M. You can mount an empty dir to /dev/shm to s
emptyDir:
medium: Memory
```
### PyTorch Slow Performance Issues
PyTorch on Kubernetes may run much slower than expected, often far slower than an equivalent VM setup.
By default, PyTorch auto-detects the number of OMP threads, and in a container it will report more potential threads than the number of CPU cores you requested. This is a consequence of operating in a container environment: the CPU information reported by standard libraries and tools reflects the node rather than your container.
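As a quick check (a minimal sketch, not part of the original docs), you can print the CPU and thread counts PyTorch picks up inside the pod and compare them against your CPU request:

```python
import os

import torch

# Inside a container, os.cpu_count() typically reports the node's CPU count,
# not the pod's CPU request, and PyTorch sizes its thread pool from it.
print("CPUs visible to Python:", os.cpu_count())
print("OMP_NUM_THREADS:", os.environ.get("OMP_NUM_THREADS", "not set"))
print("PyTorch intra-op threads:", torch.get_num_threads())
```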
To help correct this issue, set the environment variable OMP_NUM_THREADS in the job submission file to the number of cores requested, or fewer.
This has been tested using:
- OMP_NUM_THREADS=1
- OMP_NUM_THREADS=(number of requested cores).
Example fragment for starting the job with a Bash command:
```yaml
containers:
  - args:
      - >
        export OMP_NUM_THREADS=1;
        python mypytorchprogram.py;
    command:
      - /bin/bash
      - '-c'
      - '--'
```
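Alternatively (an assumption, not part of the documented fix), the thread count can be capped from inside the training script itself, which may be useful if the job submission file cannot be changed:

```python
import torch

# Hypothetical cap matching a pod that requests 4 CPU cores;
# adjust to your actual CPU request.
torch.set_num_threads(4)
print("PyTorch intra-op threads:", torch.get_num_threads())
```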
