Single-core job CPU efficiency above 100% on SLURM #9057

Open
dkioroglou opened this issue Nov 28, 2024 · 1 comment

@dkioroglou

Affected tool

MarkDuplicates

Affected version(s)

GATK v4.6.1.0

Description

I'm doing some tests running GATK on SLURM. I created a bash script specifying the following:

  1. I used the following SLURM parameters to allocate a single CPU and disable multithreading:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --mem=8G
  2. I used the parameters recommended in the GATK documentation to restrict the Java garbage collector to one thread:
gatk --java-options "-XX:ConcGCThreads=1 -XX:ParallelGCThreads=1 -Xmx6g" MarkDuplicates \
    --REMOVE_DUPLICATES true \
    --VALIDATION_STRINGENCY SILENT \
    --INPUT "${infile}" \
    --OUTPUT "${outfile}" \
    --METRICS_FILE "${metrics}"

Expected behavior

At the end of the job, SLURM should report a "CPU Efficiency" close to or equal to 100%.
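
For reference, here is a minimal sketch of how such a figure can be computed, assuming the reported value is total CPU time divided by allocated cores times elapsed wall time (the calculation a seff-style report uses). The helper name `cpu_efficiency` and the sample numbers are hypothetical and are not taken from the job in question.

```shell
# Hypothetical helper: CPU efficiency in percent.
# Arguments: CPU seconds used, wall-clock seconds elapsed, allocated cores.
cpu_efficiency() {
    awk -v cpu="$1" -v wall="$2" -v cores="$3" \
        'BEGIN { printf "%.1f\n", cpu / (wall * cores) * 100 }'
}

# Illustrative numbers only: a one-hour job on one allocated core.
cpu_efficiency 3600 3600 1   # CPU time equals wall time
cpu_efficiency 4320 3600 1   # CPU time is 1.2x wall time
```

With one allocated core, any value above 100% means the process accumulated more CPU time than wall time, which is only possible if more than one thread was running at once.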

Actual behavior

At the end of the job SLURM reports "CPU Efficiency" close to 120%.
The same behavior occurred on my local machine (an ordinary laptop) without SLURM: GATK used all the cores even with the Java options above. The only way I found to restrict thread usage was to pin the process with taskset:

taskset -c 0 gatk --java-options "-XX:ConcGCThreads=1 -XX:ParallelGCThreads=1 -Xmx6g" MarkDuplicates

which is a very poor workaround.
Is there a GATK parameter that I'm missing?
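
One way to check how many threads a running process actually has is to read its kernel status file. This is a Linux-only sketch that assumes `/proc` is mounted; `count_threads` is a hypothetical helper name, and the process ID of the running GATK job would have to be supplied by hand.

```shell
# Hypothetical helper: print the number of threads of a process.
# Reads the "Threads:" field from /proc/<pid>/status (Linux only).
count_threads() {
    awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: thread count of the current shell.
count_threads "$$"
```

Running it against the Java process while MarkDuplicates is active would show whether more than one thread exists despite the GC thread limits.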

@gokalpcelik
Contributor

Hi @dkioroglou,
Although you are limiting Java to a single core, GATK includes native libraries from the Intel GKL that accelerate compression and decompression, and these run in native code outside the JVM. They may increase the tool's thread usage, so we do not expect our tools to stay at exactly 100% CPU efficiency.
This behavior is expected and intended.
Regards.
