Oscar has two DGX H100 nodes. The H100 is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software such as NGC containers.
{% hint style="info" %} Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes. {% endhint %}
Each DGX H100 node has 112 Intel CPU cores, 2 TB of memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80 GB of memory.
The two DGX H100 nodes are in the gpu-he partition. To access H100 GPUs, users need to submit jobs to the gpu-he partition and request the h100 feature, i.e.,
```
#SBATCH --partition=gpu-he
#SBATCH --constraint=h100
```
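For example, a minimal batch script requesting a single H100 GPU might look like the sketch below. It assumes the standard Slurm `--gres` syntax for requesting GPUs; the job name, time limit, and GPU count are placeholders to adjust for your workload.

```bash
#!/bin/bash
#SBATCH --partition=gpu-he        # partition containing the DGX H100 nodes
#SBATCH --constraint=h100         # request the h100 feature
#SBATCH --gres=gpu:1              # number of GPUs (up to 8 per node)
#SBATCH --time=01:00:00           # walltime (placeholder; adjust as needed)
#SBATCH --job-name=h100-test      # placeholder job name

# Confirm that an H100 GPU is visible to the job
nvidia-smi
```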
NGC containers provide the best performance on the DGX H100 nodes. Running a TensorFlow NGC container is one example of running NGC containers on these nodes.
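As a minimal sketch, assuming Apptainer is used to run containers on Oscar and that the NGC TensorFlow image tag below is still published (check the NGC catalog for current tags), a container could be pulled and run as follows:

```bash
# Pull the NGC TensorFlow image (the tag is an example; check the NGC catalog)
apptainer pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:23.10-tf2-py3

# Run a quick GPU check inside the container; --nv exposes the host GPUs
apptainer exec --nv tensorflow.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```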
The two DGX nodes have Intel CPUs, so existing Oscar modules can still be loaded and run on them.
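For instance, modules can be searched for and loaded inside a job on a DGX node as usual (the module name below is a placeholder; use `module avail` to see what is installed):

```bash
# Search for a module, then load it (name is a placeholder)
module avail cuda
module load cuda
```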