# H100 NVL Tensor Core GPUs

Oscar has two DGX H100 nodes. The H100 GPU is based on the Nvidia Hopper architecture, which accelerates the training of AI models. The two DGX nodes provide better performance when multiple GPUs are used, in particular with Nvidia software like NGC containers.

{% hint style="info" %} Multi-Instance GPU (MIG) is not enabled on the DGX H100 nodes. {% endhint %}

## Hardware Specifications

Each DGX H100 node has 112 Intel CPU cores, 2TB of memory, and 8 Nvidia H100 GPUs. Each H100 GPU has 80GB of memory.

## Access

The two DGX H100 nodes are in the `gpu-he` partition. To access H100 GPUs, users need to submit jobs to the `gpu-he` partition and request the `h100` feature, i.e.,

```bash
#SBATCH --partition=gpu-he
#SBATCH --constraint=h100
```
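Putting the two directives into context, a complete batch script might look like the following sketch. The GPU count, time limit, and job name are illustrative assumptions, not Oscar defaults:

```shell
#!/bin/bash
#SBATCH --partition=gpu-he        # partition containing the DGX H100 nodes
#SBATCH --constraint=h100         # request the h100 feature
#SBATCH --gres=gpu:2              # number of H100 GPUs to request (illustrative)
#SBATCH --time=01:00:00           # walltime (illustrative)
#SBATCH --job-name=h100-test      # job name (illustrative)

# Print the GPUs allocated to this job
nvidia-smi
```

Submit the script with `sbatch`, e.g. `sbatch h100-test.sh`.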

## Running NGC Containers

NGC containers provide the best performance on the DGX H100 nodes. Running TensorFlow containers is one example of running NGC containers.
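As a sketch, assuming Apptainer (formerly Singularity) is the container runtime available on the cluster, pulling and running an NGC TensorFlow image might look like the following. The image tag is an illustrative assumption; check the NGC catalog for current tags:

```shell
# Pull a TensorFlow image from Nvidia's NGC registry (tag is illustrative)
apptainer pull tensorflow.sif docker://nvcr.io/nvidia/tensorflow:24.01-tf2-py3

# Run with GPU support; --nv binds the host Nvidia driver into the container
apptainer exec --nv tensorflow.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

The `--nv` flag is required for the container to see the H100 GPUs allocated to the job.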

## Running Oscar Modules

Both DGX nodes have Intel CPUs, so Oscar modules can still be loaded and run on them.