letmutx/nomad-nvidia-vgpu-plugin

Nomad Nvidia Virtual Device Plugin

This repo contains a Nomad device plugin that exposes a configurable number of virtual GPUs for each physical GPU on the machine. Workloads that don't need a whole GPU can then share one.

Installation requirements

This plugin needs the following dependencies to function:

  • Nomad 0.9+
  • GNU/Linux x86_64 with kernel version > 3.10
  • NVIDIA GPU with Architecture > Fermi (2.1)
  • NVIDIA drivers >= 340.29 with binary nvidia-smi
  • Docker v19.03+

Copy the plugin binary to the Nomad client's plugin directory, then configure the plugin in the client config as follows. See also the requirements for the official nvidia-plugin.

plugin "nvidia-vgpu" {
  config {
    ignored_gpu_ids    = ["uuid1", "uuid2"]
    fingerprint_period = "5s"
    vgpus              = 16
  }
}
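
For context, here is a minimal sketch of where that block might sit in a complete client agent config. The two directory paths are assumptions chosen for illustration and are not taken from this repo. plugin_dir is the standard Nomad agent setting that tells the client where to look for plugin binaries.

```hcl
# Hypothetical client agent config; adjust the paths to your installation.
data_dir   = "/opt/nomad/data"    # assumed path
plugin_dir = "/opt/nomad/plugins" # assumed path; copy the plugin binary here

client {
  enabled = true
}

plugin "nvidia-vgpu" {
  config {
    fingerprint_period = "5s"
    vgpus              = 16 # virtual GPUs exposed per physical GPU
  }
}
```

ignored_gpu_ids is left out here, on the assumption that it is only needed when some physical GPUs should be excluded.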

Usage

Use the device stanza inside a task's resources block to request a virtual GPU. The device name is letmutx/gpu.

job "gpu-test" {
  datacenters = ["dc1"]
  type = "batch"

  group "smi" {
    task "smi" {
      driver = "docker"

      config {
        image = "nvidia/cuda:11.0-base"
        command = "nvidia-smi"
      }

      resources {
        device "letmutx/gpu" {
          count = 1

          # Add an affinity for a particular model
          affinity {
            attribute = "${device.model}"
            value     = "Tesla K80"
            weight    = 50
          }
        }
      }
    }
  }
}
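
To illustrate sharing, the sketch below runs several copies of the same task, each requesting one virtual GPU. The job name and the group count are hypothetical. Whether the allocations end up on the same physical GPU depends on the scheduler and on how many virtual GPUs the client advertises, so treat this as an example under those assumptions, not a guarantee of placement.

```hcl
# Hypothetical job: four allocations, each asking for one virtual GPU.
job "gpu-share-test" {
  datacenters = ["dc1"]
  type        = "batch"

  group "smi" {
    count = 4 # assumed value; keep it at or below the vGPUs available

    task "smi" {
      driver = "docker"

      config {
        image   = "nvidia/cuda:11.0-base"
        command = "nvidia-smi"
      }

      resources {
        device "letmutx/gpu" {
          count = 1 # one virtual GPU per allocation
        }
      }
    }
  }
}
```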

Notes

  • GPU memory allocation and usage are cooperative: the plugin does not enforce limits. A single misbehaving process that uses more GPU memory than it was assigned can starve the other processes sharing that GPU.
  • Memory isolation per task is left to the user. It depends on many factors, such as MPS and the GPU architecture. This doc has some information.
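
One cooperative approach is to have each workload limit its own GPU memory through its framework's settings. The sketch below assumes a JAX workload and a hypothetical image name. XLA_PYTHON_CLIENT_MEM_FRACTION is a JAX setting that controls the fraction of GPU memory JAX preallocates. It is not a feature of this plugin and does not provide isolation. The value shown assumes vgpus = 16, so each task aims for roughly 1/16 of the GPU.

```hcl
# Hypothetical task fragment: ask the framework to stay within its share.
task "train" {
  driver = "docker"

  config {
    image = "example/jax-train:latest" # hypothetical image
  }

  env {
    # 1/16 is about 0.0625; JAX reads this as a fraction of total GPU memory.
    XLA_PYTHON_CLIENT_MEM_FRACTION = ".06"
  }

  resources {
    device "letmutx/gpu" {
      count = 1
    }
  }
}
```

Other frameworks have their own knobs, and none of them stop a process that ignores them.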

Testing

The best way to test the plugin is to run it on a target machine that has an Nvidia GPU, using Nomad's plugin launcher:

make eval
