Failed to collect metrics: could not load NVML library #1

zh168654 · 2018-08-15T08:35:16Z

This is my deployment:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: nvidia-exporter
  namespace: monitoring
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nvidia-exporter
    spec:
      containers:
        - name: nvidia-exporter
          securityContext:
            privileged: true
          image: bugroger/nvidia-exporter:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 9401
          volumeMounts:
            - mountPath: /usr/local/nvidia
              name: nvidia
      volumes:
        - name: nvidia
          hostPath:
            path: /home/zy/cuda

When I exec into the nvidia-exporter container and run

ls /usr/local/nvidia/lib64

libnvidia-ml.so.1 is listed, but the container logs always show

Failed to collect metrics: could not load NVML library
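One way to narrow this down is to check whether the dynamic loader can actually open the file, not just whether it is present. The following is a minimal sketch, not part of the exporter, and it assumes a Python interpreter is available in (or can be added to) the container. The helper names `try_load` and `diagnose` are made up for illustration; the directory and soname are taken from the post above. If loading by full path works but loading by soname fails, the directory is probably not on the loader's search path. If both fail, the loader's message may point to a missing dependency or an incompatible C library.

```python
import ctypes
import os


def try_load(name_or_path):
    """Try to dlopen a shared library; return (ok, message)."""
    try:
        ctypes.CDLL(name_or_path)
        return True, "loaded"
    except OSError as exc:
        # The loader's message usually says whether the file was not found
        # or whether a dependency or symbol could not be resolved.
        return False, str(exc)


def diagnose(lib_dir="/usr/local/nvidia/lib64", soname="libnvidia-ml.so.1"):
    """Compare loading by soname (search path) with loading by full path."""
    return {
        "by_soname": try_load(soname),
        "by_full_path": try_load(os.path.join(lib_dir, soname)),
    }


if __name__ == "__main__":
    for how, (ok, msg) in diagnose().items():
        print(f"{how}: {'OK' if ok else 'FAILED'} ({msg})")
```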

Cherishty · 2018-11-29T09:23:24Z

@zh168654 have you found any workaround or clues?
I am facing a similar error, which says:

Failed to collect metrics: nvml: Not Supported

My driver version is 390.59 and the GPU is a Tesla K80.

This error does NOT occur in another environment whose GPU is a GTX 1080.

SjhZju · 2019-03-05T11:05:46Z

Hi,

I have the same problem as @Cherishty above (Failed to collect metrics: nvml: Not Supported). I think this is why the exporter cannot get metrics.
My driver version is 390.48, with two GTX 980 cards. The server OS is Ubuntu 16.04.

bmerry · 2022-06-23T09:21:11Z

I'm running into the same problem. I suspect it's because the Docker image is built with Alpine (and hence musl libc) while Nvidia's NVML library (libnvidia-ml.so) depends on glibc.
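To check this suspicion inside a given image, a rough heuristic is to look for musl's dynamic loader, which is normally installed as /lib/ld-musl-<arch>.so.1. The sketch below assumes Python is available in the image; `detect_libc` is a hypothetical helper name, and the result is a best-effort guess, not a definitive test.

```python
import glob
import os
import platform


def detect_libc(root="/"):
    """Best-effort guess of the C library: 'glibc', 'musl', or 'unknown'."""
    # For the running system, trust what the Python binary itself is linked
    # against: platform.libc_ver() returns ('glibc', '<version>') on glibc.
    if root == "/" and platform.libc_ver()[0] == "glibc":
        return "glibc"
    # Otherwise look for musl's loader under <root>/lib (heuristic).
    if glob.glob(os.path.join(root, "lib", "ld-musl-*.so.1")):
        return "musl"
    return "unknown"


if __name__ == "__main__":
    print("C library looks like:", detect_libc())
```

The `root` parameter is only there so that an unpacked image filesystem can be inspected from another machine, for example `detect_libc("/tmp/rootfs")`.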
