
400% to 700% power usage increase when a Nvidia GPU is detected #534

Open

FurretUber opened this issue Jul 12, 2024 · 0 comments

Labels
bug Something isn't working

Describe the bug

While turing-smart-screen-python is running, the Nvidia GPU stays at maximum frequency, runs at a high temperature, and draws about 8 times its normal power when idle.

To Reproduce
Steps to reproduce the behavior:

  1. Have an Nvidia GPU;
  2. Start turing-smart-screen-python;
  3. Observe odd GPU behavior regarding power consumption, frequency and temperature.

Expected behavior
GPU frequency, temperature, and power usage aren't impacted in a significant way by turing-smart-screen-python.

Screenshots / photos of the Turing screen

nvtop screenshot using the custom sensors below:
(screenshot)

nvtop screenshot using the default Nvidia detection:
(screenshot)

Environment:

  • Smart screen model: This one, the unnamed 3.5".
  • Revision of this project: main, 0f68e15b024a5099e57ace3f969c1219e467df08
  • OS with version: Ubuntu 22.04
  • Python version: 3.10.13
  • Hardware: AMD Ryzen 5 5600G, NVIDIA GeForce RTX 3060 (12 GB), 64 GB RAM

Additional context
In the last few days, I observed odd behavior on my headless desktop: the GPU temperature and frequency were always high, as if the GPU were in use, and power usage was far higher than normal (from 5 W to about 40 W at idle). Investigating, I found the problem only happened while turing-smart-screen-python was running. I tried commenting out the entire GPU: section of the theme file, but the problem persisted.

To fix this, I edited sensors_python.py and removed the Nvidia detection. After that change, GPU temperature and frequency returned to normal.
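
To double-check that kind of change, a quick poll of the standard nvidia-smi query fields (power.draw, clocks.sm, temperature.gpu) shows whether the GPU actually drops back to its idle state. This is only a hypothetical sketch, not part of the project, and the helper name is made up:

import subprocess
import time

def poll_idle_state(samples=5, interval=2):
    # Print power draw, SM clock and temperature a few times to watch the idle behavior
    for _ in range(samples):
        saida = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw,clocks.sm,temperature.gpu",
             "--format=csv,noheader"],
            capture_output=True, text=True)
        print(saida.stdout.strip())
        time.sleep(interval)

if __name__ == "__main__":
    poll_idle_state()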

What is even stranger is that when I set up custom sensors to read the GPU data, there was no change in power consumption, temperature, or frequency at all (WARNING: Works on my machine™ code):

# Custom data source classes; the column indices below follow the
# "nvidia-smi dmon -s pcmu" output layout of this driver version.
import subprocess
import time


class nvGPUFreq(CustomDataSource):
    def as_numeric(self) -> float:
        # Only the string form is displayed for this sensor
        pass
    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            coreFreq = linhaDividida[5].strip()  # core clock column, in MHz
            return '{}MHz'.format(coreFreq).rjust(7)
        except Exception as err:
            print(err)
            return ''


class nvGPUTemp(CustomDataSource):
    def as_numeric(self) -> float:
        pass
    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            gpuTemp = linhaDividida[2].strip()
            return '{}°C'.format(gpuTemp).rjust(5)
        except Exception as err:
            print(err)
            return ''


class nvGPUMem(CustomDataSource):
    def as_numeric(self) -> float:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            # Sum of the three dmon memory columns (MB)
            gpuMem = int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])
            return gpuMem
        except Exception as err:
            print(err)
            return 0
    def as_string(self) -> str:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            gpuMem = int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])
            return '{} MB'.format(gpuMem).rjust(8)
        except Exception as err:
            print(err)
            return ''


class nvGPUMemPercent(CustomDataSource):
    def as_numeric(self) -> float:
        try:
            saidaNvidia = obtemDadosNvidia()
            linhaDividida = saidaNvidia.strip().split()
            # Memory usage (columns 6-8) as a percentage of the total memory appended at index 15
            gpuMemPercent = int(round(100 * (int(linhaDividida[6]) + int(linhaDividida[7]) + int(linhaDividida[8])) / int(linhaDividida[15]), 0))
            print(gpuMemPercent)  # debug output
            return gpuMemPercent
        except Exception as err:
            print(err)
            return 0
    def as_string(self) -> str:
        pass

saidaNvidia = ""
ultimaExecucaoNvidiaSMI = 0


def obtemDadosNvidia():
    global saidaNvidia
    global ultimaExecucaoNvidiaSMI
    # Reuse the cached output if nvidia-smi was called less than a second ago
    if (time.time() - ultimaExecucaoNvidiaSMI < 1):
        return saidaNvidia
    ultimaExecucaoNvidiaSMI = time.time()
    processoNV = subprocess.Popen(["nvidia-smi", "dmon", "-s", "pcmu", "-c", "1"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    saidaEErro = processoNV.communicate()
    linha = ""
    for linhaAtual in saidaEErro[0].decode(encoding="utf-8").strip().split('\n'):
        if linhaAtual.startswith('#'):
            continue
        linha = linhaAtual  # keep the last non-comment (data) line
    processoNV_2 = subprocess.Popen(["nvidia-smi", "--query-gpu", "memory.total", "--id=0", "--format=csv,nounits,noheader"], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    saidaEErro_2 = processoNV_2.communicate()
    # Append the total memory so the sensor classes can read it at index 15
    saidaNvidia = linha + " " + saidaEErro_2[0].decode(encoding="utf-8").strip()
    return saidaNvidia
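
A side note on the design: the positional dmon indices above depend on the driver's column layout. A variant that queries named fields instead would avoid that dependency; this is only a sketch, the helper name and field set are my own choice, using standard nvidia-smi --query-gpu field names:

def consultaNvidiaQuery():
    # Named nvidia-smi query fields instead of positional dmon columns
    campos = "clocks.sm,temperature.gpu,memory.used,memory.total,power.draw"
    saida = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + campos, "--format=csv,noheader,nounits", "--id=0"],
        capture_output=True, text=True)
    clk, temp, memUsada, memTotal, potencia = [v.strip() for v in saida.stdout.strip().split(',')]
    return {
        "clock_mhz": int(clk),
        "temp_c": int(temp),
        "mem_used_mb": int(memUsada),
        "mem_total_mb": int(memTotal),
        "power_w": float(potencia),
    }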

I doubt this bug was introduced by turing-smart-screen-python itself: I still had an older version available and it now shows the same behavior. It may be related to kernel or Nvidia driver updates that changed something which now triggers this abnormal behavior. However, if this becomes the new "default", it may cause problems, such as cooking GPUs.

Information about the Nvidia driver: Driver Version: 555.42.06, CUDA Version: 12.5. Tested with the 5.15 and 6.5 kernels available in the Ubuntu repository.
