undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV when trying to build in nix #47

Closed
geekodour opened this issue Jul 27, 2024 · 4 comments

Comments

@geekodour

I was trying to build this plugin for my local Nix packages:

# see https://github.com/NixOS/nixpkgs/pull/304108
# see https://github.com/hashicorp/nomad-device-nvidia
{ lib, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "nomad-device-nvidia";
  version = "66fe3a14e471f4844dffa13ada3c6fdadcd98ab7"; # Jul 10, 2024

  src = fetchFromGitHub {
    owner = "hashicorp";
    repo = pname;
    rev = "${version}";
    sha256 = "sha256-2zdTslzWnaWg3I4bijYIU+nBDsab25iVO8x7v5ymamM=";
  };

  vendorHash = "sha256-h2qp/wHlvqiNLl6dw7UD+/G0iPfdEj8KsACCRMSUYaI=";

  subPackages = [ "." ];

  meta = with lib; {
    homepage = "https://github.com/hashicorp/nomad-device-nvidia";
    description = "Nomad device plugin for Nvidia GPUs";
    mainProgram = "nomad-device-nvidia";
    platforms = platforms.linux;
    license = licenses.mpl20;
    maintainers = with maintainers; [ geekodour ];
  };
}

But then I get the error:

error: builder for '/nix/store/lk3fq47dx04hj413znvaf8xi0b3p3n5h-nomad-device-nvidia-66fe3a14e471f4844dffa13ada3c6fdadcd98ab7.drv' failed with exit code 1;
       last 10 log lines:
       > source root is source
       > Running phase: patchPhase
       > Running phase: updateAutotoolsGnuConfigScriptsPhase
       > Running phase: configurePhase
       > Running phase: buildPhase
       > Building subPackage ./.
       > Running phase: checkPhase
       > /build/go-build727756788/b001/nomad-device-nvidia.test: symbol lookup error: /build/go-build727756788/b001/nomad-device-nvidia.test: undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV
       > FAIL   github.com/hashicorp/nomad-device-nvidia        0.001s
       > FAIL
       For full logs, run 'nix log /nix/store/lk3fq47dx04hj413znvaf8xi0b3p3n5h-nomad-device-nvidia-66fe3a14e471f4844dffa13ada3c6fdadcd98ab7.drv'.

I am trying to get this to work at the moment and will post updates. Let me know if you have any suggestions on what might fix this.

I think #34 might be related.

geekodour commented Jul 27, 2024

Current workaround:

  doCheck = false;

EDIT: This did not solve the issue. It only skipped the tests, and the binary that gets built now fails with:

./cmd: symbol lookup error: ./cmd: undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV
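
One explanation that would fit this symptom, though nothing in this thread confirms it: go-nvml leaves the NVML symbols unresolved at link time and expects them to be resolved only after libnvidia-ml has been loaded at run time, which depends on lazy binding. The default nixpkgs hardening flags include bindnow, which makes the dynamic loader resolve every symbol at startup. The sketch below reuses the derivation from the report and adds a single line, hardeningDisable. Treating bindnow as the cause is an assumption, and the change is untested here.

```nix
# Sketch only: the derivation from the report plus one untested addition.
{ lib, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "nomad-device-nvidia";
  version = "66fe3a14e471f4844dffa13ada3c6fdadcd98ab7"; # Jul 10, 2024

  src = fetchFromGitHub {
    owner = "hashicorp";
    repo = pname;
    rev = version;
    sha256 = "sha256-2zdTslzWnaWg3I4bijYIU+nBDsab25iVO8x7v5ymamM=";
  };

  vendorHash = "sha256-h2qp/wHlvqiNLl6dw7UD+/G0iPfdEj8KsACCRMSUYaI=";

  subPackages = [ "." ];

  # ASSUMPTION (not confirmed in this thread): the "undefined symbol" error
  # comes from eager symbol resolution (-z now). Dropping the bindnow
  # hardening flag lets the linker fall back to lazy binding.
  hardeningDisable = [ "bindnow" ];

  # meta omitted for brevity; unchanged from the report.
}
```

One way to check the assumption is to run `readelf -d` on the built binary and look for BIND_NOW in the dynamic section, before and after the change.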

I spent some time trying to figure out what's wrong, since I was using a fairly straightforward buildGoModule.
After a while, I decided to set this aside. I thought vendoring the dependencies would simplify the problem, so I created a fork (https://github.com/geekodour/nomad-device-nvidia) with the dependencies vendored (go mod tidy, go mod vendor).

I could verify that vendor/github.com/NVIDIA/go-nvml/pkg/nvml/ exists in my fork; it is the package imported here:

"github.com/NVIDIA/go-nvml/pkg/nvml"

Now when I run nix build, I get:

λ nix build --impure --no-link --print-out-paths .#nomad-device-nvidia
path '/home/geekodour/x/newnixsetup/pkgs' does not contain a 'flake.nix', searching up
warning: Git tree '/home/geekodour/x' is dirty
error: builder for '/nix/store/yi0xqzsydd9xs4a1j0l7s4vi85wdv77k-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv' failed with exit code 1;
       last 9 log lines:
       > Running phase: unpackPhase
       > unpacking source archive /nix/store/32hmlbaxdcspv9qq1rk4v1i60h4lws9y-source
       > source root is source
       > Running phase: patchPhase
       > Running phase: updateAutotoolsGnuConfigScriptsPhase
       > Running phase: configurePhase
       > Running phase: buildPhase
       > Building subPackage ./cmd
       > nvml/driver_linux.go:11:2: cannot find module providing package github.com/NVIDIA/go-nvml/pkg/nvml: import lookup disabled by -mod=vendor
       For full logs, run 'nix log /nix/store/yi0xqzsydd9xs4a1j0l7s4vi85wdv77k-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv'.

This does not make any sense. I tried digging into more issues and found a related one where the problems were caused by uppercase letters: NixOS/nixpkgs#273998 (comment)

At this point I am clueless, so I am reopening the issue, even though it is not directly related to nomad-device-nvidia (the Makefile commands work absolutely fine when run directly) and is more of a Nix issue, or me messing something up.

Full error (when using the vendored modules):

warning: The interpretation of store paths arguments ending in `.drv` recently changed. If this command is now failing try again with '/nix/store/qlm42n6c6wl514fg0bdfdl1f022axlrg-nomad-device-nvidia-8598e31a0a38a9ed5e14451cf86ab8a8211ab98b.drv^*'
Sourcing auto-add-driver-runpath-hook
Using autoAddDriverRunpath
Sourcing fix-elf-files.sh
@nix { "action": "setPhase", "phase": "unpackPhase" }
Running phase: unpackPhase
unpacking source archive /nix/store/32hmlbaxdcspv9qq1rk4v1i60h4lws9y-source
source root is source
@nix { "action": "setPhase", "phase": "patchPhase" }
Running phase: patchPhase
@nix { "action": "setPhase", "phase": "updateAutotoolsGnuConfigScriptsPhase" }
Running phase: updateAutotoolsGnuConfigScriptsPhase
@nix { "action": "setPhase", "phase": "configurePhase" }
Running phase: configurePhase
@nix { "action": "setPhase", "phase": "buildPhase" }
Running phase: buildPhase
Building subPackage ./cmd
nvml/driver_linux.go:11:2: cannot find module providing package github.com/NVIDIA/go-nvml/pkg/nvml: import lookup disabled by -mod=vendor
        (Go version in go.mod is at least 1.14 and vendor directory exists.)

geekodour reopened this Jul 27, 2024
geekodour changed the title from "undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV" to "undefined symbol: nvmlGpuInstanceGetComputeInstanceProfileInfoV when trying to build in nix" Jul 27, 2024
geekodour commented Jul 27, 2024

Reproducible example:

# see https://github.com/NixOS/nixpkgs/pull/304108
# see https://github.com/hashicorp/nomad-device-nvidia
# see https://github.com/geekodour/nomad-device-nvidia
{ lib, pkgs, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "nomad-device-nvidia";
  version = "8598e31a0a38a9ed5e14451cf86ab8a8211ab98b"; # Jul 27, 2024

  #nativeBuildInputs = [ pkgs.autoAddDriverRunpath ];

  CGO_ENABLED = 1;
  # GOOS = "linux";
  # GOARCH = "amd64";

  # doCheck = true;
  # doInstallCheck = false;
  # runVend = true;
  proxyVendor = true;
  # deleteVendor = true;

  src = fetchFromGitHub {
    owner = "geekodour";
    repo = pname;
    rev = "${version}";
    sha256 = "sha256-urASq/T4XcDVUp03bCKqvojCjLrGb+l47JbZWsHbSGg=";
    # sha256 = lib.fakeHash;
  };

  vendorHash = null;

  subPackages = [ "cmd" ];
  # subPackages = [ "." ];

  meta = with lib; {
    homepage = "https://github.com/hashicorp/nomad-device-nvidia";
    description = "Nomad device plugin for Nvidia GPUs";
    mainProgram = "nomad-device-nvidia";
    platforms = platforms.linux;
    license = licenses.mpl20;
    maintainers = with maintainers; [ geekodour ];
  };
}
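
For comparison, a minimal sketch of how buildGoModule is usually pointed at a source tree that commits its own vendor/ directory: vendorHash = null and no proxyVendor, since proxyVendor concerns fetching modules through the Go module proxy rather than using an in-tree vendor/ directory. This assumes the fork's vendor/modules.txt is complete and matches go.mod. Whether it resolves the -mod=vendor error above is not established in this thread.

```nix
# Sketch only: vendored-source variant of the reproducible example.
{ lib, buildGoModule, fetchFromGitHub }:

buildGoModule rec {
  pname = "nomad-device-nvidia";
  version = "8598e31a0a38a9ed5e14451cf86ab8a8211ab98b"; # Jul 27, 2024

  src = fetchFromGitHub {
    owner = "geekodour";
    repo = pname;
    rev = version;
    sha256 = "sha256-urASq/T4XcDVUp03bCKqvojCjLrGb+l47JbZWsHbSGg=";
  };

  # null tells buildGoModule to use the vendor/ directory shipped in src
  # instead of fetching and hashing the dependencies itself.
  vendorHash = null;

  # proxyVendor is deliberately left unset (ASSUMPTION: it is not needed
  # when the dependencies are already vendored in the source tree).

  subPackages = [ "cmd" ];

  # meta omitted for brevity; unchanged from the example above.
}
```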

@geekodour

I adopted a very rough workaround for now: I keep the compiled binary in a directory under my home directory and use it from an overlay package:

{ lib, pkgs, stdenv, fetchFromGitHub }:

stdenv.mkDerivation rec {
  name = "nomad-device-nvidia";
  src = /home/geekodour/infra/nomad-plugins/nomad-device-nvidia;

  nativeBuildInputs = [pkgs.autoAddDriverRunpath];
  buildPhase = "";
  dontUnpack = true;
  doCheck = false;
  installPhase = ''
    #cp -r $src $out
    mkdir -p $out/bin
    cp $src/nomad-device-nvidia $out/bin
  '';

  meta = with lib; {
    homepage = "https://github.com/hashicorp/nomad-device-nvidia";
    description = "Nomad device plugin for Nvidia GPUs";
    mainProgram = "nomad-device-nvidia";
    platforms = platforms.linux;
    license = licenses.mpl20;
    maintainers = with maintainers; [ geekodour ];
  };
}
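
The overlay itself is not shown in the thread. A minimal sketch of how such a package file could be wired in, assuming the derivation above is saved as ./nomad-device-nvidia.nix next to the overlay (both file names are hypothetical):

```nix
# overlay.nix (hypothetical file name)
final: prev: {
  # callPackage supplies lib, pkgs, stdenv and fetchFromGitHub from the
  # final package set. ./nomad-device-nvidia.nix is assumed to hold the
  # derivation shown above.
  nomad-device-nvidia = final.callPackage ./nomad-device-nvidia.nix { };
}
```

Because src is an absolute path outside the Nix store, evaluating this from a flake needs --impure, as in the nix build invocation earlier in the thread.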

shoenig (Member) commented Aug 22, 2024

Hi @geekodour, glad you got it working. While we can appreciate NixOS, we don't have the expertise to help support it; the driver builds and packages well on the [mainstream] distros we support customers with.

shoenig closed this as completed Aug 22, 2024