Rust build fails with latest version of CUDA. #101

Closed
MicahZoltu opened this issue Sep 22, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@MicahZoltu

Backend impacted

The Rust implementation

Operating system

Linux

Hardware

GPU with CUDA

Description

Attempting to build with the latest CUDA driver/toolkit (12.6) results in the following error:

error: failed to run custom build command for `cudarc v0.11.6`

Caused by:
  process didn't exit successfully: `/moshi/rust/target/release/build/cudarc-52348fd62b0fe858/build-script-build` (exit status: 101)
  --- stdout
  cargo:rerun-if-changed=build.rs
  cargo:rerun-if-env-changed=CUDA_ROOT
  cargo:rerun-if-env-changed=CUDA_PATH
  cargo:rerun-if-env-changed=CUDA_TOOLKIT_ROOT_DIR

  --- stderr
  thread 'main' panicked at /root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/cudarc-0.11.6/build.rs:79:14:
  Unsupported cuda toolkit version: `12.6`. Please raise a github issue.
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...

Extra information

The error says to file a GitHub issue, but it doesn't say where, so I'm filing one on the repo I am trying to build. The problem may be in a dependency, but from the error it was not obvious to me how that dependency gets included, as I don't see cudarc in any Cargo files.
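
For readers with the same question: a crate that is not named in any Cargo.toml of the workspace can still be pulled in transitively, and it will then appear in Cargo.lock. The usual tool for tracing this is `cargo tree -i cudarc` (possibly with the relevant feature flags enabled). As a rough illustration of the same idea, here is a minimal sketch that scans the text of a Cargo.lock for packages listing a given dependency. The function name `reverse_deps` is hypothetical, and the parser assumes the multi-line `dependencies = [ ... ]` layout that Cargo normally writes; it is not a full TOML parser.

```rust
/// Return the names of packages in a Cargo.lock whose dependency list names `target`.
/// Assumption: dependencies are written one per line inside `dependencies = [ ... ]`.
fn reverse_deps(lock: &str, target: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut current = String::new(); // name of the [[package]] entry being read
    let mut in_deps = false; // true while inside a `dependencies = [ ... ]` array
    for line in lock.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("name = ") {
            current = rest.trim_matches('"').to_string();
        } else if line.starts_with("dependencies = [") {
            in_deps = true;
        } else if line == "]" {
            in_deps = false;
        } else if in_deps {
            // Entries look like "cudarc" or "cudarc 0.11.6"; compare the name only.
            let entry = line.trim_end_matches(',').trim_matches('"');
            if entry.split_whitespace().next() == Some(target) {
                found.push(current.clone());
            }
        }
    }
    found
}
```

Calling `reverse_deps(&std::fs::read_to_string("Cargo.lock")?, "cudarc")` would list the direct dependents; repeating the call on each result walks back up toward the workspace crates.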

Environment

Fill in the following information on your system.

  • Operating system version: Ubuntu
    Note: I'm running in a Docker container on Windows, but I'm quite confident that is unrelated, as the error occurs during the build step, which happens entirely within the NVIDIA Ubuntu Docker container.

If the backend impacted is PyTorch:

  • Python version: N/A
  • PyTorch version: N/A
  • CUDA version: 12.6
  • GPU model and memory: NVIDIA Quadro RTX 4000.
@MicahZoltu MicahZoltu added the bug Something isn't working label Sep 22, 2024
@LaurentMazare
Member

The message comes from cudarc, which has already released a version that supports CUDA 12.6. I'm minting a new version of candle to support this in huggingface/candle#2494 and, if all goes well, will update moshi to benefit from it.

@LaurentMazare
Member

Just merged #113 which brings compatibility with cuda 12.6, let us know if you still have issues.

@qaoo8

qaoo8 commented Sep 25, 2024

@LaurentMazare

The build passed after the update to v0.7.1, but a new error appeared at runtime:

cargo run --features cuda --bin moshi-backend -r -- --config moshi-backend/config.json standalone
   Compiling candle-kernels v0.7.1
   Compiling candle-core v0.7.1
   Compiling candle-nn v0.7.1
   Compiling candle-transformers v0.7.1
   Compiling moshi v0.2.2 (/home/jackylee/myProjects/Ai-Rust/moshi/rust/moshi-core)
   Compiling moshi-backend v0.2.2 (/home/jackylee/myProjects/Ai-Rust/moshi/rust/moshi-backend)
    Finished `release` profile [optimized + debuginfo] target(s) in 56.29s
     Running `target/release/moshi-backend --config moshi-backend/config.json standalone`

2024-09-25T09:53:11.057410Z  INFO moshi_backend: build_info=BuildInfo { build_timestamp: "2024-09-25T08:46:04.268850290Z", build_date: "2024-09-25", git_branch: "main", git_timestamp: "2024-09-24T10:21:18.000000000+02:00", git_date: "2024-09-24", git_hash: "5d94128878678bb6e5a9471aed8c78f4d96eb473", git_describe: "5d94128", rustc_host_triple: "x86_64-unknown-linux-gnu", rustc_version: "1.81.0", cargo_target_triple: "x86_64-unknown-linux-gnu" }

2024-09-25T09:53:11.057436Z  INFO moshi_backend: starting process with pid 20925

Error: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_f32_bf16

My system settings:
Ubuntu 24.04.1
6.8.0-31-generic

nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Aug_14_10:10:22_PDT_2024
Cuda compilation tools, release 12.6, V12.6.68
Build cuda_12.6.r12.6/compiler.34714021_0

nvidia-smi

NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2

I tried updating the driver to version 550 and to 560, but after installing either one nvidia-smi failed to run; only 535 works.

Error: DriverError(CUDA_ERROR_UNSUPPORTED_PTX_VERSION, "the provided PTX was compiled with an unsupported toolchain.") when loading cast_f32_bf16

The error occurred after the model was downloaded from Hugging Face.

It looks like the CUDA version supported by the driver (12.2) differs from the toolkit version reported by nvcc --version (Build cuda_12.6.r12.6/compiler.34714021_0).

Any ideas on how to solve this error would be a great help!

@LaurentMazare
Member

My guess is that 535.171.04 is too old for cuda 12.6, so you can either try installing a more recent driver and get it to work, or try compiling moshi with an older toolchain (12.2 would be the appropriate one for your drivers I guess).
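
The mismatch discussed here can be spotted by comparing two numbers already quoted in this thread: the "CUDA Version" shown by nvidia-smi (the newest CUDA release the installed driver supports) and the "release" shown by nvcc --version (the toolkit that compiles the kernels). The following is a minimal sketch of that comparison, assuming the output formats shown above; the function names are hypothetical and the string markers are taken from those outputs.

```rust
/// Parse a `major.minor` version that directly follows `marker` in `text`.
fn version_after(text: &str, marker: &str) -> Option<(u32, u32)> {
    let start = text.find(marker)? + marker.len();
    let rest = text[start..].trim_start();
    // Keep only the leading run of digits and dots, e.g. "12.6" from "12.6, V12.6.68".
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    let mut parts = rest[..end].split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    let minor: u32 = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// Returns Some(true) when the toolkit (nvcc) is newer than the CUDA version
/// the driver reports (nvidia-smi), and None if either version cannot be parsed.
fn toolkit_newer_than_driver(nvcc_out: &str, smi_out: &str) -> Option<bool> {
    let toolkit = version_after(nvcc_out, "release ")?;
    let driver = version_after(smi_out, "CUDA Version:")?;
    // Tuples compare lexicographically: major first, then minor.
    Some(toolkit > driver)
}
```

With the values reported above (toolkit 12.6, driver 12.2) the check returns Some(true), which is consistent with the driver rejecting PTX produced by the newer toolchain.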

@qaoo8

qaoo8 commented Sep 25, 2024

I checked the CUDA Compatibility page; according to it, v535 is listed as a CUDA Forward Compatible Upgrade for v12.6.

I also tried downgrading to 12.2, but some of the modules failed to build. I forgot to take notes, so I am unsure whether this happened in the candle module or the moshi module.

Following the requirements in 1.1. System Requirements, I also locked the Ubuntu kernel to 6.8.0-31-generic.

I am confused about how to make v550 (12.5) or v560 (12.6) work on my Ubuntu 24.04.1. Maybe 24.04 is too new for the updated drivers. If anyone has a solution, let me know.

@arrizalamin

Try updating cudarc in the moshi-core crate to 0.12.1; that fixed the issue for me.

@LaurentMazare
Member

We've bumped the dependency to a candle version that supports 12.6 and removed the cudarc dependency, so hopefully the GitHub version should be all good now.

@Ali-Kabbadj

This might not be exactly the same problem, but I hope it helps someone, because I had errors similar to these.

What worked for me was to generate the .ptx files from the .cu files in the "candle-kernels-0.7.2" folder, which for me was in
"C:\Users\<username>\.cargo\registry\src\index.crates.io-6f17d22bba15001f\candle-kernels-0.7.2\src"
and then move them to
"moshi\rust\target\release\build\candle-kernels-fd0438922b6766ae\out"
where the build expects them to be.

The files I had to generate were:

  • unary.ptx
  • ternary.ptx
  • sort.ptx
  • reduce.ptx
  • quantized.ptx
  • indexing.ptx
  • fill.ptx
  • conv.ptx
  • cast.ptx
  • binary.ptx
  • affine.ptx

using the command:

nvcc -ccbin "C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.42.34433\bin\Hostx86\x64" --ptx --gpu-architecture=sm_89 <filename>.cu -I.
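
Since the same command has to be repeated for eleven files, the step could be scripted. The following is a minimal sketch that mirrors the command above: it builds the nvcc argument list for each kernel and runs nvcc in the kernel source directory. The names `KERNELS`, `nvcc_ptx_args`, and `compile_all` are hypothetical, the architecture and `-ccbin` path must be adjusted to the local GPU and compiler, and nvcc is assumed to be on the PATH.

```rust
use std::path::Path;
use std::process::Command;

/// Kernel names from the list above; each is assumed to have a matching `.cu` source.
const KERNELS: [&str; 11] = [
    "unary", "ternary", "sort", "reduce", "quantized", "indexing",
    "fill", "conv", "cast", "binary", "affine",
];

/// Build the nvcc arguments for one kernel, mirroring the command above.
/// `ccbin` is the optional host compiler directory passed via `-ccbin`.
fn nvcc_ptx_args(kernel: &str, arch: &str, ccbin: Option<&str>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(dir) = ccbin {
        args.push("-ccbin".to_string());
        args.push(dir.to_string());
    }
    args.push("--ptx".to_string());
    args.push(format!("--gpu-architecture={}", arch));
    args.push(format!("{}.cu", kernel));
    args.push("-I.".to_string());
    args
}

/// Run nvcc for every kernel inside `src_dir`; returns the kernels that failed.
/// Without `-o`, nvcc writes `<kernel>.ptx` into the working directory.
fn compile_all(src_dir: &Path, arch: &str, ccbin: Option<&str>) -> std::io::Result<Vec<String>> {
    let mut failed = Vec::new();
    for kernel in KERNELS {
        let status = Command::new("nvcc")
            .current_dir(src_dir)
            .args(nvcc_ptx_args(kernel, arch, ccbin))
            .status()?;
        if !status.success() {
            failed.push(kernel.to_string());
        }
    }
    Ok(failed)
}
```

For example, `compile_all(Path::new("<path to candle-kernels src>"), "sm_89", None)` would attempt all eleven files; the resulting .ptx files still need to be moved to the build output directory as described in this comment.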

My suspicion is that it couldn't find link.exe or the other executables on its own, or it could be the compilation parameters. I'm not sure; I'm still new to this and figuring it out, but I wanted to put it out there in case it points to something.

I'm on Windows 11
4090 GPU
Intel CPU: on that note, I had to put my power settings on saving mode, otherwise some other errors popped up and files would just exit abruptly. Typical Intel stuff, haha.

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Oct_30_01:18:48_Pacific_Daylight_Time_2024
Cuda compilation tools, release 12.6, V12.6.85
Build cuda_12.6.r12.6/compiler.35059454_0

Currently waiting for the model to finish downloading, will keep this updated if anything comes up.


5 participants