[BUG] RuntimeError: Unable to JIT load the fp_quantizer op due to it not being compatible due to hardware/software issue. FP Quantizer is using an untested triton version (3.1.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels #6906

Open

GHBigD opened this issue Dec 23, 2024 · 0 comments
Labels: bug (Something isn't working), compression

GHBigD commented Dec 23, 2024

Describe the bug
I am out of my depth here, but I'll try.
I installed DeepSpeed in the vllm/vllm-openai Docker image via pip install deepspeed. The install went fine, but when I tried to do an in-flight FP6 quantization of a model I got the error in the subject line. Poking around, I see that op_builder/fp_quantizer.py checks the Triton version and presumably blocks the op. I tried downgrading Triton from 3.1.0 to 3.0.0, but that caused a cascade of interdependency issues. I would like to lift the version check and see whether the kernels work anyway, but I am not a coder and don't know where to start.
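
From the warning text, the gate in op_builder/fp_quantizer.py presumably amounts to an allow-list of tested Triton versions, something like the sketch below (reconstructed from the message, not the actual DeepSpeed source):

    # Sketch of the kind of version gate the warning implies; reconstructed
    # from the message text, not copied from op_builder/fp_quantizer.py.
    from packaging import version as pkg_version
    import triton

    # Versions the warning names as known-good.
    TESTED_TRITON = ("2.3.0", "2.3.1", "3.0.0")

    def triton_is_tested() -> bool:
        installed = pkg_version.parse(triton.__version__)
        return any(installed == pkg_version.parse(v) for v in TESTED_TRITON)

    if not triton_is_tested():
        # With triton 3.1.0 installed this branch is taken, and the op is
        # reported as incompatible instead of being JIT-compiled.
        print(f"untested triton version ({triton.__version__})")

If the gate really is just an allow-list like this, adding 3.1.0 to it locally (at one's own risk) is the experiment I'd like to run.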

To Reproduce
Steps to reproduce the behavior:

  1. load vllm/vllm-openai:latest docker
  2. install latest deepspeed
  3. attempt to load a model with vllm serve (model_id) --quantization deepspeedfp (requires a configuration.json file); see also the direct snippet after these steps
  4. See error
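
If it helps triage, the compatibility check can presumably be exercised directly, without serving a model, with something like the following (FPQuantizerBuilder and its is_compatible/load methods are my best guess at the entry point, not verified against the source):

    # Best-guess direct repro of the gate, skipping vllm entirely.
    # FPQuantizerBuilder is assumed to be the builder behind fp_quantizer.
    from deepspeed.ops.op_builder import FPQuantizerBuilder

    builder = FPQuantizerBuilder()
    print(builder.is_compatible(verbose=True))  # expected: False, plus the triton warning
    builder.load()  # expected: the RuntimeError from the subject line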

Expected behavior
The fp_quantizer op JIT-compiles and the in-flight FP6 quantization proceeds, rather than failing on the Triton version check.

ds_report output
[2024-12-23 14:25:46,009] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)

DeepSpeed C++/CUDA extension op report

NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.

JIT compiled ops requires ninja
ninja .................. [OKAY]

op name ................ installed .. compatible

[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-dev package with apt
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
[WARNING] FP Quantizer is using an untested triton version (3.1.0), only 2.3.(0, 1) and 3.0.0 are known to be compatible with these kernels
fp_quantizer ........... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
/usr/bin/ld: cannot find -lcufile: No such file or directory
collect2: error: ld returned 1 exit status
gds .................... [NO] ....... [NO]
transformer_inference .. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.5
[WARNING] using untested triton version (3.1.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]

DeepSpeed general environment info:
torch install path ............... ['/usr/local/lib/python3.12/dist-packages/torch']
torch version .................... 2.5.1+cu124
deepspeed install path ........... ['/usr/local/lib/python3.12/dist-packages/deepspeed']
deepspeed info ................... 0.16.2, unknown, unknown
torch cuda version ............... 12.4
torch hip version ................ None
nvcc version ..................... 12.1
deepspeed wheel compiled w. ...... torch 2.5, cuda 12.4
shared memory (/dev/shm) size .... 46.57 GB

System info (please complete the following information):

  • OS: Ubuntu 22.04
  • GPU: 2× NVIDIA A40

Docker context
vllm/vllm-openai:latest (vLLM 0.6.5)
