Add the fp8-quantized GeMM for dense linear layers #18

sfc-gh-reyazda · 2024-06-11T20:14:52Z

No description provided.

sfc-gh-aqiao · 2024-06-11T20:23:54Z

vllm/model_executor/layers/quantization/deepspeedfp.py

@@ -160,14 +172,18 @@ def __new__(cls, orig_shape: torch.Size, params_dtype: torch.dtype,
            raise ImportError("Please install deepspeed>=0.14.2 via "
                              "`pip install deepspeed>=0.14.2` to use "
                              "deepspeedfp quantizer.") from err
+        reduce_dim = -1
+        if transposed:
+            orig_shape = (orig[:-2]+(orig[-1],orig[-2]))


Should `orig` be `orig_shape` here?
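
A minimal sketch of what the flagged line presumably intends, assuming the goal is to swap the last two dimensions of `orig_shape` when `transposed` is set. `transposed_shape` is a hypothetical helper name used only for illustration and is not part of the PR. Plain tuples stand in for `torch.Size` so the snippet runs without PyTorch.

```python
from typing import Tuple


def transposed_shape(orig_shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Return orig_shape with its last two dimensions swapped.

    Hypothetical helper: mirrors the expression in the diff, but reads
    from `orig_shape` consistently instead of the name `orig`.
    """
    if len(orig_shape) < 2:
        raise ValueError("need at least two dimensions to transpose")
    # Keep any leading dimensions, then swap the trailing pair.
    return tuple(orig_shape[:-2]) + (orig_shape[-1], orig_shape[-2])


print(transposed_shape((4096, 11008)))     # (11008, 4096)
print(transposed_shape((8, 4096, 11008)))  # (8, 11008, 4096)
```

If the surrounding code still relies on `reduce_dim = -1` after the swap, it may be worth checking that the reduction axis refers to the intended dimension of the transposed shape.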

sfc-gh-reyazda added 3 commits June 11, 2024 20:14

Add the fp8-quantized GeMM for dense linear layers

4ce7c85

add enable_fused_kernel

7a6384b

set the fused-parameters to true for easier testing

1df2b4c

sfc-gh-aqiao reviewed Jun 11, 2024


sfc-gh-reyazda added 6 commits June 11, 2024 22:58

add fp16 quantized-gemm

d7b5f13

fix dtype for the gemm output

c54d3a8

remove autotuning and add the best configs

d47996d

fix conversion

fda9309

fix ckpt loading

7f951d0

remove fp8 kernel

df0547e
