Summary:

1) added int4 to autoquant, using hqq by default (see the usage sketch below)
2) fixed hqq in the normal int4 class so it can actually be used with the normal UX
3) added hqq to eval/generate
4) evaluated hqq to make sure it's a reasonable default for autoquant
5) ran the llama3 eval now that llama3 is working correctly (fixed in the 3.1 PR)
6) tested hqq vs GPTQ so we have a comparison in our benchmarks/eval
7) GPTQ was broken -> fixed utils and GPTQ

Test Plan:

benchmarks.sh (new autoquant-int4 benchmarks)

export CHECKPOINT_PATH=../../../checkpoints

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8wo
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8dq --compile
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-hqq
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-gptq

export MODEL_REPO=meta-llama/Meta-Llama-3-8B
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8wo
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int8dq --compile
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-hqq
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64
python eval.py --checkpoint_path $CHECKPOINT_PATH/$MODEL_REPO/model.pth --quantization int4wo-64-gptq

(see results in README.md)

Reviewers:

Subscribers:

Tasks:

Tags:
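
Usage sketch: a minimal example of how the int4 + hqq path from the summary is driven through the torchao quantize_ API. The use_hqq flag on int4_weight_only and the autoquant comment reflect a reading of the summary, not code copied from this commit; names may differ from the actual diff.

import torch
import torch.nn as nn
import torchao
from torchao.quantization import quantize_, int4_weight_only

# toy stand-in for the Llama checkpoint that eval.py loads; the int4
# tinygemm kernel expects bf16 weights on CUDA with in_features
# divisible by group_size
model = nn.Sequential(nn.Linear(128, 128)).cuda().to(torch.bfloat16)

# int4 weight-only quantization (the int4wo-64-hqq eval path above);
# hqq chooses the scales/zero points rather than plain min/max calibration
quantize_(model, int4_weight_only(group_size=64, use_hqq=True))

# alternatively, let autoquant pick per-layer kernels, where this change
# adds an int4 (hqq) candidate to the search:
# model = torchao.autoquant(torch.compile(model, mode="max-autotune"))

out = model(torch.randn(1, 128, device="cuda", dtype=torch.bfloat16))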