How to speed up the quantization process, which takes almost 20 hours for llama3-70B with W8A8 #968

moonlightian · 2024-12-11T02:33:59Z

llmcompressor==0.2.0
compressed-tensors==0.7.1
4 NVIDIA A100 80GB PCIe
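For scale: at a nominal 70 billion parameters stored in bf16 (2 bytes each), the weights alone take about 140 GB, so the model cannot fit on one 80 GB card and `device_map="auto"` has to shard it across GPUs. As far as I know, Accelerate's `device_map` sharding places different layers on different GPUs and runs them one after another, so the four cards mostly take turns rather than working in parallel. A back-of-the-envelope sketch, assuming a nominal 70e9 parameters and ignoring activations, quantization scales, the unquantized `lm_head`, and GPTQ's per-layer buffers:

```python
import math

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 70e9      # nominal size (assumption); the exact Llama 3.1 70B count differs slightly
GPU_MEMORY_GB = 80     # per A100 80GB
NUM_GPUS = 4

bf16_gb = weight_memory_gb(NUM_PARAMS, 2)  # bf16 checkpoint as loaded with torch_dtype="auto"
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # int8 weights after W8A8 (scales ignored)
min_gpus = math.ceil(bf16_gb / GPU_MEMORY_GB)

print(f"bf16 weights: {bf16_gb:.0f} GB, int8 weights: {int8_gb:.0f} GB")
print(f"need at least {min_gpus} x {GPU_MEMORY_GB} GB GPUs for bf16 weights alone (have {NUM_GPUS})")
```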

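from datasets import load_dataset
from transformers import AutoTokenizer
from llmcompressor.transformers import SparseAutoModelForCausalLM

MODEL_ID = "/llama3_1_70B"
# Hypothetical placeholder: the calibration dataset is not named in the original report.
DATASET_ID = "..."

model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="auto", torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

NUM_CALIBRATION_SAMPLES = 512
MAX_SEQUENCE_LENGTH = 2048

ds = load_dataset(DATASET_ID, split="train")  # split name assumed
ds = ds.shuffle(seed=42).select(range(NUM_CALIBRATION_SAMPLES))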

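def preprocess(example):
    # Assumes the "text" column holds a list of chat messages ({"role": ..., "content": ...} dicts),
    # which is what apply_chat_template expects.
    return {"text": tokenizer.apply_chat_template(example["text"], tokenize=False)}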
ds = ds.map(preprocess)

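def tokenize(sample):
    # map() is not batched, so sample["text"] is a single string; tokenize it directly.
    # "\n\n".join(sample["text"]) would iterate over its characters and garble the text.
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,  # the chat template already adds the BOS token
    )
ds = ds.map(tokenize, remove_columns=ds.column_names)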

from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

# Configure the quantization algorithms
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", dampening_frac=0.1, scheme="W8A8", ignore=["lm_head"]),
]

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
)
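The two calibration settings in the script multiply into the amount of data pushed through every layer: 512 samples × 2048 tokens ≈ 1.05M tokens at most. Both the calibration forward passes and GPTQ's Hessian accumulation (summing X·Xᵀ over the calibration tokens) grow roughly linearly with that token count, while the per-layer GPTQ solve on the accumulated Hessian does not depend on it. A minimal sketch for comparing settings; the alternative values are illustrative only, not recommendations from this thread, and using fewer tokens may cost some accuracy:

```python
def calibration_tokens(num_samples: int, max_seq_len: int) -> int:
    """Upper bound on calibration tokens: samples shorter than max_seq_len
    contribute fewer tokens, since truncation only caps the length."""
    return num_samples * max_seq_len

baseline = calibration_tokens(512, 2048)  # settings used in the script above

# Illustrative alternatives (hypothetical values, not tuned for accuracy).
for samples, seq_len in [(512, 2048), (256, 2048), (512, 1024), (256, 1024)]:
    tokens = calibration_tokens(samples, seq_len)
    print(f"{samples:>4} samples x {seq_len:>4} tokens = {tokens:>9,} tokens "
          f"({tokens / baseline:.2f}x of baseline)")
```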


dsikka · 2024-12-14T13:28:15Z

Hi @moonlightian:

Can you confirm you ran the above with 4 x A100s?
Can you also confirm the model stub and the dataset name you used? I had some trouble processing the dataset with the preprocessing functions provided.
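One likely source of trouble in the `tokenize` step as posted: `ds.map` is called without `batched=True`, so `sample['text']` is a single string after `preprocess`, and `"\n\n".join(...)` on a string iterates over its characters, inserting a blank line between every character. A minimal reproduction in plain Python (the string `"Hello"` is a stand-in for one preprocessed sample):

```python
# Stand-in for sample["text"] after preprocess: a single str, not a list of str.
text = "Hello"

# str.join iterates over whatever it is given; for a str, that means its characters.
joined = "\n\n".join(text)

print(repr(joined))
print(len(joined), "characters instead of", len(text))
```

Besides garbling the text, this roughly triples its length, which may also lengthen calibration if the samples are not truncated elsewhere. Passing `sample["text"]` straight to the tokenizer, with `truncation=True` and `max_length=MAX_SEQUENCE_LENGTH`, avoids both problems.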

moonlightian added the enhancement New feature or request label Dec 11, 2024

dsikka self-assigned this Dec 12, 2024
