
Why is Unsloth thinking I'm doing multi-GPU optimization when I'm not? #1240

Open

brando90 opened this issue Nov 5, 2024 · 3 comments

brando90 commented Nov 5, 2024

Code:

'''
conda activate beyond_scale_2_unsloth
'''
import torch
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from unsloth import FastLanguageModel
from transformers import TrainingArguments
from pathlib import Path

from pdb import set_trace as st

opt_args = {
    'batch_size': 8,
    'learning_rate': 5e-2,
    'epochs': 1,
    'adam_epsilon': 1e-8,
    'weight_decay': 1e-4,
    'num_workers': 0,
    'break_early': False
}
hf_args = {'max_seq_length': 256, 'dataset_text_field': "text"}

# Set data type and device
torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float32
device = torch.device(f"cuda:{0}" if torch.cuda.is_available() else "cpu")

# Load model and tokenizer using Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    # model_name="unsloth/Qwen2-1.5B",
    model_name="Qwen/Qwen2.5-Math-1.5B-Instruct",
    max_seq_length=hf_args['max_seq_length'],
    dtype=None,  # Auto-detection for Float16/BFloat16
    load_in_4bit=False,  # Set False if not using 4-bit precision
)

model = model.to(device)
tok = tokenizer
tok.pad_token = tok.eos_token if tok.pad_token_id is None else tok.pad_token

# Add LoRA adapters, targeting only `lm_head` for fine-tuning
st()
model = FastLanguageModel.get_peft_model(
    model=model,
    r=16,  # LoRA rank
    target_modules=["lm_head"],  # Only optimize `lm_head`
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",
)

# Load dataset
dataset = load_dataset("stanfordnlp/imdb", split="train")

# Define training configuration
training_args = TrainingArguments(
    per_device_train_batch_size=opt_args['batch_size'],
    gradient_accumulation_steps=4,
    num_train_epochs=opt_args['epochs'],
    learning_rate=opt_args['learning_rate'],
    bf16=torch.cuda.is_bf16_supported(),
    logging_steps=1,
    optim="paged_adamw_32bit",
    weight_decay=opt_args['weight_decay'],
    output_dir="./tmp",
    report_to='none'
)

# Initialize the Trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field=hf_args['dataset_text_field'],
    max_seq_length=hf_args['max_seq_length'],
    args=training_args,
)

# Print norms before training to check only lm_head will change
print(f'{model.model.embed_tokens.weight.norm(2)=}')
print(f'{model.model.layers[14].self_attn.v_proj.weight.norm(2)=}')
print(f'{model.model.layers[14].mlp.down_proj.weight.norm(2)=}')
print(f'{model.lm_head.weight.norm(2)=}')

# Start training
trainer.train()

# Print norms after training to verify only lm_head changed
print(f'{model.model.embed_tokens.weight.norm(2)=}')
print(f'{model.model.layers[14].self_attn.v_proj.weight.norm(2)=}')
print(f'{model.model.layers[14].mlp.down_proj.weight.norm(2)=}')
print(f'{model.lm_head.weight.norm(2)=}')

print("Done!\a")

But I'm only using a single A100 GPU...

(beyond_scale_2_unsloth) brando9@ampere1~/beyond-scale-2-alignment-coeff $ python /lfs/ampere1/0/brando9/beyond-scale-2-alignment-coeff/experiments/bm/2024/11_november/week_4_8/train_unsloth_head_qwen2.py
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
==((====))==  Unsloth 2024.10.7: Fast Qwen2 patching. Transformers = 4.46.1.
   \\   /|    GPU: NVIDIA A100-SXM4-80GB. Max memory: 79.138 GB. Platform = Linux.
O^O/ \_/ \    Pytorch: 2.5.1+cu124. CUDA = 8.0. CUDA Toolkit = 12.4.
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.28.post3. FA2 = True]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Traceback (most recent call last):
  File "/lfs/ampere1/0/brando9/beyond-scale-2-alignment-coeff/experiments/bm/2024/11_november/week_4_8/train_unsloth_head_qwen2.py", line 29, in <module>
    model, tokenizer = FastLanguageModel.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/ampere1/0/brando9/miniconda/envs/beyond_scale_2_unsloth/lib/python3.11/site-packages/unsloth/models/loader.py", line 332, in from_pretrained
    model, tokenizer = dispatch_model.from_pretrained(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/ampere1/0/brando9/miniconda/envs/beyond_scale_2_unsloth/lib/python3.11/site-packages/unsloth/models/qwen2.py", line 87, in from_pretrained
    return FastLlamaModel.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lfs/ampere1/0/brando9/miniconda/envs/beyond_scale_2_unsloth/lib/python3.11/site-packages/unsloth/models/llama.py", line 1645, in from_pretrained
    raise RuntimeError('Unsloth currently does not support multi GPU setups - but we are working on it!')
RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!
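For context, a quick way to check how many CUDA devices the process actually sees (a minimal check, not from the original report):

import torch
# If this prints more than 1, the process is seeing multiple cards even though
# only one is intended to be used.
print(torch.cuda.device_count())
print(torch.cuda.get_device_name(0))  # should report the single A100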
danielhanchen (Contributor) commented

Hm, that is very weird. Is this a machine with multiple cards? Could you try nvidia-smi?

brando90 (Author) commented Nov 5, 2024 via email

Peter-Fy commented Nov 10, 2024

I encountered the same issue on a single machine with multiple GPUs. I set os.environ["CUDA_VISIBLE_DEVICES"] = "1" at the beginning of the code to restrict the run to a single GPU, but it sometimes still throws the following error:

RuntimeError: Unsloth currently does not support multi GPU setups - but we are working on it!

Without changing any code, rerunning sometimes succeeds and sometimes fails.
I believe this is the same issue as #983, and I hope it can be fixed as soon as possible.
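For reference, CUDA_VISIBLE_DEVICES only takes effect if it is set before CUDA is initialized, so the assignment has to run before torch or unsloth are imported. A minimal sketch of that ordering (an illustration of the workaround described above, not a confirmed fix for this issue):

import os
# Must come before any torch/unsloth import, otherwise CUDA may already be
# initialized with every GPU visible to the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import torch
from unsloth import FastLanguageModel

print(torch.cuda.device_count())  # expected: 1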
