Torchtune recipes for fine-tuning on AMD Radeon GPUs
- wandb.ai account and API Key
To export training metrics to Weights and Biases (wandb.ai), you will need to have already created an account on http://wandb.ai and saved your API key to an environment variable:
YOUR_API_KEY=ABCxxxxxxxxxxxxxxxxx123
If you do not want to sync metrics to Weights and Biases, or want to skip this step for now, you can run in offline mode. Note that this is not recommended; instead, choose another logging method in your torchtune config file, such as `torchtune.training.metric_logging.DiskLogger`.
WANDB_MODE=offline
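As a sketch, assuming your config follows standard torchtune conventions, the DiskLogger can also be selected at launch time with dotted key=value overrides instead of editing the YAML (the config filename and log directory below are placeholders):
tune run lora_finetune_single_device --config <your-config>.yaml \
  metric_logger._component_=torchtune.training.metric_logging.DiskLogger \
  metric_logger.log_dir=/tmp/torchtune-logs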
git clone https://github.com/pytorch/torchtune.git
cd torchtune
Create the Torchtune virtual environment and activate it:
python3 -m venv .venv/torchtune
source .venv/torchtune/bin/activate
pip install -e .
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
pip install torchao wandb
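At this point it is worth confirming that the ROCm build of PyTorch was actually installed and can see your GPU. This quick check is not part of the recipe; `torch.version.hip` is None on non-ROCm builds:
python3 -c "import torch; print(torch.__version__, torch.version.hip, torch.cuda.is_available())"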
WANDB_API_KEY=$YOUR_API_KEY
WANDB_NAME="llama3.2-vision-11b-instruct LoRA Fine-tuning Demo"
WANDB_NOTES="Demo of LoRA fine-tuning the Llama3.2-vision-11b-instruct model on a multi-modal image + text dataset on AMD GPUs"
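Plain shell assignments like the ones above are only visible to the current shell, not to the training process. Assuming you want wandb running inside `tune run` to pick them up, export them (or prefix them to the `tune run` command itself):
export WANDB_API_KEY WANDB_NAME WANDB_NOTES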
Note: If the above WandB environment variables aren't working, you can log into your WandB account with your API key manually by doing:
wandb login
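The `wandb login` command prompts for your key interactively; it also accepts the key as an argument, which can be convenient on a headless machine (this reuses the `YOUR_API_KEY` variable set earlier):
wandb login $YOUR_API_KEY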
tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct --ignore-patterns "original/consolidated*"
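The Llama 3.2 vision weights are gated on Hugging Face, so the download above may fail with an authorization error until you have requested access to the model. In that case, passing your Hugging Face token to `tune download` should help (the `HF_TOKEN` variable here is just a placeholder for your own token):
tune download meta-llama/Llama-3.2-11B-Vision-Instruct --output-dir /tmp/Llama-3.2-11B-Vision-Instruct --ignore-patterns "original/consolidated*" --hf-token $HF_TOKEN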
wget https://raw.githubusercontent.com/farshadghodsian/torchtune-amd-recipes/refs/heads/main/recipes/configs/llama3_2_vision/llama3.2-vision-11b-lora-finetune.yaml
wget https://raw.githubusercontent.com/farshadghodsian/torchtune-amd-recipes/refs/heads/main/recipes/configs/llama3_2_vision/llama3.2-vision-11b-lora-finetune-single-device.yaml
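The two configs above target the `lora_finetune_distributed` and `lora_finetune_single_device` recipes used in the next step. If you want to double-check that those recipe names exist in your torchtune install, you can list the available recipes and built-in configs:
tune ls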
tune run --nproc_per_node 8 lora_finetune_distributed --config ./llama3.2-vision-11b-lora-finetune.yaml
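The distributed command above assumes 8 GPUs. If your machine has fewer, or you want to restrict the run to a subset of cards, lower `--nproc_per_node` and, as a sketch, limit the visible devices with `HIP_VISIBLE_DEVICES` (the device indices below are just an example):
HIP_VISIBLE_DEVICES=0,1 tune run --nproc_per_node 2 lora_finetune_distributed --config ./llama3.2-vision-11b-lora-finetune.yaml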
tune run lora_finetune_single_device --config ./llama3.2-vision-11b-lora-finetune-single-device.yaml
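Both recipes also accept key=value overrides on the command line, so, assuming the downloaded configs expose the usual `batch_size` and `epochs` fields, you can run a quick smoke test without editing the YAML, for example:
tune run lora_finetune_single_device --config ./llama3.2-vision-11b-lora-finetune-single-device.yaml batch_size=1 epochs=1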
Radeon GPUs now have experimental support for Flash Attention through AOTriton (ahead-of-time compiled OpenAI Triton kernels). To enable this support, set the following environment variable:
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
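As a quick, optional sanity check (not part of the recipe), you can force the flash attention backend in a tiny `scaled_dot_product_attention` call; if the kernel is unavailable on your setup, PyTorch will raise an error instead of silently falling back to another backend:
TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 python3 - <<'EOF'
# Force the flash attention SDPA backend for a small half-precision attention call
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel
q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
    out = torch.nn.functional.scaled_dot_product_attention(q, q, q)
print("flash attention OK:", out.shape)
EOF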
In addition to experimental Flash Attention support, PyTorch defaults to using hipBLASLt on AMD GPUs. Radeon GPUs do not support the hipBLASLt library, so you will need to fall back to hipBLAS instead. The latest version of PyTorch does this automatically, but it throws warnings about it. To disable the warnings, set the following environment variable:
TORCH_BLAS_PREFER_HIPBLASLT=0
You can include these two environment variables in the training run by adding them before your `tune run` command as follows:
TORCH_BLAS_PREFER_HIPBLASLT=0 TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 tune run lora_finetune_single_device --config ./llama3.2-vision-11b-lora-finetune-single-device.yaml
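If you would rather not prefix every command, you can export the two variables once per shell session; they apply equally to the distributed run:
export TORCH_BLAS_PREFER_HIPBLASLT=0
export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1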