Llama 3.1 Configs and Code (#1208)
joecummings committed Jul 24, 2024
1 parent 9ce0c32 commit 403c7f3
Showing 15 changed files with 1,530 additions and 35 deletions.
53 changes: 22 additions & 31 deletions README.md
@@ -1,19 +1,15 @@
# torchtune

[![Unit Test](https://github.com/pytorch/torchtune/actions/workflows/unit_test.yaml/badge.svg?branch=main)](https://github.com/pytorch/torchtune/actions/workflows/unit_test.yaml)
![Recipe Integration Test](https://github.com/pytorch/torchtune/actions/workflows/recipe_test.yaml/badge.svg)
[![](https://dcbadge.vercel.app/api/server/4Xsdn8Rr9Q?style=flat)](https://discord.gg/4Xsdn8Rr9Q)

 
 

torchtune now officially supports Meta Llama3! Check out our recipes for Llama3-8B-Instruct with LoRA, QLoRA and Full fine-tune in the [Llama3](#llama3) section! We also support 70B fine-tuning with LoRA! 🚀 🦙

# torchtune

[**Introduction**](#introduction) | [**Installation**](#installation) | [**Get Started**](#get-started) | [**Documentation**](https://pytorch.org/torchtune/main/index.html) | [**Design Principles**](#design-principles) | [**Community Contributions**](#community-contributions) | [**License**](#license)

 

> **July 2024**: torchtune has updated model weights for Llama3.1 in source and nightly builds! Check out our configs for both the [8B and 70B versions](recipes/configs/llama3_1/) of the model. LoRA, QLoRA, and full finetune methods are supported. Support for QLoRA 405B will be added soon.
## Introduction

torchtune is a PyTorch-native library for easily authoring, fine-tuning and experimenting with LLMs. We're excited to announce our alpha release!
@@ -44,14 +40,15 @@ torchtune currently supports the following models.

| Model | Sizes |
|-----------------------------------------------|-----------|
| [Llama3.1](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1) | 8B, 70B [[models](torchtune/models/llama3_1/_model_builders.py), [configs](recipes/configs/llama3_1/)] |
| [Llama3](https://llama.meta.com/llama3) | 8B, 70B [[models](torchtune/models/llama3/_model_builders.py), [configs](recipes/configs/llama3/)] |
| [Llama2](https://llama.meta.com/llama2/) | 7B, 13B, 70B [[models](torchtune/models/llama2/_model_builders.py), [configs](recipes/configs/llama2/)] |
| [Code-Llama2](https://ai.meta.com/blog/code-llama-large-language-model-coding/) | 7B, 13B, 70B [[model](torchtune/models/code_llama2/_model_builders.py), [configs](recipes/configs/code_llama2/)] |
| [Mistral](https://huggingface.co/mistralai) | 7B [[model](torchtune/models/mistral/_model_builders.py), [configs](recipes/configs/mistral/)] |
| [Gemma](https://huggingface.co/collections/google/gemma-release-65d5efbccdbb8c4202ec078b) | 2B, 7B [[model](torchtune/models/gemma/_model_builders.py), [configs](recipes/configs/gemma/)] |
| [Microsoft Phi3](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) | Mini [[model](torchtune/models/phi3/), [configs](recipes/configs/phi3/)]

We'll be adding a number of new models in the coming weeks, including support for 70B versions and MoEs.
We're always adding new models, but feel free to [file an Issue](https://github.com/pytorch/torchtune/issues/new) if there's a new one you would love to see in torchtune!

 

@@ -91,12 +88,12 @@ This table captures the peak memory usage and training speed for recipes in torc

 

## Llama3
## Llama3 and Llama3.1

torchtune supports fine-tuning for the Llama3 8B and 70B size models. We currently support LoRA, QLoRA and full fine-tune on a single GPU as well as LoRA and full fine-tune on multiple devices for the 8B model, and LoRA on multiple devices for the 70B model. For all the details, take a look at our [tutorial](https://pytorch.org/torchtune/main/tutorials/llama3.html).

**Note**: our Llama3 LoRA and QLoRA configs default to the instruct fine-tuned models.
This is because not all special token embeddings are initialized in the base 8B and 70B models.
> [!NOTE]
> Our Llama3 and Llama3.1 LoRA and QLoRA configs default to the instruct fine-tuned models. This is because not all special token embeddings are initialized in the base 8B and 70B models.
In our initial experiments for Llama3-8B, QLoRA has a peak allocated memory of ``~9GB`` while LoRA on a single GPU has a peak allocated memory of ``~19GB``. To get started, you can use our default configs to kick off training.

@@ -105,49 +102,49 @@ In our initial experiments for Llama3-8B, QLoRA has a peak allocated memory of `
LoRA 8B

```bash
tune run lora_finetune_single_device --config llama3/8B_lora_single_device
tune run lora_finetune_single_device --config llama3_1/8B_lora_single_device
```

QLoRA 8B

```bash
tune run lora_finetune_single_device --config llama3/8B_qlora_single_device
tune run lora_finetune_single_device --config llama3_1/8B_qlora_single_device
```

Full 8B

```bash
tune run full_finetune_single_device --config llama3/8B_full_single_device
tune run full_finetune_single_device --config llama3_1/8B_full_single_device
```

### Multi GPU

Full 8B

```bash
tune run --nproc_per_node 4 full_finetune_distributed --config llama3/8B_full
tune run --nproc_per_node 4 full_finetune_distributed --config llama3_1/8B_full
```

LoRA 8B

```bash
tune run --nproc_per_node 2 lora_finetune_distributed --config llama3/8B_lora
tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_1/8B_lora
```

LoRA 70B

Note that the download command for the Meta-Llama3 70B model slightly differs from download commands for the 8B models. This is because we use the HuggingFace [safetensor](https://huggingface.co/docs/safetensors/en/index) model format to load the model. To download the 70B model, run
```bash
tune download meta-llama/Meta-Llama-3-70b --hf-token <> --output-dir /tmp/Meta-Llama-3-70b --ignore-patterns "original/consolidated*"
tune download meta-llama/Meta-Llama-3.1-70b --hf-token <> --output-dir /tmp/Meta-Llama-3.1-70b --ignore-patterns "original/consolidated*"
```

Then, a finetune can be kicked off:

```bash
tune run --nproc_per_node 8 lora_finetune_distributed --config recipes/configs/llama3/70B_lora.yaml
tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_1/70B_lora.yaml
```

You can find a full list of all our Llama3 configs [here.](recipes/configs/llama3)
You can find a full list of all our Llama3 configs [here](recipes/configs/llama3) and Llama3.1 configs [here.](recipes/configs/llama3_1)
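The `--ignore-patterns "original/consolidated*"` flag in the 70B download command above skips the consolidated checkpoint files so that only the safetensors shards are fetched. A minimal sketch of how such a pattern selects files, assuming glob-style matching against repo-relative paths (which is how Hugging Face Hub download filtering is understood to work). The file listing and the `keep_files` helper are hypothetical and only for illustration:

```python
from fnmatch import fnmatch

# Hypothetical file listing for illustration; the real repository contents may differ.
repo_files = [
    "config.json",
    "model-00001-of-00030.safetensors",
    "original/consolidated.00.pth",
    "original/tokenizer.model",
]


def keep_files(files, ignore_patterns):
    """Return the files that match none of the glob-style ignore patterns."""
    return [f for f in files if not any(fnmatch(f, p) for p in ignore_patterns)]


kept = keep_files(repo_files, ["original/consolidated*"])
print(kept)
```

Under this assumption the tokenizer under `original/` is still downloaded, which matters because the configs point the tokenizer `path` at `original/tokenizer.model`.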


&nbsp;
@@ -199,12 +196,6 @@ To get started with fine-tuning your first LLM with torchtune, see our tutorial

Follow the instructions on the official [`meta-llama`](https://huggingface.co/meta-llama) repository to ensure you have access to the official Llama model weights. Once you have confirmed access, you can run the following command to download the weights to your local machine. This will also download the tokenizer model and a responsible use guide.

### Llama2 download
```bash
tune download meta-llama/Llama-2-7b-hf \
--output-dir /tmp/Llama-2-7b-hf \
--hf-token <HF_TOKEN> \
```

### Llama3 download
```bash
@@ -213,28 +204,28 @@ tune download meta-llama/Meta-Llama-3-8B \
--hf-token <HF_TOKEN> \
```


> Tip: Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command in order to validate your access.
You can find your token at https://huggingface.co/settings/tokens
> [!Tip]
> Set your environment variable `HF_TOKEN` or pass in `--hf-token` to the command in order to validate your access. You can find your token at https://huggingface.co/settings/tokens
&nbsp;

### Running fine-tuning recipes

Llama2 7B + LoRA on single GPU:
Llama3 8B + LoRA on single GPU:

```bash
tune run lora_finetune_single_device --config llama2/7B_lora_single_device
```

For distributed training, tune CLI integrates with [torchrun](https://pytorch.org/docs/stable/elastic/run.html).
Llama2 7B + LoRA on two GPUs:
Llama3 8B + LoRA on two GPUs:

```bash
tune run --nproc_per_node 2 full_finetune_distributed --config llama2/7B_full
```

> Tip: Make sure to place any torchrun commands **before** the recipe specification. Any CLI args after this will override the config and not impact distributed training.
> [!Tip]
> Make sure to place any torchrun commands **before** the recipe specification. Any CLI args after this will override the config and not impact distributed training.
&nbsp;

24 changes: 20 additions & 4 deletions docs/source/api_ref_models.rst
@@ -6,8 +6,8 @@ torchtune.models

.. currentmodule:: torchtune.models

llama3
------
llama3 & llama3.1
-----------------

All models from the `Llama3 family <https://llama.meta.com/llama3/>`_.

@@ -21,9 +21,10 @@ To download the Llama3-70B-Instruct model:

.. code-block:: bash
tune download meta-llama/Meta-Llama-3-70B-Instruct --hf-token <HF_TOKEN>
--ignore-patterns "original/consolidated*"
tune download meta-llama/Meta-Llama-3-70B-Instruct --hf-token <HF_TOKEN> --ignore-patterns "original/consolidated*"
To download the Llama3.1 weights of the above models, you can instead download from `Meta-Llama-3.1-8B-Instruct`
or `Meta-Llama-3.1-70B-Instruct`.

.. autosummary::
:toctree: generated/
@@ -40,6 +41,21 @@ To download the Llama3-70B-Instruct model:
llama3.llama3_tokenizer
llama3.Llama3Tokenizer

|
llama3_1.llama3_1
llama3_1.lora_llama3_1
llama3_1.llama3_1_8b
llama3_1.lora_llama3_1_8b
llama3_1.qlora_llama3_1_8b
llama3_1.llama3_1_70b
llama3_1.lora_llama3_1_70b
llama3_1.qlora_llama3_1_70b


.. note::

The Llama3.1 tokenizer reuses the `llama3.llama3_tokenizer` builder class.

llama2
------
1 change: 1 addition & 0 deletions recipes/configs/generation.yaml
@@ -34,6 +34,7 @@ chat_format: null
max_new_tokens: 300
temperature: 0.6 # 0.8 and 0.6 are popular values to try
top_k: 300
# It is recommended to set enable_kv_cache=False for long-context models like Llama3.1
enable_kv_cache: True

quantizer: null
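
The `temperature` and `top_k` settings in the generation config above control how the next token is sampled. The sketch below shows the general technique only (temperature scaling followed by top-k filtering) in plain Python. It is not torchtune's implementation, and the `sample_top_k` helper and its toy logits are hypothetical:

```python
import math
import random


def sample_top_k(logits, temperature=0.6, top_k=300, rng=random):
    """Sample a token index using temperature scaling and top-k filtering."""
    # Keep only the indices of the top_k largest logits.
    k = min(top_k, len(logits))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Divide by temperature; values below 1.0 sharpen the distribution.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax weights over the surviving candidates.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Toy example: four logits, keep the two most likely candidates.
token = sample_top_k([2.0, 1.0, 0.1, -1.0], temperature=0.6, top_k=2, rng=random.Random(0))
print(token)
```

With `top_k=300` and a vocabulary far larger than 300 entries, only the 300 highest-scoring tokens can ever be drawn. A lower `temperature` concentrates probability on the best of those.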
109 changes: 109 additions & 0 deletions recipes/configs/llama3_1/70B_full.yaml
@@ -0,0 +1,109 @@
# Config for multi-device full finetuning in full_finetune_distributed.py
# using a Llama3.1 70B Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
# tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*"
#
# To launch on 8 devices, run the following command from root:
# tune run --nproc_per_node 8 full_finetune_distributed --config llama3_1/70B_full
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
# tune run --nproc_per_node 8 full_finetune_distributed --config llama3_1/70B_full checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config is only tested on an 8xA100 machine.


# Tokenizer
tokenizer:
_component_: torchtune.models.llama3.llama3_tokenizer
path: /tmp/Meta-Llama-3.1-70B-Instruct/original/tokenizer.model

# Dataset
dataset:
_component_: torchtune.datasets.alpaca_dataset
seed: null
shuffle: True

# Model Arguments
model:
_component_: torchtune.models.llama3_1.llama3_1_70b

checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: /tmp/Meta-Llama-3.1-70B-Instruct/
checkpoint_files: [
model-00001-of-00030.safetensors,
model-00002-of-00030.safetensors,
model-00003-of-00030.safetensors,
model-00004-of-00030.safetensors,
model-00005-of-00030.safetensors,
model-00006-of-00030.safetensors,
model-00007-of-00030.safetensors,
model-00008-of-00030.safetensors,
model-00009-of-00030.safetensors,
model-00010-of-00030.safetensors,
model-00011-of-00030.safetensors,
model-00012-of-00030.safetensors,
model-00013-of-00030.safetensors,
model-00014-of-00030.safetensors,
model-00015-of-00030.safetensors,
model-00016-of-00030.safetensors,
model-00017-of-00030.safetensors,
model-00018-of-00030.safetensors,
model-00019-of-00030.safetensors,
model-00020-of-00030.safetensors,
model-00021-of-00030.safetensors,
model-00022-of-00030.safetensors,
model-00023-of-00030.safetensors,
model-00024-of-00030.safetensors,
model-00025-of-00030.safetensors,
model-00026-of-00030.safetensors,
model-00027-of-00030.safetensors,
model-00028-of-00030.safetensors,
model-00029-of-00030.safetensors,
model-00030-of-00030.safetensors,
]
recipe_checkpoint: null
output_dir: /tmp/Meta-Llama-3.1-70B-Instruct/
model_type: LLAMA3
resume_from_checkpoint: False

# Fine-tuning arguments
batch_size: 2
epochs: 3

optimizer:
_component_: torch.optim.AdamW
lr: 2e-5
foreach: False
# Note: highly recommended to use fused=True optimizer flag
# with CPU offload for faster optimizer step.
fused: True

loss:
_component_: torch.nn.CrossEntropyLoss
max_steps_per_epoch: null
gradient_accumulation_steps: 1


# Training env
device: cuda

# Memory management
enable_activation_checkpointing: True
memory_efficient_fsdp_wrap: True
fsdp_cpu_offload: True

# Reduced precision
dtype: bf16

# Logging
metric_logger:
_component_: torchtune.utils.metric_logging.DiskLogger
log_dir: ${output_dir}
output_dir: /tmp/alpaca-llama3-finetune
log_every_n_steps: 1
log_peak_memory_stats: False
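
The 30 entries under `checkpoint_files` in the config above follow a fixed zero-padded naming pattern. A minimal sketch for generating such a list, for example to check a hand-written config for gaps. The `shard_names` helper is hypothetical and not part of torchtune:

```python
def shard_names(num_shards, prefix="model", ext="safetensors"):
    """Build zero-padded shard names such as model-00001-of-00030.safetensors."""
    return [
        f"{prefix}-{i:05d}-of-{num_shards:05d}.{ext}"
        for i in range(1, num_shards + 1)
    ]


files = shard_names(30)
print(files[0], files[-1])
```

The shard count is taken from the config itself. A checkpoint split into a different number of files would need a different `num_shards`.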