Make autopacking faster #1435

Merged
b-chu merged 1 commit into main from autopacking on Aug 8, 2024
Conversation

@b-chu (Contributor) commented Aug 7, 2024

Adds a profiling flag to the collators to avoid unnecessary padding during auto-packing profiling:

  1. The Seq2Seq collator pads to the length of the longest processed example instead of to max_seq_len.
  2. The packing collator doesn't concatenate tensors when profiling.
  3. The packing collator returns early instead of repadding the freshly concatenated tensors to max_seq_len.
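
For illustration only, here is a minimal sketch of the idea behind the flag. The helper names (`pad_batch`, `pack_examples`) and the greedy packing loop are hypothetical and not the actual llm-foundry collator code; they just show how a profiling path can skip the padding, concatenation, and repadding work:

```python
import torch


def pad_batch(examples, pad_token_id, max_seq_len, profiling=False):
    """Pad a list of token-id lists into a single tensor.

    When profiling, pad only to the longest example in the batch rather
    than all the way to max_seq_len, so the profiler never pays for
    padding it will not inspect.
    """
    target_len = max(len(e) for e in examples) if profiling else max_seq_len
    batch = torch.full((len(examples), target_len), pad_token_id, dtype=torch.long)
    for row, ids in enumerate(examples):
        batch[row, : len(ids)] = torch.tensor(ids, dtype=torch.long)
    return batch


def pack_examples(examples, max_seq_len, pad_token_id, profiling=False):
    """Greedily pack examples into bins of at most max_seq_len tokens."""
    bins, current, current_len = [], [], 0
    for ids in sorted(examples, key=len, reverse=True):
        if current and current_len + len(ids) > max_seq_len:
            bins.append(current)
            current, current_len = [], 0
        current.append(ids)
        current_len += len(ids)
    if current:
        bins.append(current)

    if profiling:
        # Early exit: the auto-packing profiler only needs the bin
        # assignments to estimate a packing ratio, so skip concatenating
        # token ids and repadding to max_seq_len.
        return bins

    packed = [sum(group, []) for group in bins]  # concatenate token ids per bin
    return pad_batch(packed, pad_token_id, max_seq_len)
```

With `profiling=True` the sketch returns only the bin groupings, which is all a packing-ratio estimate needs; tensor construction and repadding happen only on the real training path.
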

Tested with reference runs:

autopacking-oom-mpt-baseline-Pr7MgS

  • global_train_batch_size: 256
  • device_train_microbatch_size: 16
  • num_gpus: 8
  • max_seq_len: 131072
  • packing ratio: 1310.7
  • PR packing ratio: 1310.7

autopacking-oom-mpt-baseline-CPadi0

  • global_train_batch_size: 8
  • device_train_microbatch_size: 1
  • num_gpus: 8
  • max_seq_len: 32768
  • packing ratio: 190.1
  • PR packing ratio: 190.1

autopacking-oom-mpt-fix-mcV61Z

  • global_train_batch_size: 64
  • device_train_microbatch_size: 1
  • num_gpus: 64
  • max_seq_len: 131072
  • PR packing ratio: 897.1

@b-chu requested a review from irenedea on August 7, 2024 03:09
Review threads on llmfoundry/data/packing.py (resolved)
@b-chu force-pushed the autopacking branch 2 times, most recently from 7dbca82 to a3c61e1 on August 8, 2024 15:00
@b-chu marked this pull request as ready for review on August 8, 2024 15:01
@b-chu requested a review from a team as a code owner on August 8, 2024 15:01
@b-chu force-pushed the autopacking branch 2 times, most recently from 3e53b0e to 3800f98 on August 8, 2024 15:31
@b-chu requested a review from irenedea on August 8, 2024 15:31
@b-chu force-pushed the autopacking branch 2 times, most recently from 385879c to 2d5f95d on August 8, 2024 17:10
Review threads on llmfoundry/data/packing.py and llmfoundry/data/finetuning/collator.py (resolved)
@b-chu force-pushed the autopacking branch 3 times, most recently from 87ffa12 to 4fca37f on August 8, 2024 20:15
@irenedea (Contributor) left a comment

lgtm! just remove print statements :)

@b-chu force-pushed the autopacking branch 2 times, most recently from 7a288e8 to 7644a35 on August 8, 2024 21:11
@b-chu enabled auto-merge (squash) on August 8, 2024 21:15
@b-chu merged commit 44b09f0 into main on Aug 8, 2024
9 checks passed
@b-chu deleted the autopacking branch on August 9, 2024 15:42