feat: sort cuda graphs in descending order #2104

drbh · 2024-06-21T16:15:56Z

Sorting CUDA graphs in descending order uses slightly less memory during initialization.

Measured with the following command on a single A10G:

text-generation-launcher \
--model-id meta-llama/Llama-2-7b-chat-hf \
--num-shard 1 \
--cuda-graphs 2,4,8,16,32,64,128,256,512,1024,2048

Without sorting the CUDA graphs, 22140/23028 MiB are used; with them sorted in descending order, 21938/23028 MiB are used.

This is a small fraction of total memory (<1%), but it still saves 202 MiB. Additionally, looking at memory usage over time, loading in descending order produces a smaller spike: usage never goes above its final value, whereas in ascending order it peaks above 22222 MiB.
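For illustration only, here is a minimal Python sketch of the idea: parse the `--cuda-graphs` value, capture the largest batch size first, and check the savings reported above. The function names `parse_cuda_graphs` and `descending` are hypothetical and are not taken from the TGI code base; the memory figures are the ones quoted in this description.

```python
def parse_cuda_graphs(value: str) -> list[int]:
    """Parse a comma-separated list of batch sizes such as "2,4,8"."""
    return [int(item) for item in value.split(",") if item.strip()]


def descending(batch_sizes: list[int]) -> list[int]:
    """Return the batch sizes ordered from largest to smallest."""
    return sorted(batch_sizes, reverse=True)


# Same batch sizes as in the launcher command above.
sizes = parse_cuda_graphs("2,4,8,16,32,64,128,256,512,1024,2048")
capture_order = descending(sizes)
print(capture_order)

# Memory figures reported in this PR (MiB).
TOTAL_MIB = 23028
unsorted_mib = 22140
sorted_mib = 21938

saved_mib = unsorted_mib - sorted_mib
saved_percent = 100 * saved_mib / TOTAL_MIB
print(f"saved {saved_mib} MiB ({saved_percent:.2f}% of total)")
```

The sketch only shows the ordering and the arithmetic; it does not capture any CUDA graphs itself.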

Below are the two memory recordings made with https://github.com/drbh/nvline; left is ascending, right is descending.

drbh · 2024-06-21T18:28:23Z

@danieldk thanks for suggesting this and reviewing!

feat: sort cuda graphs in descending order

98e9be7

danieldk approved these changes Jun 21, 2024

drbh merged commit 811a938 into main Jun 21, 2024
6 checks passed

drbh deleted the descending-cuda-graphs branch June 21, 2024 18:28

yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Sep 26, 2024

feat: sort cuda graphs in descending order (huggingface#2104)

d930724
