Fix for multi gpu setup training with a single GPU. #974

Open
wants to merge 3 commits into
base: main
17 changes: 17 additions & 0 deletions unsloth/tokenizer_utils.py
@@ -1091,12 +1091,20 @@ def add_new_tokens(


def check_nvidia():
    index_for_cuda = -1
    if "CUDA_VISIBLE_DEVICES" in os.environ:
        index_for_cuda = os.environ["CUDA_VISIBLE_DEVICES"]
Comment on lines +1094 to +1096
Contributor


Suggested change
-    index_for_cuda = -1
-    if "CUDA_VISIBLE_DEVICES" in os.environ:
-        index_for_cuda = os.environ["CUDA_VISIBLE_DEVICES"]
+    index_for_cuda = os.environ.get("CUDA_VISIBLE_DEVICES", -1)


What if CUDA_VISIBLE_DEVICES="0,1,2"?

Contributor


The next few lines would take care of that I suppose.


Right, sorry
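
One detail the thread leaves implicit: with the suggested `os.environ.get("CUDA_VISIBLE_DEVICES", -1)`, the default is the integer `-1`, so an unguarded `"," in index_for_cuda` check would raise `TypeError` whenever the variable is unset. A minimal sketch of one way to handle both cases in one place follows. The helper name `parse_single_cuda_index` is hypothetical and not part of Unsloth, and the sketch assumes the variable holds integer indices rather than GPU UUIDs.

```python
import os


def parse_single_cuda_index(environ=os.environ):
    """Return the single visible GPU index, or -1 if CUDA_VISIBLE_DEVICES is unset.

    Hypothetical helper for illustration only; not part of Unsloth.
    """
    raw = environ.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        # Variable not set: no restriction, keep the -1 sentinel used in the PR.
        return -1
    if "," in raw:
        # Same message as the PR raises for a comma-separated device list.
        raise RuntimeError(
            "Unsloth currently does not support multi GPU setups - but we are working on it!"
        )
    # int() raises ValueError for an empty value or a non-integer such as a UUID.
    return int(raw.strip())
```

Passing the mapping as a parameter keeps the helper testable without touching the real environment, for example `parse_single_cuda_index({"CUDA_VISIBLE_DEVICES": "1"})`.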

        if "," in index_for_cuda:
            raise RuntimeError("Unsloth currently does not support multi GPU setups - but we are working on it!")
        index_for_cuda = int(index_for_cuda)
    # Unsloth doesn't work yet on AMD devices - we're working on it!
    output = np.array([0,])
    try:
        output = subprocess.check_output("nvidia-smi --query-gpu=memory.used --format=csv", shell = True)
        output = re.findall(rb'([\d]{1,})[\s]{1,}M', output)
        output = np.array([int(x.decode('utf-8'))/1024 for x in output])
        if index_for_cuda != -1:
            output = np.array([output[index_for_cuda],])
    except:
        if not torch.cuda.is_available():
            raise RuntimeError("Unsloth: We do not support AMD / Intel machines yet - it is a work in progress!")
@@ -1160,11 +1168,20 @@ def patch_sft_trainer_tokenizer():
" )\n"\
"pass\n"\
"import subprocess, re, gc, numpy as np\n"\
"import os\n"\
"index_for_cuda = -1\n"\
"if \"CUDA_VISIBLE_DEVICES\" in os.environ:\n"\
"    index_for_cuda = os.environ[\"CUDA_VISIBLE_DEVICES\"]\n"\
"    if \",\" in index_for_cuda:\n"\
"        raise RuntimeError(\"Unsloth currently does not support multi GPU setups - but we are working on it!\")\n"\
"    index_for_cuda = int(index_for_cuda)\n"\
"a = np.array([0,])\n"\
"try:\n"\
"    a = subprocess.check_output('nvidia-smi --query-gpu=memory.used --format=csv', shell = True)\n"\
"    a = re.findall(rb'([\\d]{1,})[\\s]{1,}M', a)\n"\
"    a = np.array([int(x.decode('utf-8'))/1024 for x in a])\n"\
"    if index_for_cuda != -1:\n"\
"        a = np.array([a[index_for_cuda],])\n"\
"except:\n"\
"    if not torch.cuda.is_available():\n"\
"        raise RuntimeError('Unsloth: We do not support AMD / Intel machines yet - it is a work in progress!')\n"\
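
The per-GPU selection added in both hunks can be exercised without a GPU by separating the parsing from the `nvidia-smi` call. The sketch below is an illustration, not the PR's code: `used_memory_gib` and the sample output are hypothetical, it returns a plain list instead of a NumPy array to stay dependency-free, and it assumes `nvidia-smi` prints one `<n> MiB` line per GPU in index order regardless of `CUDA_VISIBLE_DEVICES`.

```python
import re


def used_memory_gib(smi_output: bytes, index_for_cuda: int = -1) -> list:
    """Parse `nvidia-smi --query-gpu=memory.used --format=csv` output into GiB values.

    Hypothetical helper for illustration only; not part of Unsloth.
    """
    # Same regex as the PR: digits, whitespace, then the "M" of "MiB".
    values = re.findall(rb'([\d]{1,})[\s]{1,}M', smi_output)
    used = [int(x.decode('utf-8')) / 1024 for x in values]
    if index_for_cuda != -1:
        # Keep only the selected GPU; an out-of-range index raises IndexError here.
        used = [used[index_for_cuda]]
    return used


# Hand-written sample resembling the CSV output for two GPUs (an assumption).
sample = b"memory.used [MiB]\n1024 MiB\n2048 MiB\n"
print(used_memory_gib(sample))
print(used_memory_gib(sample, index_for_cuda=1))
```

Because the lines shown in the diff wrap the indexing in a bare `except:`, an `IndexError` from an out-of-range index would be caught there rather than surfaced, so validating the index before the `try` block may be worth considering.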