I was using FSDP + QLoRA to fine-tune Llama 3 70B on 8× A100 80G, and I encountered this error:
Traceback (most recent call last):
File "/mnt/209180/qishi/project/alignment-handbook/scripts/run_sft.py", line 233, in<module>main()
File "/mnt/209180/qishi/project/alignment-handbook/scripts/run_sft.py", line 188, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 361, in train
output = super().train(*args, **kwargs)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
return inner_training_loop(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/transformers/trainer.py", line 2002, in _inner_training_loop
self.model = self.accelerator.prepare(self.model)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/accelerate/accelerator.py", line 1292, in prepare
result = tuple(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/accelerate/accelerator.py", line 1293, in<genexpr>
self._prepare_one(obj, first_pass=True, device_placement=d) forobj, din zip(args, device_placement)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/accelerate/accelerator.py", line 1169, in _prepare_one
return self.prepare_model(obj, device_placement=device_placement)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/accelerate/accelerator.py", line 1459, in prepare_model
model = FSDP(model, **kwargs)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 463, in __init__
_auto_wrap(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/_wrap_utils.py", line 101, in _auto_wrap
_recursive_wrap(**recursive_wrap_kwargs, **root_kwargs) # type: ignore[arg-type]
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 537, in _recursive_wrap
wrapped_child, num_wrapped_params = _recursive_wrap(
[Previous line repeated 2 more times]
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 555, in _recursive_wrap
return _wrap(module, wrapper_cls, **kwargs), nonwrapped_numel
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/wrap.py", line 484, in _wrap
return wrapper_cls(module, **kwargs)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 487, in __init__
_init_param_handle_from_module(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 519, in _init_param_handle_from_module
_init_param_handle_from_params(state, managed_params, fully_sharded_module)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/_init_utils.py", line 531, in _init_param_handle_from_params
handle = FlatParamHandle(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 537, in __init__
self._init_flat_param_and_metadata(
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 585, in _init_flat_param_and_metadata
) = self._validate_tensors_to_flatten(params)
File "/root/anaconda3/envs/handbook/lib/python3.10/site-packages/torch/distributed/fsdp/flat_param.py", line 720, in _validate_tensors_to_flatten
raise ValueError("Cannot flatten integer dtype tensors")
ValueError: Cannot flatten integer dtype tensors
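For context, the model is loaded in 4-bit roughly like the sketch below (a minimal reconstruction, not my exact config; the model name is an assumption). As far as I understand, FSDP flattens each wrapped module's parameters into a single flat tensor, and bitsandbytes keeps packed 4-bit weights in uint8 storage unless `bnb_4bit_quant_storage` is set to a float dtype, which seems to match the "Cannot flatten integer dtype tensors" error:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Minimal sketch of the 4-bit (QLoRA) load path -- a reconstruction, not my exact config.
# bitsandbytes stores the packed 4-bit weights as uint8 by default; FSDP cannot
# flatten integer-dtype tensors, which matches the ValueError in the traceback.
# bnb_4bit_quant_storage switches that storage to a float dtype.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_storage=torch.bfloat16,  # float storage so FSDP can flatten these params
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B",  # assumption: the base model from the handbook recipe
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```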
Thank you guys for your work!
My config:
My pip list: