Update tensor_parallel.py #2798

Open · wants to merge 1 commit into main
Conversation


@Lacacy Lacacy commented Dec 3, 2024

Resolves the abnormal conversation output observed with the Baichuan large model.

Fixes the bug in the norm_head adaptation for Baichuan.

Fixes #2780

https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/modeling_baichuan.py#:~:text=self.weight.data%20%3D%20nn.functional.normalize(self.weight)

[Screenshot: the norm_head normalization in modeling_baichuan.py, `self.weight.data = nn.functional.normalize(self.weight)`]
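For reference, the norm_head behavior in the linked modeling_baichuan.py normalizes each row of the lm_head weight before computing logits. A minimal sketch of that behavior (paraphrased for illustration, not copied verbatim from the upstream file):

```python
import torch
import torch.nn as nn


class NormHead(nn.Module):
    """Sketch of Baichuan2's norm_head: each row of the lm_head weight
    (one vocabulary embedding) is L2-normalized before the logits projection."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(vocab_size, hidden_size))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # nn.functional.normalize defaults to p=2, dim=1, i.e. every row of
        # the [vocab_size, hidden_size] weight is scaled to unit L2 norm,
        # matching self.weight.data = nn.functional.normalize(self.weight)
        # in the upstream file.
        norm_weight = nn.functional.normalize(self.weight)
        return nn.functional.linear(hidden_states, norm_weight)
```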

@OlivierDehaene or @Narsil


Narsil (Collaborator) commented Dec 11, 2024

We cannot really accept this.
This is a bug in Baichuan weights, not in our code.

The issue with your proposed fix is that we support tensor parallelism (TP), which means the normalized weight values will depend on the TP degree you're running with, leading to potentially even larger discrepancies.
The "true" fix in that sense would be to load the entire weight, normalize it, and then split it across GPUs (see the sketch below), but that leads to other issues, the first of which is excess VRAM usage, which can cause unwanted OOMs.
Baichuan should fix their weights (unless there's a valid reason to keep the unnormalized weights, but I don't think there is one).
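A minimal sketch of the normalize-then-shard approach described above, assuming a plain torch loader and an evenly divisible shard dimension (illustrative only, not TGI's actual weight-loading code):

```python
import torch
import torch.nn.functional as F


def load_normalized_head_shard(
    full_weight: torch.Tensor,  # [vocab_size, hidden_size], materialized in full
    rank: int,
    world_size: int,
    shard_dim: int = 0,  # assumed shard dimension, for illustration only
) -> torch.Tensor:
    """Normalize the complete lm_head weight first, then slice this rank's shard.

    The result is identical for every TP degree, but the whole
    [vocab_size, hidden_size] tensor has to be materialized before slicing,
    which is the extra memory usage (and potential OOM) mentioned above.
    """
    # Row-wise L2 normalization over the full weight, matching
    # nn.functional.normalize(self.weight) in modeling_baichuan.py.
    normalized = F.normalize(full_weight, dim=1)

    # Only after normalizing, take this rank's contiguous slice.
    # Assumes the sharded dimension divides evenly by world_size.
    shard_size = normalized.shape[shard_dim] // world_size
    return normalized.narrow(shard_dim, rank * shard_size, shard_size).contiguous()
```

Normalizing each shard independently only reproduces this result when the shard dimension does not cut across the dimension being normalized over; otherwise the per-shard row norms differ from the full-row norms, which is where the TP-dependent discrepancy comes from.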

Successfully merging this pull request may close the linked issue: "I encountered the same issue while using baichuan2-13B-chat."