[feat] OLMoE hf converter #61

Open
Tracked by #66
tscholak opened this issue Nov 22, 2024 · 0 comments
tscholak (Collaborator) commented Nov 22, 2024

🧐 Problem Description

Fast-LLM doesn't yet support importing or exporting OLMoE models such as https://huggingface.co/allenai/OLMoE-1B-7B-0924.

💡 Proposed Solution

Add an OLMoE HF converter that offers both export and import functionality:

  1. Make it possible to export a Fast-LLM OLMoE-like model to HF's OlmoeForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmoe/modeling_olmoe.py).

  2. Load HF OLMoE models into Fast-LLM.

  3. Verify the equivalence of model weights and outputs post-conversion; a minimal check is sketched after this list. Something to look out for is a possible discrepancy in the order of the FFN, LayerNorm, and Dropout layers between Fast-LLM's GPT implementation and OLMoE, i.e.

    def forward(self, input_: torch.Tensor, kwargs: dict, losses: dict | None = None, metrics: dict | None = None):
    vs. https://github.com/huggingface/transformers/blob/54be2d7ae87e873482b984cc956e165ca4dc0ba3/src/transformers/models/olmoe/modeling_olmoe.py#L688
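
A minimal equivalence check could look roughly like the sketch below. It assumes the Fast-LLM converter has already written an HF-format export to a placeholder path ("converted/olmoe-export" is hypothetical); the reference checkpoint and OlmoeForCausalLM come straight from the HF Hub and transformers.

    # Sketch of a post-conversion equivalence check. "converted/olmoe-export" is a
    # placeholder for wherever the Fast-LLM -> HF export lands.
    import torch
    from transformers import AutoTokenizer, OlmoeForCausalLM

    reference = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924", torch_dtype=torch.bfloat16)
    converted = OlmoeForCausalLM.from_pretrained("converted/olmoe-export", torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")

    # Weights: every exported tensor should match the reference exactly.
    ref_sd, conv_sd = reference.state_dict(), converted.state_dict()
    assert ref_sd.keys() == conv_sd.keys(), "parameter names differ"
    for name, ref_param in ref_sd.items():
        assert torch.equal(ref_param, conv_sd[name]), f"weight mismatch: {name}"

    # Outputs: allow a small tolerance for kernel / order-of-operation differences.
    inputs = tokenizer("Mixture-of-experts models route tokens to", return_tensors="pt")
    with torch.no_grad():
        ref_logits = reference(**inputs).logits
        conv_logits = converted(**inputs).logits
    max_diff = (ref_logits - conv_logits).abs().max().item()
    print(f"max logit difference: {max_diff:.6f}")
    assert max_diff < 1e-2, "converted model output diverges from reference"

Checking both weights and logits catches exactly the layer-ordering discrepancy mentioned above, since parameters can match while activations still differ.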

🔄 Alternatives Considered

It might be possible to export OLMoE-like models in HF Mixtral format.
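
As a rough feasibility probe, one could compare the checkpoint's MoE hyperparameters against what Mixtral's config can express; the sketch below assumes the field names published in transformers' OlmoeConfig.

    # Sketch: inspect the OLMoE config to judge whether the Mixtral format could
    # represent the same architecture. Field names follow transformers' OlmoeConfig.
    from transformers import AutoConfig

    olmoe = AutoConfig.from_pretrained("allenai/OLMoE-1B-7B-0924")
    print("experts:", olmoe.num_experts, "active per token:", olmoe.num_experts_per_tok)
    print("hidden size:", olmoe.hidden_size, "intermediate size:", olmoe.intermediate_size)
    # Mixtral's counterparts are num_local_experts / num_experts_per_tok; anything
    # without a Mixtral equivalent (e.g. OLMoE's QK-norm) would make such an export lossy.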

📈 Potential Benefits

Allows for:

  • Continual pretraining of existing OLMoE checkpoints from the HF Hub.
  • Benchmarking and deployment of OLMoE-like models trained with Fast-LLM.

📝 Additional Context

tscholak added the enhancement label on Nov 22, 2024
sohamparikh mentioned this issue on Dec 3, 2024