[feat] OLMoE hf converter #61

Open
Tracked by #66
tscholak opened this issue Nov 22, 2024 · 0 comments
tscholak (Collaborator) commented Nov 22, 2024

🧐 Problem Description

Fast-LLM doesn't yet support importing or exporting OLMoE models such as https://huggingface.co/allenai/OLMoE-1B-7B-0924.

💡 Proposed Solution

Add an OLMoE HF converter that offers both export and import functionality:

  1. Make it possible to export a Fast-LLM OLMoE-like model to HF's OlmoeForCausalLM format (see https://github.com/huggingface/transformers/blob/main/src/transformers/models/olmoe/modeling_olmoe.py).

  2. Load HF OLMoE models into Fast-LLM.

  3. Verify the equivalence of model weights and outputs post-conversion; a minimal check is sketched after this list. Something to look out for is a possible discrepancy in the order of the FFN, LayerNorm, and Dropout layers between Fast-LLM's GPT implementation and OLMoE, i.e.

    def forward(self, input_: torch.Tensor, kwargs: dict, losses: dict | None = None, metrics: dict | None = None):
    vs. https://github.com/huggingface/transformers/blob/54be2d7ae87e873482b984cc956e165ca4dc0ba3/src/transformers/models/olmoe/modeling_olmoe.py#L688
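
A minimal equivalence check could look roughly like the sketch below. It assumes the Fast-LLM converter has already written an HF-format export to a placeholder path ("converted/olmoe-export" is hypothetical); the reference checkpoint and OlmoeForCausalLM come straight from the HF Hub and transformers.

    # Sketch of a post-conversion equivalence check. "converted/olmoe-export" is a
    # placeholder for wherever the Fast-LLM -> HF export lands.
    import torch
    from transformers import AutoTokenizer, OlmoeForCausalLM

    reference = OlmoeForCausalLM.from_pretrained("allenai/OLMoE-1B-7B-0924", torch_dtype=torch.bfloat16)
    converted = OlmoeForCausalLM.from_pretrained("converted/olmoe-export", torch_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained("allenai/OLMoE-1B-7B-0924")

    # Weights: every exported tensor should match the reference exactly.
    ref_sd, conv_sd = reference.state_dict(), converted.state_dict()
    assert ref_sd.keys() == conv_sd.keys(), "parameter names differ"
    for name, ref_param in ref_sd.items():
        assert torch.equal(ref_param, conv_sd[name]), f"weight mismatch: {name}"

    # Outputs: allow a small tolerance for kernel / order-of-operation differences.
    inputs = tokenizer("Mixture-of-experts models route tokens to", return_tensors="pt")
    with torch.no_grad():
        ref_logits = reference(**inputs).logits
        conv_logits = converted(**inputs).logits
    max_diff = (ref_logits - conv_logits).abs().max().item()
    print(f"max logit difference: {max_diff:.6f}")
    assert max_diff < 1e-2, "converted model output diverges from reference"

Checking both weights and logits catches exactly the layer-ordering discrepancy mentioned above, since parameters can match while activations still differ.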

🔄 Alternatives Considered

It might be possible to export OLMoE-like models in HF Mixtral format.
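
As a rough feasibility probe, one could compare the checkpoint's MoE hyperparameters against what Mixtral's config can express; the sketch below assumes the field names published in transformers' OlmoeConfig.

    # Sketch: inspect the OLMoE config to judge whether the Mixtral format could
    # represent the same architecture. Field names follow transformers' OlmoeConfig.
    from transformers import AutoConfig

    olmoe = AutoConfig.from_pretrained("allenai/OLMoE-1B-7B-0924")
    print("experts:", olmoe.num_experts, "active per token:", olmoe.num_experts_per_tok)
    print("hidden size:", olmoe.hidden_size, "intermediate size:", olmoe.intermediate_size)
    # Mixtral's counterparts are num_local_experts / num_experts_per_tok; anything
    # without a Mixtral equivalent (e.g. OLMoE's QK-norm) would make such an export lossy.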

📈 Potential Benefits

Allows for:

  • Continual pretraining of existing OLMoE checkpoints from the HF Hub.
  • Benchmarking and deployment of OLMoE-like models trained with Fast-LLM.

📝 Additional Context

tscholak added the enhancement label on Nov 22, 2024
sohamparikh mentioned this issue on Dec 3, 2024