Mllama flash version #2585

Merged: 24 commits into main from mllama_flash on Oct 2, 2024
Conversation

Narsil (Collaborator) commented on Sep 30, 2024

What does this PR do?

Fixes: #2598

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil Narsil force-pushed the mllama_flash branch 2 times, most recently from d407659 to 1f52f1c on September 30, 2024 at 11:41
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Narsil Narsil mentioned this pull request Oct 1, 2024
Comment on lines +501 to +461
self.layernorm_pre = nn.LayerNorm.load(
    prefix=f"{prefix}.layernorm_pre",
    weights=weights,
    # torch default
    eps=1e-05,
)
self.layernorm_post = nn.LayerNorm.load(
    prefix=f"{prefix}.layernorm_post",
    weights=weights,
    # torch default
    eps=1e-05,
)
Collaborator:
Maybe we can use FastLayerNorm in place of the native LayerNorm?

Collaborator Author:
I had more divergence with it than without, same for the rotary embeddings.
In any case, the vision heads have minimal overhead (compared to the decode),
and we already have variance in the pixel values (PIL vs. the Rust image loader).
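
For reference, a minimal sketch of what the suggestion above would look like, assuming TGI's FastLayerNorm exposes the same load(prefix, weights, eps) interface as nn.LayerNorm.load used in the snippet, and assuming the import path below (both are assumptions, not the PR's code):

# Hypothetical sketch only: swap the native LayerNorm for the fused FastLayerNorm.
# Assumes FastLayerNorm.load(prefix, weights, eps) mirrors nn.LayerNorm.load and
# that this import path is correct for the branch.
from text_generation_server.layers.layernorm import FastLayerNorm

self.layernorm_pre = FastLayerNorm.load(
    prefix=f"{prefix}.layernorm_pre",
    weights=weights,
    eps=1e-05,  # torch default
)
self.layernorm_post = FastLayerNorm.load(
    prefix=f"{prefix}.layernorm_post",
    weights=weights,
    eps=1e-05,  # torch default
)

As the reply above notes, trying the fused norms produced more divergence, so the native LayerNorm was kept for the vision tower.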

Comment on lines 92 to 99
if config.model_type == "idefics":
    model = IdeficsForVisionText2Text(config, weights)
elif config.model_type == "mllama":
    model = MllamaForConditionalGeneration(
        prefix="", config=config, weights=weights
    )
else:
    raise RuntimeError(f"Unsupported model type {config.model_type}")
Collaborator:
Not sure if we want to update the name of this class from IDEFICSSharded to something like VLMShared, since it seems that mllama will use this path too.

Collaborator Author:
No, it doesn't; this is old code that needs to be removed. I just fused idefics.py and idefics_causal_lm.py.



# Copied from transformers.models.llama.modeling_llama.apply_rotary_pos_emb
def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
Collaborator:
Can we remove this? I can't seem to find where it's called.



# Copied from transformers.models.llama.modeling_llama.rotate_half
def rotate_half(x):
Collaborator:
Similar to the above; only used in apply_rotary_pos_emb.



# Copied from transformers.models.llama.modeling_llama.repeat_kv
def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
Collaborator:
Similar to the above.

Collaborator Author:
Good point, removed them
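
For context, the three removed helpers were copies of the standard Llama utilities from transformers (per their "# Copied from" headers). A rough sketch of what they do, based on the upstream transformers implementations rather than this PR's exact code:

import torch

# Rotates the last dimension by halves: (x1, x2) -> (-x2, x1).
def rotate_half(x):
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

# Applies rotary position embeddings to the query and key tensors.
def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed

# Repeats key/value heads n_rep times so grouped-query attention matches the
# number of query heads (equivalent to torch.repeat_interleave on dim 1).
def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    batch, num_key_value_heads, slen, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_key_value_heads, n_rep, slen, head_dim
    )
    return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)

None of them were referenced anywhere in the new model code, hence the removal.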

@Narsil Narsil merged commit d18ed5c into main Oct 2, 2024
13 of 14 checks passed
@Narsil Narsil deleted the mllama_flash branch October 2, 2024 09:22
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this pull request Oct 27, 2024
* Working loading state.

* Preprocessing.

* Working state ? (Broke idefics1 temporarily).

* Cleaner condition.

* Fix idefics.

* Updating config, removing TODO

* Mllama

* Upgrade transformers 4.45

* Flashing mllama.

* Starting to get there.

* Working state.

* Integration tests for mllama (cutting to 10 tokens because there seems
to be instability after; meaning the size of the batch matters).

* Updating model link.

* Earlier assert.

* Fix vlm ?

* remove log.

* Force ignore all images but last.

* Default dtype bfloat16.

* Update integration test after switch to bf16.

* Remove dead code.

* Removed dead code.

* Upgrade the flake to latest transformers/tokenizers

* Move to hf tgi-nix

* Upgrade to 0.5.0
Successfully merging this pull request may close these issues:

  • Add support for Llama 3.2 vision / Mllama

3 participants