Attempt for cleverer auto batch_prefill values (some simplifications). #2808

Narsil · 2024-12-08T11:37:23Z

danieldk · 2024-12-09T07:27:32Z

launcher/src/main.rs

+        // TODO This calculation depends on the actual implementation
+        let dtype_size = 2;
+        let mlp_size = self.intermediate_size?;
+        Some((mlp_size + mlp_size / 2) * self.num_experts * dtype_size * 3)


Maybe add a comment explaining where `mlp_size / 2` comes from?
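
For readers who want to see what the quoted expression evaluates to, here is a small worked example. It is only a Python transcription of the Rust hunk above, not the launcher's code: the function name `moe_mlp_bytes` and the input values are hypothetical, and the hunk itself does not say what the `mlp_size / 2` term or the factor of 3 represent.

```python
def moe_mlp_bytes(intermediate_size, num_experts, dtype_size=2):
    """Transcription of the quoted expression (hypothetical helper name)."""
    # Mirrors `self.intermediate_size?`: no value in, no estimate out.
    if intermediate_size is None:
        return None
    mlp_size = intermediate_size
    # `//` mirrors Rust's integer division in `mlp_size / 2`.
    return (mlp_size + mlp_size // 2) * num_experts * dtype_size * 3


# Hypothetical inputs chosen only to make the arithmetic easy to follow:
# (1024 + 512) * 8 * 2 * 3
example = moe_mlp_bytes(intermediate_size=1024, num_experts=8)
print(example)
```

With these made-up inputs the expression gives 73728. Note that an odd `mlp_size` is rounded down by the integer division, which is one more reason a comment on the origin of that term is useful.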

danieldk · 2024-12-09T07:30:52Z

launcher/src/main.rs

+enum Gpu {
+    RTX4090,
+    T4,
+    L4,


We probably want L40 as well, since we are using them now?

danieldk · 2024-12-09T07:37:50Z

integration-tests/models/test_flash_llama_prefix_flashdecoding.py

+    # switch on equivalent logits based on the position in the batch.
+    # 1 output being different is not uncommon
+    if sum(equals) < len(equals) - 1:
+        assert outputs == expected


I'm missing something. Can't this only be true when sum(equals) == len(equals)?

I want the error message to be about the content and to contain the diff.
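
The exchange above can be illustrated with a minimal sketch. It assumes `equals` is a list of per-position booleans; how the real test builds it is not shown in the hunk, and the helper name `check_outputs` is hypothetical.

```python
def check_outputs(outputs, expected):
    """Tolerate at most one differing output (hypothetical helper)."""
    # Assumption: one boolean per position in the batch.
    equals = [o == e for o, e in zip(outputs, expected)]
    # Entered only when two or more positions differ.
    if sum(equals) < len(equals) - 1:
        # Inside this branch the lists cannot be equal, so this assert
        # always fails. It is written as a comparison so that the failure
        # message is about the content (under pytest, a diff of the lists).
        assert outputs == expected


check_outputs(["a", "b", "c"], ["a", "b", "c"])  # all equal: passes
check_outputs(["a", "b", "c"], ["a", "b", "x"])  # one mismatch: tolerated
```

So the reviewer's reading is right that the assertion can never succeed once the branch is taken; under this reading, that is the intent, since the branch exists only to report the mismatch.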

drbh

lgtm!

Also, the qwen2-vl tests should be resolved shortly in PR #2802.

Narsil added 3 commits December 8, 2024 12:36

Attempt for cleverer auto batch_prefill values (some simplifications).

037ea55

Less flaky tests.

a0003a6

Fixing typo insertion.

5b04d6c

Narsil requested review from danieldk and drbh December 9, 2024 04:03

danieldk reviewed Dec 9, 2024

Narsil and others added 4 commits December 9, 2024 10:41

Update launcher/src/main.rs

36ed43c

Co-authored-by: Daniël de Kok <[email protected]>

Adding small comment for source of calculation.

d701f9e

Adding L40.

908dec6

Adding L40s.

14d1973

drbh approved these changes Dec 9, 2024

Narsil merged commit a04356f into main Dec 9, 2024
10 of 12 checks passed

Narsil deleted the update_max_prefill_auto_with_vram_reqs branch December 9, 2024 18:44

2016bgeyer mentioned this pull request Dec 20, 2024

Can't run llama3.1-70b at full context #2301

Open

4 tasks
