
Speculative #1308

Merged
merged 38 commits into from
Dec 11, 2023
Commits
d7d07d4
Tmp work for medusa.
Narsil Sep 11, 2023
3fae84d
Tmp.
Narsil Sep 18, 2023
243e9c3
Speculative medusa (illegal address Paged).
Narsil Nov 28, 2023
aa442dc
Speedup 2x.
Narsil Nov 29, 2023
cda627e
Modifying the protobuf.
Narsil Nov 29, 2023
b4d97d5
Non breaking router.
Narsil Nov 30, 2023
657ccd8
Medusa + ngram
Narsil Dec 1, 2023
e7e0734
Working state except all params ??
Narsil Dec 1, 2023
7ed07bc
Speculative decoding + mistral
Narsil Dec 4, 2023
bdd9596
Propagate speculate
Narsil Dec 4, 2023
1f46bc4
Updating launcher + docs.
Narsil Dec 4, 2023
970e57b
Need to update params since tensor changed.
Narsil Dec 4, 2023
79f9afb
Needed to regenerate params tests + fix simple tests
Narsil Dec 4, 2023
d99f281
Remove pdb comments.
Narsil Dec 4, 2023
2697920
Cargo fmt
Narsil Dec 4, 2023
f576598
Revert falcon load modification.
Narsil Dec 4, 2023
8efff84
C'mon falcon.
Narsil Dec 4, 2023
4d6efe3
cargo update.
Narsil Dec 5, 2023
e808222
Working around falcon tests.
Narsil Dec 5, 2023
be481a4
Address comments.
Narsil Dec 5, 2023
cb8a168
Fix.
Narsil Dec 5, 2023
5aa3a01
Fmt.
Narsil Dec 5, 2023
09839b0
Fixing some simple stuff, adding `speculate` to budget.
Narsil Dec 5, 2023
9bf31fe
Fixing infer iterator.
Narsil Dec 5, 2023
fdef00c
Fix no speculation.
Narsil Dec 5, 2023
a3cc5a9
Cargo fmt.
Narsil Dec 5, 2023
7b34445
Improve create_n_gram degradation.
Narsil Dec 6, 2023
f6958ea
Include a few fixes
Narsil Dec 6, 2023
6350c11
Discard all params modifications, we're not running ngram speculation
Narsil Dec 6, 2023
b3c1492
Revert integration tests modifications.
Narsil Dec 6, 2023
3a8b192
Remove ngram debug code
Narsil Dec 6, 2023
d2b42f6
Updating medusa test + Speeding ngram immensely by just making a simple
Narsil Dec 6, 2023
3a79fbc
Updated.
Narsil Dec 6, 2023
abc8d48
Old llama test.
Narsil Dec 6, 2023
ba16994
Fixing medusa off by ones.
Narsil Dec 8, 2023
e95a5a8
Removing dead code.
Narsil Dec 8, 2023
b6519b5
Update medusa sampling.
Narsil Dec 9, 2023
0006fab
Apply suggestions from code review
Narsil Dec 11, 2023
66 changes: 33 additions & 33 deletions Cargo.lock


8 changes: 8 additions & 0 deletions docs/source/basic_tutorials/launcher.md
@@ -67,6 +67,14 @@ Options:
- bitsandbytes-nf4: Bitsandbytes 4bit. Can be applied on any model, will cut the memory requirement by 4x, but it is known that the model will be much slower to run than the native f16
- bitsandbytes-fp4: Bitsandbytes 4bit. nf4 should be preferred in most cases, but maybe this one has better perplexity performance for your model

```
## SPECULATE
```shell
--speculate <SPECULATE>
The number of input_ids to speculate on. If using a Medusa model, the heads will be picked up automatically. Otherwise, n-gram speculation is used, which is relatively free in terms of compute, but the speedup heavily depends on the task

[env: SPECULATE=]

```
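The n-gram speculation mentioned above can be understood as prompt lookup: match the trailing tokens of the sequence against earlier occurrences, and propose the tokens that followed that earlier match as speculative candidates. A minimal Python sketch of the idea, not TGI's actual implementation (the function name and parameters are illustrative):

```python
def ngram_speculate(input_ids, n=2, speculate=3):
    """Propose up to `speculate` candidate tokens by matching the trailing
    n-gram against an earlier occurrence in the sequence."""
    if len(input_ids) < n:
        return []
    tail = input_ids[-n:]
    # Scan backwards for the most recent earlier occurrence of the tail n-gram.
    for start in range(len(input_ids) - n - 1, -1, -1):
        if input_ids[start:start + n] == tail:
            follow = input_ids[start + n:start + n + speculate]
            if follow:
                return follow
    return []
```

This is why the docs call it "relatively free": no extra model is needed, only a scan of the existing tokens, so the cost is negligible but the hit rate depends entirely on how repetitive the task's output is.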
## DTYPE
```shell
@@ -0,0 +1,98 @@
{
"details": {
"best_of_sequences": null,
"finish_reason": "length",
"generated_tokens": 10,
"prefill": [
{
"id": 1,
"logprob": null,
"text": "<s>"
},
{
"id": 338,
"logprob": -10.0078125,
"text": "is"
},
{
"id": 21784,
"logprob": -15.515625,
"text": "Deep"
},
{
"id": 29257,
"logprob": -2.8847656,
"text": "Learning"
},
{
"id": 29973,
"logprob": -4.140625,
"text": "?"
}
],
"seed": 0,
"tokens": [
{
"id": 13,
"logprob": -1.1582031,
"special": false,
"text": "\n"
},
{
"id": 2772,
"logprob": -0.23083496,
"special": false,
"text": "De"
},
{
"id": 1022,
"logprob": 0.0,
"special": false,
"text": "ep"
},
{
"id": 6509,
"logprob": 0.0,
"special": false,
"text": " learning"
},
{
"id": 29892,
"logprob": -0.61816406,
"special": false,
"text": ","
},
{
"id": 607,
"logprob": -0.7089844,
"special": false,
"text": " which"
},
{
"id": 508,
"logprob": -1.7724609,
"special": false,
"text": " can"
},
{
"id": 367,
"logprob": 0.0,
"special": false,
"text": " be"
},
{
"id": 5545,
"logprob": 0.0,
"special": false,
"text": " considered"
},
{
"id": 408,
"logprob": -0.3869629,
"special": false,
"text": " as"
}
]
},
"generated_text": "What is Deep Learning?\nDeep learning, which can be considered as"
}
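The fixture above is from a greedy, seeded run: several tokens have a logprob of exactly 0.0, which is consistent with speculative decoding where candidate tokens are verified in a single forward pass and the longest matching prefix is accepted. A hedged sketch of that acceptance rule (names are illustrative, not TGI's API; `verified[i]` is assumed to be the token the model itself would emit after seeing the first `i` candidates):

```python
def accept_speculated(candidates, verified):
    """Return the accepted output tokens for one speculative step.

    `candidates` holds the speculated tokens; `verified` holds the model's
    own greedy choices at each position (length len(candidates) + 1, since
    the verification pass also produces one logit past the last candidate).
    """
    accepted = []
    for cand, model_tok in zip(candidates, verified):
        if cand != model_tok:
            break
        accepted.append(cand)
    # The model's token at the first mismatch (or just past a full match)
    # is always valid, so at least one token is gained even on a total miss.
    accepted.append(verified[len(accepted)])
    return accepted
```

Under this rule a step never produces fewer tokens than plain decoding, which is why the PR can report a speedup when candidates hit while remaining correct when they miss.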