Commit
This draft PR is a work-in-progress implementation of the Mamba model. It currently loads weights and produces correct logits after a single pass. It still needs to integrate the model correctly so that it produces tokens as expected, and to apply optimizations that avoid copies and unnecessary operations at runtime.

#### Helpful resources

[Mamba: Linear-Time Sequence Modeling with Selective State Spaces (Albert Gu and Tri Dao)](https://arxiv.org/abs/2312.00752)
https://github.com/johnma2006/mamba-minimal
https://github.com/huggingface/candle/blob/main/candle-examples/examples/mamba-minimal/model.rs
huggingface/transformers#28094

Notes: this dev work currently targets `state-spaces/mamba-130m`, so please use that model if you want to test. Additionally, when starting the router, the prefill needs to be limited: `cargo run -- --max-batch-prefill-tokens 768 --max-input-length 768`

## Update / Current State

Integration tests have been added, and basic functionality such as model loading is supported.

```bash
cd integration-tests
pytest -vv models/test_fused_kernel_mamba.py
```

- [x] add tests
- [x] load model
- [x] make simple request
- [ ] resolve warmup issue
- [ ] resolve output issues

Fetch the models tested during development:

```bash
text-generation-server download-weights state-spaces/mamba-130m
text-generation-server download-weights state-spaces/mamba-1.4b
text-generation-server download-weights state-spaces/mamba-2.8b
```

Run the server:

```bash
cd server
MASTER_ADDR=127.0.0.1 MASTER_PORT=5555 python text_generation_server/cli.py serve state-spaces/mamba-2.8b
```

Run the router:

```bash
cargo run
```

Make a request:

```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq
```

Response:

```json
{
  "generated_text": "\n\nDeep learning is a machine learning technique that uses a deep neural network to learn from data."
}
```

---------

Co-authored-by: Nicolas Patry <[email protected]>
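The curl request from the commit message can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the router is listening on `localhost:3000`, as in the curl example; the helper names `build_generate_request` and `generate` are hypothetical and are not part of text-generation-inference.

```python
import json
import urllib.request

# Assumed router address, taken from the curl example in the commit message.
DEFAULT_URL = "http://localhost:3000/generate"


def build_generate_request(prompt, max_new_tokens=20, url=DEFAULT_URL):
    """Build (but do not send) a POST request mirroring the curl call."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, url=DEFAULT_URL, timeout=60):
    """Send the request and return the decoded JSON response as a dict.

    This only works while the server and router are running.
    """
    request = build_generate_request(prompt, max_new_tokens, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server and router running, `generate("What is Deep Learning?")["generated_text"]` would return the generated string. Building the request separately from sending it keeps the payload easy to inspect without a live server.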
Showing 11 changed files with 1,547 additions and 1 deletion.
73 changes: 73 additions & 0 deletions
integration-tests/models/__snapshots__/test_mamba/test_mamba.json
@@ -0,0 +1,73 @@
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [],
    "seed": null,
    "tokens": [
      {
        "id": 187,
        "logprob": -0.3552246,
        "special": false,
        "text": "\n"
      },
      {
        "id": 187,
        "logprob": -0.38378906,
        "special": false,
        "text": "\n"
      },
      {
        "id": 30763,
        "logprob": -1.140625,
        "special": false,
        "text": "Deep"
      },
      {
        "id": 4715,
        "logprob": -0.5551758,
        "special": false,
        "text": " learning"
      },
      {
        "id": 310,
        "logprob": -0.59033203,
        "special": false,
        "text": " is"
      },
      {
        "id": 247,
        "logprob": -0.70654297,
        "special": false,
        "text": " a"
      },
      {
        "id": 747,
        "logprob": -2.0410156,
        "special": false,
        "text": " new"
      },
      {
        "id": 1511,
        "logprob": -2.3789062,
        "special": false,
        "text": " type"
      },
      {
        "id": 273,
        "logprob": -0.0026435852,
        "special": false,
        "text": " of"
      },
      {
        "id": 5145,
        "logprob": -1.2841797,
        "special": false,
        "text": " machine"
      }
    ],
    "top_tokens": null
  },
  "generated_text": "\n\nDeep learning is a new type of machine"
}
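The snapshot is internally consistent: the token texts concatenate to `generated_text`, and `generated_tokens` equals the number of token entries. A minimal sketch of that check follows, using an abbreviated copy of the snapshot with only the fields the check needs. The function name `check_snapshot` is hypothetical and is not part of the integration-test suite.

```python
# Abbreviated copy of the snapshot: token texts and the summary fields only.
SNAPSHOT = {
    "details": {
        "generated_tokens": 10,
        "tokens": [
            {"text": "\n"},
            {"text": "\n"},
            {"text": "Deep"},
            {"text": " learning"},
            {"text": " is"},
            {"text": " a"},
            {"text": " new"},
            {"text": " type"},
            {"text": " of"},
            {"text": " machine"},
        ],
    },
    "generated_text": "\n\nDeep learning is a new type of machine",
}


def check_snapshot(snapshot):
    """Return two booleans describing the snapshot's internal consistency."""
    tokens = snapshot["details"]["tokens"]
    joined = "".join(token["text"] for token in tokens)
    return {
        # The reported count should match the number of token entries.
        "count_matches": snapshot["details"]["generated_tokens"] == len(tokens),
        # endswith (not ==) also covers responses that echo the prompt first.
        "text_is_suffix": snapshot["generated_text"].endswith(joined),
    }


result = check_snapshot(SNAPSHOT)
```

Using a suffix comparison rather than strict equality is a design choice: it lets the same check apply to responses where the prompt is included ahead of the generated tokens.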
99 changes: 99 additions & 0 deletions
integration-tests/models/__snapshots__/test_mamba/test_mamba_all_params.json
@@ -0,0 +1,99 @@
{
  "details": {
    "best_of_sequences": null,
    "finish_reason": "length",
    "generated_tokens": 10,
    "prefill": [
      {
        "id": 2502,
        "logprob": null,
        "text": " red"
      },
      {
        "id": 13,
        "logprob": -2.5234375,
        "text": ","
      },
      {
        "id": 8862,
        "logprob": -3.4433594,
        "text": " yellow"
      },
      {
        "id": 13,
        "logprob": -0.43017578,
        "text": ","
      },
      {
        "id": 209,
        "logprob": -8.21875,
        "text": " "
      }
    ],
    "seed": 0,
    "tokens": [
      {
        "id": 187,
        "logprob": 0.0,
        "special": false,
        "text": "\n"
      },
      {
        "id": 395,
        "logprob": -0.46411133,
        "special": false,
        "text": "and"
      },
      {
        "id": 13735,
        "logprob": -2.1132812,
        "special": false,
        "text": " orange"
      },
      {
        "id": 313,
        "logprob": -1.2128906,
        "special": false,
        "text": " ("
      },
      {
        "id": 249,
        "logprob": -2.3671875,
        "special": false,
        "text": "in"
      },
      {
        "id": 253,
        "logprob": 0.0,
        "special": false,
        "text": " the"
      },
      {
        "id": 1340,
        "logprob": -1.640625,
        "special": false,
        "text": " order"
      },
      {
        "id": 597,
        "logprob": -0.5488281,
        "special": false,
        "text": " they"
      },
      {
        "id": 3176,
        "logprob": -0.48608398,
        "special": false,
        "text": " appear"
      },
      {
        "id": 275,
        "logprob": 0.0,
        "special": false,
        "text": " in"
      }
    ],
    "top_tokens": null
  },
  "generated_text": "blue, red, yellow, \nand orange (in the order they appear in"
}
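The `logprob` fields can be turned into per-token probabilities and a sequence-level score. The sketch below assumes the values are natural-log probabilities, which the snapshot itself does not state; under that assumption a `logprob` of 0.0 corresponds to a probability of exactly 1. The function names are hypothetical helpers, not part of the test suite.

```python
import math

# Log-probabilities of the ten generated tokens, copied from the snapshot.
LOGPROBS = [
    0.0, -0.46411133, -2.1132812, -1.2128906, -2.3671875,
    0.0, -1.640625, -0.5488281, -0.48608398, 0.0,
]


def token_probabilities(logprobs):
    """Convert natural-log probabilities (assumed) into probabilities."""
    return [math.exp(lp) for lp in logprobs]


def sequence_logprob(logprobs):
    """Log-probability of the whole sequence: the sum of token logprobs."""
    return math.fsum(logprobs)


def perplexity(logprobs):
    """Exponential of the negative mean token log-probability."""
    return math.exp(-math.fsum(logprobs) / len(logprobs))


probabilities = token_probabilities(LOGPROBS)
total = sequence_logprob(LOGPROBS)
ppl = perplexity(LOGPROBS)
```

Summing in log space rather than multiplying probabilities avoids underflow on longer sequences, which is why the sequence score is computed from the logprobs directly.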