
Generate pipeline (#334) #480

Merged

Conversation

@Wovchena Wovchena (Collaborator) commented Jun 7, 2024

LLMs return logits with probabilities for each token; these probabilities can be converted into tokens/words with different techniques: greedy decoding, beam search decoding, random sampling, etc.

This requires writing user-unfriendly post-processing code even for the simplest scenario, greedy decoding. To make life easier, we combined all decoding scenarios into a single function call, where the decoding method and its parameters are specified by arguments.

In this PR we provide a user-friendly API for text generation inspired by the `generate` method from the HuggingFace Transformers library.

- [x] enable calling tokenizers/detokenizers from LLMPipeline
- [ ] add a callback for streaming mode (partially done, needs improvement)
- [x] rewrite the samples with the current approach: [causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83](https://github.com/pavel-esir/openvino.genai/blob/generate_pipeline/text_generation/causal_lm/cpp/generate_pipeline/generate_sample.cpp#L73-L83)
- [x] Multibatch greedy decoding (see the sketch after this list)
- [ ] Speculative decoding
- [ ] Grouped Beam Search decoding: ready for batch size 1; multibatch support needs a rebase after merging openvinotoolkit#349
- [x] Random sampling
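
A hypothetical sketch of the multibatch greedy decoding referenced in the checklist above. The vector-of-prompts overload and the returned container are illustrative assumptions, not an API confirmed by this PR:
```
LLMPipeline pipe(model_path, device);
GenerationConfig config = pipe.generation_config();

// Hypothetical batched call: assumes the call operator accepts a vector of
// prompts and returns one greedy-decoded completion per prompt.
std::vector<std::string> prompts = {"The Sun is yellow because", "Alan Turing was a"};
for (const auto& completion : pipe(prompts, config.max_new_tokens(20)))
    std::cout << completion << '\n';
```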

Example 1: Greedy search generation
```
LLMPipeline pipe(model_path, device);

// Tries to load the config from generation_config.json;
// if it is not found, default values for greedy search are used.
GenerationConfig config = pipe.generation_config();

std::cout << pipe(prompt, config.max_new_tokens(20));
```
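
The same fluent GenerationConfig can, in principle, select the other decoding modes mentioned above. A hedged sketch: the setter names num_beams, do_sample, top_p, and temperature are assumptions modeled on the HuggingFace generate API, not names confirmed by this PR:
```
LLMPipeline pipe(model_path, device);
GenerationConfig config = pipe.generation_config();

// Hypothetical: beam search decoding, assuming a num_beams(...) setter.
std::cout << pipe(prompt, config.max_new_tokens(20).num_beams(4));

// Hypothetical: random sampling, assuming do_sample/top_p/temperature setters.
std::cout << pipe(prompt, config.max_new_tokens(20).do_sample(true).top_p(0.9f).temperature(0.7f));
```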

Example 2: TextStreaming mode
```
LLMPipeline pipe(model_path, device);

GenerationConfig config = pipe.generation_config();

// TextStreamer prints decoded text as tokens arrive.
auto text_streamer = TextStreamer{pipe};
// The callback receives the token ids generated on each step; here the first
// sequence's token is forwarded to the streamer.
auto text_streamer_callback = [&text_streamer](std::vector<int64_t>&& tokens, LLMPipeline& pipe){
    text_streamer.put(tokens[0]);
};

pipe(prompt, config.max_new_tokens(20).set_callback(text_streamer_callback));
text_streamer.end();  // flush any remaining buffered text
```
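
A minimal sketch of what a streamer along the lines of TextStreamer could do internally. It assumes (hypothetically) that LLMPipeline exposes a detokenize helper taking a std::vector<int64_t> and returning std::string; the names here are illustrative, not the PR's actual implementation:
```
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical streamer: re-detokenizes the full sequence on each step and
// prints only the new suffix, because decoding token-by-token can split
// characters that span several tokens.
class SimpleTextStreamer {
    LLMPipeline& m_pipe;             // library type; provides detokenization
    std::vector<int64_t> m_tokens;   // all token ids received so far
    size_t m_printed_len = 0;        // number of characters already printed

public:
    explicit SimpleTextStreamer(LLMPipeline& pipe) : m_pipe(pipe) {}

    void put(int64_t token) {
        m_tokens.push_back(token);
        // detokenize() is an assumed helper; a real implementation must also
        // guard against the decoded text temporarily shrinking.
        std::string text = m_pipe.detokenize(m_tokens);
        std::cout << text.substr(m_printed_len) << std::flush;
        m_printed_len = text.size();
    }

    void end() {
        std::cout << std::endl;
        m_tokens.clear();
        m_printed_len = 0;
    }
};
```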

CVS-132907 CVS-137920

---------

Co-authored-by: Wovchena <[email protected]>
Co-authored-by: Ilya Lavrenov <[email protected]>
Co-authored-by: Alexander Suvorov <[email protected]>
Co-authored-by: Yaroslav Tarkan <[email protected]>
Co-authored-by: Xiake Sun <[email protected]>
Co-authored-by: wenyi5608 <[email protected]>
Co-authored-by: Ekaterina Aidova <[email protected]>
Co-authored-by: guozhong wang <[email protected]>
Co-authored-by: Chen Peter <[email protected]>
@github-actions bot added the category: llm_bench label on Jun 7, 2024
@ilya-lavrenov ilya-lavrenov merged commit 26c3c40 into openvinotoolkit:releases/2024/2 Jun 10, 2024
27 checks passed
Labels
category: llm_bench Label for tool/llm_bench folder
3 participants