
Update streaming in LM Encoding & CB #1377

Merged — 7 commits merged into openvinotoolkit:master from the streaming_lm_encoding branch on Dec 16, 2024

Conversation

iefode (Contributor) commented on Dec 13, 2024:

No description provided.

github-actions bot added the "category: LLM" label on Dec 13, 2024
ilya-lavrenov added this to the 2025.0 milestone on Dec 13, 2024
github-actions bot added the "category: continuous batching" label on Dec 13, 2024
src/cpp/src/lm_encoding.cpp
@@ -126,7 +126,7 @@ std::pair<EncodedResults, int32_t> get_lm_encoded_results(
                                     get_active_sequence_groups),
                                  active_sequence_groups.end());
 
-    while (active_sequence_groups.size() > 0) {
+    do {
         size_t total_num_tokens = 0;
 
         for (auto& sequence_group : active_sequence_groups) {
Contributor comment:

BTW, @sbalandi should we use active_sequence_groups instead of sequence_groups when we compute beam_offets?

Contributor reply:

It looks like yes, we could use active_sequence_groups here, but that variant is OK too, since for non-active groups num_running_seqs will be 0. I'll check it to be sure and fix it in #1215.
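
To illustrate the point under discussion, here is a minimal sketch (not the actual lm_encoding.cpp code; the container layout and the num_running_seqs() accessor are assumptions based on the names mentioned above): beam offsets accumulate the running-sequence count of each group, so a non-active group contributes 0 and does not shift the offsets of the groups after it.

// Illustrative sketch only: offsets grow by the number of running sequences
// in each group, so iterating over all sequence_groups or only the active
// ones gives equivalent offsets once finished groups report 0 running seqs.
std::map<size_t, size_t> beam_offets;
size_t cumulative_offset = 0;
for (size_t i = 0; i < sequence_groups.size(); ++i) {
    beam_offets[i] = cumulative_offset;
    cumulative_offset += sequence_groups[i]->num_running_seqs();  // 0 for non-active groups
}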

github-actions bot added the "category: speculative decoding" label on Dec 15, 2024
iefode changed the title from "Update streaming in LM Encoding" to "Update streaming in LM Encoding & CB" on Dec 16, 2024
iefode enabled auto-merge on Dec 16, 2024, 10:25
ilya-lavrenov added the "bug" label on Dec 16, 2024
iefode added this pull request to the merge queue on Dec 16, 2024
Merged via the queue into openvinotoolkit:master with commit 8ce5eb3 on Dec 16, 2024
59 checks passed
iefode deleted the streaming_lm_encoding branch on Dec 16, 2024, 14:02
if (streamer_ptr && generations.at(0).get()->can_read()) {
    std::unordered_map<uint64_t, GenerationOutput> token = generations.at(0).get()->back();
    for (const auto& gen_token : token.begin()->second.generated_ids) {
        if (!streamer_ptr->put(gen_token)) {
Contributor comment:

Copy-pasted incorrect code? According to the documentation:

/// @brief put is called every time new token is decoded,
/// @return bool flag to indicate whether generation should be stopped, if return true generation stops
virtual bool put(int64_t token) = 0;

we need to break when put() returns true.
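
A minimal sketch of what the reviewer is asking for (illustrative only, not the exact code that landed in the PR; continue_generation is the flag discussed below):

// Sketch: honor the documented put() contract -- a return value of true means
// "stop generation", so we record the decision and break out of the loop.
bool continue_generation = true;
for (const auto& gen_token : token.begin()->second.generated_ids) {
    if (streamer_ptr->put(gen_token)) {
        continue_generation = false;
        break;
    }
}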

-    OPENVINO_ASSERT(1 == token.size());
-    OPENVINO_ASSERT(1 == token.begin()->second.generated_ids.size());
-    continue_generation = !streamer_ptr->put(token.begin()->second.generated_ids.at(0));
+    for (const auto& gen_token : token.begin()->second.generated_ids) {
Contributor comment:

The continue_generation assignment is dropped here, and hence we can end up with abandoned requests that still hold allocated block tables, as drop_requests() is not called below.
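
A minimal sketch of the fix being requested (illustrative only; drop_requests() is the helper named in the comment above, and continue_generation is the flag from the earlier snippet):

// Sketch: keep the result of put() in continue_generation and, once streaming
// is cancelled, release the abandoned requests so their block tables are freed.
continue_generation = !streamer_ptr->put(gen_token);
if (!continue_generation) {
    drop_requests();  // not called in the snippet above, which is the reviewer's concern
}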

Labels: bug, category: continuous batching, category: LLM, category: speculative decoding