
Update streaming in LM Encoding & CB #1377

Merged — 7 commits merged into openvinotoolkit:master from the streaming_lm_encoding branch on Dec 16, 2024

Conversation

iefode (Contributor) commented on Dec 13, 2024:

No description provided.

github-actions bot added the "category: LLM" label on Dec 13, 2024
ilya-lavrenov added this to the 2025.0 milestone on Dec 13, 2024
github-actions bot added the "category: continuous batching" label on Dec 13, 2024
src/cpp/src/lm_encoding.cpp
@@ -126,7 +126,7 @@ std::pair<EncodedResults, int32_t> get_lm_encoded_results(
                                     get_active_sequence_groups),
                                  active_sequence_groups.end());
 
-    while (active_sequence_groups.size() > 0) {
+    do {
         size_t total_num_tokens = 0;
 
         for (auto& sequence_group : active_sequence_groups) {
Contributor comment:

BTW, @sbalandi should we use active_sequence_groups instead of sequence_groups when we compute beam_offets?

Contributor reply:

It looks like yes, we could use active_sequence_groups here, but that variant is OK too, since for non-active groups num_running_seqs will be 0. I'll check it to be sure and fix it in #1215.
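
To illustrate the point under discussion, here is a minimal sketch (not the actual lm_encoding.cpp code; the container layout and the num_running_seqs() accessor are assumptions based on the names mentioned above): beam offsets accumulate the running-sequence count of each group, so a non-active group contributes 0 and does not shift the offsets of the groups after it.

// Illustrative sketch only: offsets grow by the number of running sequences
// in each group, so iterating over all sequence_groups or only the active
// ones gives equivalent offsets once finished groups report 0 running seqs.
std::map<size_t, size_t> beam_offets;
size_t cumulative_offset = 0;
for (size_t i = 0; i < sequence_groups.size(); ++i) {
    beam_offets[i] = cumulative_offset;
    cumulative_offset += sequence_groups[i]->num_running_seqs();  // 0 for non-active groups
}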

github-actions bot added the "category: speculative decoding" label on Dec 15, 2024
iefode changed the title from "Update streaming in LM Encoding" to "Update streaming in LM Encoding & CB" on Dec 16, 2024
iefode enabled auto-merge on Dec 16, 2024, 10:25
ilya-lavrenov added the "bug" label on Dec 16, 2024
iefode added this pull request to the merge queue on Dec 16, 2024
Merged via the queue into openvinotoolkit:master with commit 8ce5eb3 on Dec 16, 2024
59 checks passed
iefode deleted the streaming_lm_encoding branch on Dec 16, 2024, 14:02
if (streamer_ptr && generations.at(0).get()->can_read()) {
    std::unordered_map<uint64_t, GenerationOutput> token = generations.at(0).get()->back();
    for (const auto& gen_token : token.begin()->second.generated_ids) {
        if (!streamer_ptr->put(gen_token)) {
Contributor comment:

Copy-pasted incorrect code? According to the documentation:

/// @brief put is called every time new token is decoded,
/// @return bool flag to indicate whether generation should be stopped, if return true generation stops
virtual bool put(int64_t token) = 0;

we need to break when put() returns true.
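
A minimal sketch of what the reviewer is asking for (illustrative only, not the exact code that landed in the PR; continue_generation is the flag discussed below):

// Sketch: honor the documented put() contract -- a return value of true means
// "stop generation", so we record the decision and break out of the loop.
bool continue_generation = true;
for (const auto& gen_token : token.begin()->second.generated_ids) {
    if (streamer_ptr->put(gen_token)) {
        continue_generation = false;
        break;
    }
}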

-    OPENVINO_ASSERT(1 == token.size());
-    OPENVINO_ASSERT(1 == token.begin()->second.generated_ids.size());
-    continue_generation = !streamer_ptr->put(token.begin()->second.generated_ids.at(0));
+    for (const auto& gen_token : token.begin()->second.generated_ids) {
Contributor comment:

The continue_generation assignment is dropped here, and hence we can end up with abandoned requests that still hold allocated block tables, as drop_requests() is not called below.
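
A minimal sketch of the fix being requested (illustrative only; drop_requests() is the helper named in the comment above, and continue_generation is the flag from the earlier snippet):

// Sketch: keep the result of put() in continue_generation and, once streaming
// is cancelled, release the abandoned requests so their block tables are freed.
continue_generation = !streamer_ptr->put(gen_token);
if (!continue_generation) {
    drop_requests();  // not called in the snippet above, which is the reviewer's concern
}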

Labels: bug, category: continuous batching, category: LLM, category: speculative decoding