Add parametrization for the detokenization/decoding #1246

Merged
merged 7 commits into from
Nov 27, 2024

Conversation

pavel-esir
Contributor

@pavel-esir pavel-esir commented Nov 21, 2024

Tokenizer IRs should be converted after openvinotoolkit/openvino_tokenizers#325 is merged.

Ticket CVS-154151

@pavel-esir pavel-esir added the enhancement New feature or request label Nov 21, 2024
@pavel-esir pavel-esir added this to the 2025.0 milestone Nov 21, 2024
@github-actions github-actions bot added category: LLM LLM pipeline (stateful, static) category: GHA CI based on Github actions category: tokenizers Tokenizer class or submodule update category: Python API Python API for GenAI category: GenAI C++ API Changes in GenAI C++ public headers no-match-files labels Nov 21, 2024
src/cpp/include/openvino/genai/tokenizer.hpp (outdated, resolved)
src/cpp/include/openvino/genai/tokenizer.hpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (resolved)
src/cpp/src/tokenizer.cpp (resolved)
@@ -217,3 +217,25 @@ def test_add_special_tokens(add_special_tokens, prompt):
res_genai = genai_tokenzier.encode(prompt, add_special_tokens).input_ids.data
res_hf = hf_tokenizer(prompt, return_tensors="np", add_special_tokens=add_special_tokens)["input_ids"]
assert np.all(res_genai == res_hf)

@pytest.mark.precommit
@pytest.mark.xfail(reason="Need to turn them back on when openvino_tokenizers will be updated.")
Contributor Author

These tests will pass on CI only after GenAI is updated to the latest openvino_tokenizers. In the meantime I have checked them locally and will re-enable them as soon as openvino_tokenizers is updated.
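For context on what the xfail-marked tests exercise, here is a toy sketch of a decode call with a parameter that controls whether special tokens are kept. It is not the openvino_genai implementation. The vocabulary, the token ids, and the parameter name `skip_special_tokens` (borrowed from the Hugging Face convention) are assumptions made only for illustration.

```python
# Toy illustration only: this is NOT the openvino_genai implementation.
from typing import Dict, Iterable, Set

# Hypothetical vocabulary; the ids and token strings are made up for this sketch.
VOCAB: Dict[int, str] = {0: "<s>", 1: "</s>", 2: "Hello", 3: " world"}
SPECIAL_IDS: Set[int] = {0, 1}


def decode(token_ids: Iterable[int], skip_special_tokens: bool = True) -> str:
    """Join token strings, optionally dropping special tokens.

    The parameter name follows the Hugging Face convention and is assumed here.
    """
    pieces = []
    for token_id in token_ids:
        if skip_special_tokens and token_id in SPECIAL_IDS:
            continue  # drop <s>, </s> and similar markers
        pieces.append(VOCAB[token_id])
    return "".join(pieces)


ids = [0, 2, 3, 1]
with_specials = decode(ids, skip_special_tokens=False)
without_specials = decode(ids)
```

A test in the style of the ones above would compare such output against the reference tokenizer's decode result for each parameter value.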

src/cpp/include/openvino/genai/tokenizer.hpp (outdated, resolved)
src/cpp/include/openvino/genai/tokenizer.hpp (outdated, resolved)
src/python/py_tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (outdated, resolved)
src/cpp/src/tokenizer.cpp (resolved)
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024
@pavel-esir pavel-esir added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024
@ilya-lavrenov ilya-lavrenov added this pull request to the merge queue Nov 26, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 26, 2024
@pavel-esir pavel-esir enabled auto-merge November 26, 2024 19:18
@andrei-kochin andrei-kochin merged commit 5ee41ec into openvinotoolkit:master Nov 27, 2024
49 of 52 checks passed
@pavel-esir pavel-esir deleted the parametrize_decode branch November 27, 2024 10:09
@pavel-esir
Contributor Author

We need to enable this feature in the 2024.6 release as well (port to the 2024.5 branch).

@ilya-lavrenov ilya-lavrenov added the port to LTS PR needs to be ported to LTS label Nov 27, 2024
@ilya-lavrenov ilya-lavrenov mentioned this pull request Dec 3, 2024
ilya-lavrenov added a commit that referenced this pull request Dec 4, 2024
- #1158
- #1178
- #1214
- #1243
- #1253
- #1259
- #1266
- #1271
- #1278
- #1280
- #1284
- e4a86f6
- #1246
- #958

---------

Co-authored-by: Anastasiia Pnevskaia <[email protected]>
Co-authored-by: Helena Kloosterman <[email protected]>
Co-authored-by: Vladimir Zlobin <[email protected]>
Co-authored-by: Dmitry Matveev <[email protected]>
Co-authored-by: Anna Likholat <[email protected]>
Co-authored-by: Alina Kladieva <[email protected]>
@ilya-lavrenov ilya-lavrenov removed the port to LTS PR needs to be ported to LTS label Dec 4, 2024
@pavel-esir
Contributor Author

Discussed with @andrei-kochin: since OVMS will not use this feature in 2024.6, there is no need to port it there. Ilya has already removed the label.

Labels
category: GenAI C++ API Changes in GenAI C++ public headers category: GHA CI based on Github actions category: LLM LLM pipeline (stateful, static) category: Python API Python API for GenAI category: tokenizers Tokenizer class or submodule update enhancement New feature or request no-match-files
5 participants