From 6336aa3624f45438ca794be9c9b3bdf313eb5e50 Mon Sep 17 00:00:00 2001
From: Artur Paniukov
Date: Sat, 31 Aug 2024 00:20:55 +0400
Subject: [PATCH] Add Building Without FastTokenizer Info (#247)

---
 README.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/README.md b/README.md
index d6ccb5ec3..0373541a6 100644
--- a/README.md
+++ b/README.md
@@ -190,6 +190,22 @@ By default, all available ICU locales are supported, which significantly increas
 
 By following these instructions, you can effectively reduce the size of the ICU libraries in your final package.
 
+### Build OpenVINO Tokenizers without FastTokenizer Library
+
+If a tokenizer doesn't use the `CaseFold`, `UnicodeNormalization` or `Wordpiece` operations, you can drastically reduce the package binary size by building OpenVINO Tokenizers without the FastTokenizer dependency, using this CMake flag:
+
+```bash
+-DENABLE_FAST_TOKENIZERS=OFF
+```
+
+This option can also help when building for a platform that is not supported by FastTokenizer, for example `Android x86_64`.
+
+Example for a pip installation path:
+
+```bash
+pip install git+https://github.com/openvinotoolkit/openvino_tokenizers.git --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly --config-settings=override=cmake.options.ENABLE_FAST_TOKENIZERS=OFF
+```
+
 ## Usage
 
 :warning: OpenVINO Tokenizers can be inferred on a `CPU` device only.