GLiNER.cpp is a C++-based inference engine for running GLiNER (Generalist and Lightweight Named Entity Recognition) models. GLiNER can identify any entity type using a bidirectional transformer encoder, offering a practical alternative to traditional NER models and large language models.
- Flexible entity recognition without predefined categories;
- Lightweight and fast inference;
- Ideal for running GLiNER models on CPU.
First, clone the repository:
git clone https://github.com/Knowledgator/GLiNER.cpp.git
Then initialize and update the submodules:
cd GLiNER.cpp
git submodule update --init --recursive
📦 CPU build dependencies & instructions
- CMake (>= 3.18)
- Rust and Cargo
- ONNXRuntime CPU version for your system
- OpenMP
Download ONNX Runtime for your system.
Once downloaded, unpack it in the same directory as the GLiNER.cpp code.
For .tgz (tar.gz) archives, you can use the following command:
tar -xvzf onnxruntime-linux-x64-1.19.2.tgz
Then create a build directory and compile the project:
cmake -D ONNXRUNTIME_ROOTDIR="/home/usr/onnxruntime-linux-x64-1.19.2" -S . -B build
cmake --build build --target all -j
You must provide the ONNXRUNTIME_ROOTDIR option, set to the absolute path of the unpacked ONNX Runtime directory.
To run main.cpp, you need a model in ONNX format and its tokenizer.json file. You can:
- Search for pre-converted models on HuggingFace
- Convert a model yourself using the official Python script
Use the convert_to_onnx.py script with the following arguments:
- model_path: location of the GLiNER model
- save_path: where to save the ONNX file
- quantize: set to True for IntU8 quantization (optional)
Example:
python convert_to_onnx.py --model_path /path/to/your/model --save_path /path/to/save/onnx --quantize True
Example of running inference in C++:
#include <iostream>
#include <vector>
#include <string>
#include "GLiNER/gliner_config.hpp"
#include "GLiNER/processor.hpp"
#include "GLiNER/decoder.hpp"
#include "GLiNER/model.hpp"
#include "GLiNER/tokenizer_utils.hpp"

int main() {
    // Set your max_width and max_length
    gliner::Config config{12, 512};
    // Provide the path to the model, the path to the tokenizer, and the configuration.
    gliner::Model model("./gliner_small-v2.1/onnx/model.onnx", "./gliner_small-v2.1/tokenizer.json", config);

    // A sample input
    std::vector<std::string> texts = {"Kyiv is the capital of Ukraine."};
    std::vector<std::string> entities = {"city", "country", "river", "person", "car"};

    auto output = model.inference(texts, entities);

    std::cout << "\nTest Model Inference:" << std::endl;
    for (size_t batch = 0; batch < output.size(); ++batch) {
        std::cout << "Batch " << batch << ":\n";
        for (const auto& span : output[batch]) {
            std::cout << " Span: [" << span.startIdx << ", " << span.endIdx << "], "
                      << "Class: " << span.classLabel << ", "
                      << "Text: " << span.text << ", "
                      << "Prob: " << span.prob << std::endl;
        }
    }
    return 0;
}
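In the example above, each `output[batch]` is a list of detected spans. A typical next step is to drop low-confidence spans and collect the remaining ones by label. The sketch below is illustrative only: `Span` is a hypothetical stand-in that copies just the `text`, `classLabel` and `prob` fields printed above (it is not GLiNER.cpp's own type), and `groupByLabel` and the 0.5 threshold are assumptions, not part of the library.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a detected span; it only mirrors the
// text, classLabel and prob fields printed in the inference example.
struct Span {
    std::string text;
    std::string classLabel;
    float prob;
};

// Hypothetical helper: keeps spans with prob >= minProb and groups
// their texts by class label (labels are sorted by std::map).
std::map<std::string, std::vector<std::string>>
groupByLabel(const std::vector<Span>& spans, float minProb) {
    assert(minProb >= 0.0f && minProb <= 1.0f && "minProb is a probability");
    std::map<std::string, std::vector<std::string>> grouped;
    for (const auto& span : spans) {
        if (span.prob >= minProb) {
            grouped[span.classLabel].push_back(span.text);
        }
    }
    return grouped;
}
```

With real output you would feed the spans of each `output[batch]` instead, adapting the field types if the library's span type differs.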
📦 GPU build dependencies & instructions
- CMake (>= 3.25)
- Rust and Cargo
- ONNXRuntime GPU version for your system
- OpenMP
- NVIDIA GPU
- CUDA Toolkit
- cuDNN
By default, inference runs on the CPU. To use the GPU, you need the GPU build of ONNX Runtime and a working cuDNN installation. Follow the cuDNN installation instructions here:
https://developer.nvidia.com/cudnn-downloads
Then create a build directory and compile the project:
cmake -D ONNXRUNTIME_ROOTDIR="/home/usr/onnxruntime-linux-x64-gpu-1.19.2" -D GPU_CHECK=ON -S . -B build
cmake --build build --target inference -j
GPU_CHECK=ON: enables a check for CUDA dependencies. If it is not provided, the check is skipped.
To use the GPU, either:
- Specify the CUDA device using device_id:
int device_id = 0; // CUDA:0
gliner::Model model("./gliner_small-v2.1/onnx/model.onnx", "./gliner_small-v2.1/tokenizer.json", config, device_id);
OR
- Use a custom ONNX Runtime environment (Ort::Env) and session options (Ort::SessionOptions):
Ort::Env env = ...;
Ort::SessionOptions session_options = ...;
gliner::Model model("./gliner_small-v2.1/onnx/model.onnx", "./gliner_small-v2.1/tokenizer.json", config, env, session_options);
By default, the model uses a span-level configuration. To use token-level models, you need to specify the model type in the model configuration:
gliner::Config config{12, 512, gliner::TOKEN_LEVEL}; // Set your maxWidth, maxLength and modelType
gliner::Model model("./gliner-multitask-large-v0.5/onnx/model.onnx", "./gliner-multitask-large-v0.5/tokenizer.json", config);
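The first value in `gliner::Config` (12 in the examples) is the maximum span width. As a rough illustration of what that limit means for a span-level model, the sketch below lists every candidate span of up to `maxWidth` consecutive words. It is an assumption about the general span-level approach, not GLiNER.cpp's actual implementation, and `enumerateSpans` is a hypothetical helper. Token-level models are generally described as scoring individual tokens rather than a fixed grid of spans.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: lists candidate (start, end) word spans, with `end`
// inclusive, where each span covers between 1 and maxWidth words.
std::vector<std::pair<std::size_t, std::size_t>>
enumerateSpans(std::size_t numWords, std::size_t maxWidth) {
    assert(maxWidth > 0 && "max_width must be positive");
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    for (std::size_t start = 0; start < numWords; ++start) {
        // Stop growing the span at maxWidth words or at the end of the text.
        for (std::size_t width = 1; width <= maxWidth && start + width <= numWords; ++width) {
            spans.emplace_back(start, start + width - 1);
        }
    }
    return spans;
}
```

The number of candidates grows roughly as the number of words times `maxWidth`; with 6 words and a width of 12, all 21 possible spans are candidates, so a larger width mainly matters for longer texts.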
GLiNER.cpp offers versatile entity recognition capabilities across various domains:
- Enhanced Search Query Understanding
- Real-time PII Detection
- Intelligent Document Parsing
- Content Summarization and Insight Extraction
- Automated Content Tagging and Categorization ...
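For the PII detection use case, detected entities are often masked before text is stored or logged. The sketch below assumes you already have each entity as a half-open range of character offsets; whether the model's `startIdx`/`endIdx` can be used directly for this is an assumption you should verify. `CharSpan` and `redact` are hypothetical names, not part of GLiNER.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entity range: [start, end) character offsets into the text.
struct CharSpan {
    std::size_t start;
    std::size_t end;
};

// Replaces every character covered by a span with '*'.
// Overlapping spans are fine: a character is simply masked again.
std::string redact(std::string text, const std::vector<CharSpan>& spans) {
    for (const auto& s : spans) {
        assert(s.start <= s.end && s.end <= text.size() && "span out of range");
        for (std::size_t i = s.start; i < s.end; ++i) {
            text[i] = '*';
        }
    }
    return text;
}
```

Masking character for character keeps the text length unchanged, so the offsets of the remaining spans stay valid after each redaction.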
Roadmap:
- Add support for token-level GLiNER models (done);
- Further optimize inference speed;
- Implement bi-encoder GLiNER architecture for better scalability;
- Enable model training capabilities;
- Provide more usage examples.
For questions and support, please join our Discord community or open an issue on GitHub.