Client for the vLLM API with minimal dependencies.
pip install vllm-client
See example.py for the following:
- Single generation
- Streaming
- Batch inference
It should work out of the box with a vLLM API server.
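For orientation, here is a minimal sketch of the kind of HTTP request a single generation boils down to. It is not this package's API (see example.py for that). It assumes the demo server started with `python -m vllm.entrypoints.api_server`, which accepts a POST to `/generate` with a JSON body holding `prompt`, `stream`, and sampling parameters as extra keys. The helper name `build_generate_request` and the default URL are hypothetical.

```python
import json
import urllib.request


def build_generate_request(prompt, base_url="http://localhost:8000",
                           stream=False, **sampling):
    """Build (but do not send) a POST request for the /generate endpoint.

    Assumption: the server is vLLM's demo api_server, which reads "prompt"
    and "stream" from the JSON body and treats the remaining keys as
    sampling parameters (e.g. max_tokens, temperature).
    """
    payload = {"prompt": prompt, "stream": stream, **sampling}
    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_generate_request("Hello, my name is", max_tokens=16)
    # Sending requires a running server; the non-streaming reply is
    # assumed to be JSON with a "text" list of completions.
    with urllib.request.urlopen(req) as resp:
        print(json.loads(resp.read())["text"])
```

Building the request separately from sending it keeps the payload easy to inspect without a server running.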
sampling_params.py needs to be kept in sync with vLLM. It is a simplified version of vLLM's SamplingParams class, containing only the code required on the client side.
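To illustrate what a client-side parameter class of this sort might look like, here is a hedged sketch. It is not the contents of sampling_params.py: the class name `ClientSamplingParams`, the chosen fields, and their defaults are assumptions for illustration, modelled on a few commonly used vLLM sampling options. The real list of fields should be taken from the vLLM version in use.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ClientSamplingParams:
    """Hypothetical, trimmed-down sampling parameters for client use.

    Only holds values and does basic validation; all sampling logic
    stays on the server.
    """
    n: int = 1                      # number of completions per prompt
    temperature: float = 1.0        # 0 means greedy decoding
    top_p: float = 1.0              # nucleus sampling threshold
    max_tokens: Optional[int] = 16  # cap on generated tokens
    stop: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Fail early on the client rather than waiting for a server error.
        if self.n < 1:
            raise ValueError("n must be at least 1")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")

    def to_dict(self):
        """Return a plain dict suitable for merging into a JSON body."""
        return asdict(self)


params = ClientSamplingParams(temperature=0.7, max_tokens=64)
print(params.to_dict())
```

Because the class only mirrors field names and defaults, keeping it in sync means re-checking those names and defaults whenever vLLM changes its own class.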