Support XGrammar backend as an alternative to Outlines #2900

2016bgeyer · 2025-01-10T18:03:07Z

Feature request

Support XGrammar as an alternative to Outlines for the structured-output generation backend.

Motivation

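XGrammar has been shown to be much faster than Outlines at generating structured output. The blog post quoted below reports speedups over existing structured generation solutions of up to 3.5x on the JSON schema workload and more than 10x on the CFG workload.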

Blog post:
https://blog.mlc.ai/2024/11/22/achieving-efficient-flexible-portable-structured-generation-with-xgrammar

"As shown in Figure 1, XGrammar outperforms existing structured generation solutions by up to 3.5x on the JSON schema workload and more than 10x on the CFG workload. Notably, the gap in CFG-guided generation is larger. This is because many JSON schema specifications can be expressed as regular expressions, bringing more optimizations that are not directly applicable to CFGs."

"Figure 2 shows end-to-end inference performance on LLM serving tasks. We can find the trend again that the gap on CFG-guided settings is larger, and the gap grows on larger batch sizes. This is because the GPU throughput is higher on larger batch sizes, putting greater pressure on the grammar engine running on CPUs. Note that the main slowdown of vLLM comes from its structured generation engine, which can be potentially eliminated by integrating with XGrammar. In all cases, XGrammar enables high-performance generation in both settings without compromising flexibility and efficiency."

Paper / Technical Report from XGrammar:
https://arxiv.org/abs/2411.15100