SGLang
TL;DR
Fast LLM serving with RadixAttention for prefix caching. Optimized for structured generation.
Use when
- Structured outputs needed
- Heavy prefix reuse
- RAG applications
Skip when
- Simple text generation
- Non-NVIDIA hardware
SGLang is a structured generation language and serving framework. It features RadixAttention for efficient prefix caching and a programming model for complex LLM programs.
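The prefix-caching idea can be illustrated with a toy sketch. This is not SGLang's implementation: `PrefixNode` and `ToyPrefixCache` are hypothetical names, the token ids are made up, and the structure is a plain per-token trie rather than a compressed radix tree. A real cache would also attach KV tensors to nodes and evict old entries. The sketch only shows how a second request can discover that its leading tokens were already processed.

```python
from typing import Dict, List


class PrefixNode:
    """One node per token; children are keyed by the next token id."""

    def __init__(self) -> None:
        self.children: Dict[int, "PrefixNode"] = {}


class ToyPrefixCache:
    """Hypothetical illustration: records which token prefixes were seen."""

    def __init__(self) -> None:
        self.root = PrefixNode()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens are already in the cache."""
        node = self.root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched

    def insert(self, tokens: List[int]) -> None:
        """Add a token sequence, reusing any existing shared prefix."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, PrefixNode())


cache = ToyPrefixCache()
system_prompt = [101, 7, 7, 42]   # made-up token ids for a shared prefix
request_a = system_prompt + [5, 6]
request_b = system_prompt + [9]

reused_a = cache.match_prefix(request_a)  # cache is empty, nothing to reuse
cache.insert(request_a)
reused_b = cache.match_prefix(request_b)  # shares the 4-token system prompt
print(reused_a, reused_b)  # prints: 0 4
```

In this toy setting, only the tokens after the matched prefix would need fresh computation, which is why workloads with a long shared system prompt or shared retrieved context benefit most.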
Features
- **RadixAttention**: Radix tree-based prefix caching
- **Structured Generation**: Constrained decoding
- **High Performance**: Comparable to vLLM
- **Programming Model**: DSL for LLM workflows
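As a rough picture of what constrained decoding means, the sketch below restricts each step to tokens that keep the output a valid prefix of one of a few allowed strings. Everything here is an assumption for illustration: the function names, the tiny vocabulary, and the fixed scores (a real model would rescore tokens at every step, and SGLang's own mechanism is not shown).

```python
from typing import Dict, List, Sequence


def allowed_next_tokens(prefix: str, vocab: Sequence[str], choices: Sequence[str]) -> List[str]:
    """Keep only tokens for which prefix + token is still a prefix of some choice."""
    return [
        tok for tok in vocab
        if tok and any(choice.startswith(prefix + tok) for choice in choices)
    ]


def constrained_decode(vocab_scores: Dict[str, float], choices: Sequence[str]) -> str:
    """Greedy decoding over a fixed score table, masked by the constraint."""
    out = ""
    while out not in choices:
        allowed = allowed_next_tokens(out, list(vocab_scores), choices)
        if not allowed:
            raise ValueError("no token keeps the output valid")
        # Pick the best-scoring token among those the constraint permits.
        out += max(allowed, key=vocab_scores.__getitem__)
    return out


# Hypothetical scores: the unconstrained favorite is "may", which is not allowed.
vocab_scores = {"may": 0.9, "be": 0.8, "ye": 0.5, "s": 0.4, "no": 0.3}
result = constrained_decode(vocab_scores, choices=["yes", "no"])
print(result)  # prints: yes
```

The point of the design is that invalid tokens are never sampled, so the output always satisfies the constraint without retries.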
Best For
- RAG pipelines
- Structured outputs (JSON)
- Multi-turn conversations
Code Examples
SGLang structured generation (Python)
import sglang as sgl

@sgl.function
def json_gen(s):
    # Constrain the generated text to start with "{" and end with "}".
    s += "Generate JSON: " + sgl.gen("json", regex=r'\{.*\}')
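Note that the pattern `\{.*\}` only constrains the overall shape of the output (an opening brace, anything, a closing brace); it does not by itself guarantee valid JSON. A minimal sketch of a post-generation check using only the Python standard library follows. The helper name `parse_if_valid` is hypothetical, and Python's `re` module is used purely for illustration; the regex engine behind constrained decoding may differ in details.

```python
import json
import re
from typing import Optional

PATTERN = re.compile(r"\{.*\}")  # same pattern as in the example above


def parse_if_valid(text: str) -> Optional[dict]:
    """Return the parsed object if text matches the pattern and is valid JSON."""
    if PATTERN.fullmatch(text) is None:
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


good = parse_if_valid('{"name": "SGLang", "ok": true}')
bad = parse_if_valid("{not json}")  # matches the regex, but is not JSON
print(good, bad)
```

When strict validity matters, a tighter pattern or a schema-based constraint is usually a better fit than a permissive brace-to-brace regex.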