# grammar

Grammar-constrained decoding for LLM outputs using MLX.

## Performance

Performance depends on hardware, vocabulary size, the grammar, and whether you
evaluate the MLX graph. See [Benchmarks](#benchmarks) for how to measure on your
setup.

### Design choices that keep masking fast

| Technique | Impact |
|-----------|--------|
| Precomputed token analysis | Terminal matches computed once at startup |
| Mask caching by grammar state signature | Reuse masks for repeated parser states |
| Partitioned tokens | Exact matches separated from DP candidates |

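
The mask-caching row above can be pictured with a small memoization sketch. This is only an illustration under stated assumptions: `maskCache`, `get`, and the string signatures are hypothetical names, not the x/grammar API, and the real signature format is not described in this README.

```go
package main

import "fmt"

// maskCache memoizes token masks by a signature of the grammar state.
// Hypothetical type for illustration; not part of x/grammar.
type maskCache struct {
	masks map[string][]bool
	hits  int
}

func newMaskCache() *maskCache {
	return &maskCache{masks: make(map[string][]bool)}
}

// get returns the cached mask for sig, computing and storing it on a miss.
func (c *maskCache) get(sig string, compute func() []bool) []bool {
	if m, ok := c.masks[sig]; ok {
		c.hits++
		return m
	}
	m := compute()
	c.masks[sig] = m
	return m
}

func main() {
	cache := newMaskCache()
	computed := 0
	compute := func() []bool {
		computed++ // stands in for the expensive walk over the vocabulary
		return []bool{true, false, true}
	}

	// The signatures are made-up strings; a repeated state reuses its mask.
	cache.get("object>members", compute)
	cache.get("object>members", compute)
	cache.get("array>elements", compute)

	fmt.Println("computed:", computed, "hits:", cache.hits)
}
```

The cache only pays off when parser states repeat, which is why the key is a signature of the state and not the token position.
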
### Comparison Notes

- **llama.cpp**: Decodes each token to UTF-8 and checks it against the PDA. No caching.
- **Outlines**: FSM-based. Compilation can take 40s-10min for complex schemas. Fast after compile.
- **XGrammar**: PDA with 99% context-independent tokens precomputed. State-of-the-art before this.
- **x/grammar**: Precomputed token analysis + mask caching by grammar state signature.

## Usage

```go
import (
	"log"

	"github.com/ollama/ollama/x/grammar"
	"github.com/ollama/ollama/x/grammar/schema"
)

// Use the built-in JSON grammar
g, err := grammar.JSONGrammar()
if err != nil {
	log.Fatal(err)
}

// Or build one from a JSON Schema (OpenAI-compatible)
g, err = schema.Grammar(`{
	"type": "object",
	"properties": {
		"name": {"type": "string"},
		"age": {"type": "integer"}
	},
	"required": ["name", "age"]
}`)

// Or parse custom EBNF
g, err = grammar.ParseEBNF(myGrammar, "root")

// Create an engine with the model vocabulary
engine, err := grammar.NewEngine(g, vocab)
if err != nil {
	log.Fatal(err)
}
defer engine.Close()

// Generation loop
for !engine.IsComplete() {
	logits := model.Forward(tokens)
	masked := engine.ApplyMask(logits) // Invalid tokens → -inf
	nextToken := sample(masked)
	engine.Accept(nextToken)
	tokens = append(tokens, nextToken) // assumes tokens is a slice of token IDs
}
// Output conforms to the grammar when you only sample from masked tokens and call Accept
```

## EBNF Syntax

```ebnf
rule = expression .     # Rule definition (ends with .)
"literal"               # Literal string
"a" … "z"               # Character range (inclusive)
( a | b )               # Grouping with alternation
[ optional ]            # Optional (0 or 1)
{ repeated }            # Repetition (0 or more)
```

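
As a rough illustration of how the operators above read, the following sketch hand-matches the rule `number = [ "-" ] digit { digit } .` with `digit = "0" … "9" .`. It is a hand-written matcher for this one rule, not how the package parses or evaluates grammars; `inRange` and `matchNumber` are hypothetical names.

```go
package main

import "fmt"

// inRange reports whether c lies in the inclusive range lo … hi.
func inRange(c, lo, hi byte) bool { return lo <= c && c <= hi }

// matchNumber reports whether s matches: number = [ "-" ] digit { digit } .
func matchNumber(s string) bool {
	i := 0
	// [ "-" ]: optional, consumed if present
	if i < len(s) && s[i] == '-' {
		i++
	}
	// digit: exactly one is required
	if i >= len(s) || !inRange(s[i], '0', '9') {
		return false
	}
	i++
	// { digit }: zero or more further digits
	for i < len(s) && inRange(s[i], '0', '9') {
		i++
	}
	// the whole input must be consumed
	return i == len(s)
}

func main() {
	for _, s := range []string{"42", "-7", "-", "4a"} {
		fmt.Printf("%q %v\n", s, matchNumber(s))
	}
}
```
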
### Example: JSON Grammar

```ebnf
json = value .

value = object | array | string | number | "true" | "false" | "null" .

object = "{" ws "}" | "{" members "}" .
members = member { "," member } .
member = ws string ws ":" element .

array = "[" ws "]" | "[" elements "]" .
elements = element { "," element } .
element = ws value ws .

string = "\"" { character } "\"" .
character = unescaped | escaped .
unescaped = " " | "!" | "#" … "[" | "]" … "~" .
escaped = "\\" ( "\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" ) .

number = [ "-" ] integer [ fraction ] [ exponent ] .
integer = "0" | onenine { digit } .
fraction = "." digit { digit } .
exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } .
digit = "0" … "9" .
onenine = "1" … "9" .

ws = { " " | "\t" | "\n" | "\r" } .
```

### Example: Custom Schema

```ebnf
root = "{" ws name_field "," ws age_field ws "}" .

name_field = "\"name\"" ws ":" ws string .
age_field = "\"age\"" ws ":" ws number .

string = "\"" { char } "\"" .
char = " " | "!" | "#" … "~" .

number = [ "-" ] digit { digit } .
digit = "0" … "9" .

ws = { " " | "\n" } .
```

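
Output constrained by a grammar like the one above is meant to be consumed as ordinary JSON. The sketch below decodes one conforming string with the standard library. The `person` struct and `parsePerson` helper are hypothetical and only mirror the two fields of the example grammar.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// person mirrors the two fields the example grammar requires.
type person struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// parsePerson decodes model output that was generated under the grammar.
func parsePerson(out string) (person, error) {
	var p person
	err := json.Unmarshal([]byte(out), &p)
	return p, err
}

func main() {
	// This string matches root: "{" ws name_field "," ws age_field ws "}".
	p, err := parsePerson(`{ "name": "Ada", "age": 36 }`)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(p.Name, p.Age)
}
```

The example grammar is slightly more permissive than JSON: `number` admits leading zeros such as `007`, which `encoding/json` rejects, so checking the decode error is still worthwhile.
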
## JSON Schema Support

OpenAI-compatible JSON Schema support with automatic EBNF generation:

```go
// Named schemaJSON (not schema) so it does not shadow the schema package.
schemaJSON := `{
	"type": "object",
	"properties": {
		"user": {"$ref": "#/$defs/User"}
	},
	"required": ["user"],
	"$defs": {
		"User": {
			"type": "object",
			"properties": {
				"name": {"type": "string"},
				"email": {"type": "string", "format": "email"},
				"role": {"enum": ["admin", "user", "guest"]}
			},
			"required": ["name", "email", "role"]
		}
	}
}`

g, err := schema.Grammar(schemaJSON)
if err != nil {
	log.Fatal(err)
}
```

### Supported Features

| Feature | Example |
|---------|---------|
| Basic types | `string`, `integer`, `number`, `boolean`, `null` |
| Objects | `properties`, `required` |
| Arrays | `items`, `minItems`, `maxItems` |
| Enums | `enum: ["a", "b", "c"]` |
| Constants | `const: "value"` |
| Union types | `anyOf`, `oneOf`, `type: ["string", "null"]` |
| References | `$ref: "#/$defs/Name"`, `$defs` |
| Formats | `date`, `time`, `date-time`, `email`, `uuid`, `ipv4` |

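
To give a feel for what schema-to-EBNF generation involves, here is a sketch of one plausible translation for a string `enum`: each value becomes a quoted JSON string literal, joined by alternation. This is an assumption about the general approach, not the package's actual translation; `enumToEBNF` is a hypothetical helper and it only handles plain ASCII values.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// enumToEBNF renders string enum values as an EBNF alternation of JSON
// string literals. Hypothetical helper; assumes plain ASCII values, where
// Go quoting and JSON quoting coincide.
func enumToEBNF(rule string, values []string) string {
	alts := make([]string, len(values))
	for i, v := range values {
		// The inner Quote yields the JSON form ("admin"); the outer Quote
		// escapes that text so it can sit inside an EBNF literal.
		alts[i] = strconv.Quote(strconv.Quote(v))
	}
	return rule + " = " + strings.Join(alts, " | ") + " ."
}

func main() {
	fmt.Println(enumToEBNF("role", []string{"admin", "user", "guest"}))
	// prints: role = "\"admin\"" | "\"user\"" | "\"guest\"" .
}
```
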
## Benchmarks

```bash
# Run all tests
go test -tags mlx ./x/grammar/...

# Run benchmarks
go test -tags mlx ./x/grammar/ -bench=.

# Compare with llama.cpp (outputs JSON)
go run -tags mlx ./x/grammar/cmd/compare -vocab-size 128000 -iterations 500

# Compare with a more complex schema
go run -tags mlx ./x/grammar/cmd/compare \
  -gbnf x/grammar/cmd/compare/complex.gbnf \
  -schema x/grammar/cmd/compare/complex.schema.json \
  -vocab-size 128000 -iterations 500
```

## References

- [XGrammar Paper](https://arxiv.org/abs/2411.15100) - Flexible and Efficient Structured Generation
- [Outlines](https://github.com/dottxt-ai/outlines) - Structured Text Generation
- [JSONSchemaBench](https://arxiv.org/abs/2501.10868) - Benchmark for Structured Outputs