# grammar

Grammar-constrained decoding for LLM outputs using MLX.
## Performance

Performance depends on hardware, vocabulary size, grammar, and whether you evaluate the MLX graph. See Benchmarks for how to measure on your setup.
### Design choices that keep masking fast

| Technique | Impact |
|---|---|
| Precomputed token analysis | Terminal matches computed once at startup |
| Mask caching by grammar state signature | Reuse masks for repeated parser states |
| Partitioned tokens | Exact matches separated from DP candidates |
## Comparison Notes

- llama.cpp: decodes each candidate token to UTF-8 and checks it against a pushdown automaton (PDA) on every step; no caching.
- Outlines: FSM-based; compilation can take 40 seconds to 10 minutes for complex schemas, but masking is fast after compilation.
- XGrammar: PDA with ~99% of tokens precomputed as context-independent; the prior state of the art.
- x/grammar: precomputed token analysis plus mask caching keyed by grammar state signature.
## Usage

```go
import (
    "github.com/ollama/ollama/x/grammar"
    "github.com/ollama/ollama/x/grammar/schema"
)

// Use the built-in JSON grammar
g, _ := grammar.JSONGrammar()

// Or build one from a JSON Schema (OpenAI-compatible)
g, _ := schema.Grammar(`{
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "age": {"type": "integer"}
  },
  "required": ["name", "age"]
}`)

// Or parse custom EBNF
g, _ := grammar.ParseEBNF(myGrammar, "root")

// Create an engine with the model vocabulary
engine, _ := grammar.NewEngine(g, vocab)
defer engine.Close()

// Generation loop
for !engine.IsComplete() {
    logits := model.Forward(tokens)
    masked := engine.ApplyMask(logits) // invalid tokens → -Inf
    nextToken := sample(masked)
    engine.Accept(nextToken)
}
```

The output conforms to the grammar as long as you sample only from the masked logits and call Accept on every sampled token.
## EBNF Syntax

```
rule = expression .   # Rule definition (ends with .)
"literal"             # Literal string
"a" … "z"             # Character range (inclusive)
( a | b )             # Grouping with alternation
[ optional ]          # Optional (0 or 1)
{ repeated }          # Repetition (0 or more)
```
### Example: JSON Grammar

```
json = value .
value = object | array | string | number | "true" | "false" | "null" .
object = "{" ws "}" | "{" members "}" .
members = member { "," member } .
member = ws string ws ":" element .
array = "[" ws "]" | "[" elements "]" .
elements = element { "," element } .
element = ws value ws .
string = "\"" { character } "\"" .
character = unescaped | escaped .
unescaped = " " | "!" | "#" … "[" | "]" … "~" .
escaped = "\\" ( "\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" ) .
number = [ "-" ] integer [ fraction ] [ exponent ] .
integer = "0" | onenine { digit } .
fraction = "." digit { digit } .
exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } .
digit = "0" … "9" .
onenine = "1" … "9" .
ws = { " " | "\t" | "\n" | "\r" } .
```

### Example: Custom Schema

```
root = "{" ws name_field "," ws age_field ws "}" .
name_field = "\"name\"" ws ":" ws string .
age_field = "\"age\"" ws ":" ws number .
string = "\"" { char } "\"" .
char = " " | "!" | "#" … "~" .
number = [ "-" ] digit { digit } .
digit = "0" … "9" .
ws = { " " | "\n" } .
```
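
For reference, one string this grammar accepts (values are illustrative):

```json
{"name": "Ada", "age": 42}
```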
## JSON Schema Support

OpenAI-compatible JSON Schema support with automatic EBNF generation:

```go
// Named schemaJSON so it doesn't shadow the schema package.
schemaJSON := `{
  "type": "object",
  "properties": {
    "user": {"$ref": "#/$defs/User"}
  },
  "required": ["user"],
  "$defs": {
    "User": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "email": {"type": "string", "format": "email"},
        "role": {"enum": ["admin", "user", "guest"]}
      },
      "required": ["name", "email", "role"]
    }
  }
}`

g, _ := schema.Grammar(schemaJSON)
```
### Supported Features

| Feature | Example |
|---|---|
| Basic types | `string`, `integer`, `number`, `boolean`, `null` |
| Objects | `properties`, `required` |
| Arrays | `items`, `minItems`, `maxItems` |
| Enums | `enum: ["a", "b", "c"]` |
| Constants | `const: "value"` |
| Union types | `anyOf`, `oneOf`, `type: ["string", "null"]` |
| References | `$ref: "#/$defs/Name"`, `$defs` |
| Formats | `date`, `time`, `date-time`, `email`, `uuid`, `ipv4` |
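
These features compose. For instance, a schema combining array bounds with a nullable union type (field names are illustrative) can be passed to `schema.Grammar` just like the example above:

```json
{
  "type": "object",
  "properties": {
    "tags": {
      "type": "array",
      "items": {"type": "string"},
      "minItems": 1,
      "maxItems": 3
    },
    "nickname": {"type": ["string", "null"]}
  },
  "required": ["tags"]
}
```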
## Benchmarks

```shell
# Run all tests
go test -tags mlx ./x/grammar/...

# Run benchmarks
go test -tags mlx ./x/grammar/ -bench=.

# Compare with llama.cpp (outputs JSON)
go run -tags mlx ./x/grammar/cmd/compare -vocab-size 128000 -iterations 500

# Compare with a more complex schema
go run -tags mlx ./x/grammar/cmd/compare \
    -gbnf x/grammar/cmd/compare/complex.gbnf \
    -schema x/grammar/cmd/compare/complex.schema.json \
    -vocab-size 128000 -iterations 500
```
## References

- XGrammar Paper - Flexible and Efficient Structured Generation
- Outlines - Structured Text Generation
- JSONSchemaBench - Benchmark for Structured Outputs