grammar

Grammar-constrained decoding for LLM outputs using MLX.

Performance

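Performance depends on hardware, vocabulary size, grammar complexity, and whether the MLX graph is actually evaluated. MLX evaluates lazily, so a timing that never forces evaluation does not include the masking work. See Benchmarks for how to measure on your setup.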

Design choices that keep masking fast

| Technique | Impact |
|---|---|
| Precomputed token analysis | Terminal matches computed once at startup |
| Mask caching by grammar state signature | Reuse masks for repeated parser states |
| Partitioned tokens | Exact matches separated from DP candidates |

Comparison Notes

  • llama.cpp: Decodes each token to UTF-8, checks against PDA. No caching.
  • Outlines: FSM-based. Compilation can take 40s-10min for complex schemas. Fast after compile.
  • XGrammar: PDA with 99% context-independent tokens precomputed. State-of-the-art before this.
  • x/grammar: Precomputed token analysis + mask caching by grammar state signature.

Usage

import (
    "github.com/ollama/ollama/x/grammar"
    "github.com/ollama/ollama/x/grammar/schema"
)

// Use built-in JSON grammar
g, _ := grammar.JSONGrammar()

// Or from JSON Schema (OpenAI-compatible)
g, _ := schema.Grammar(`{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}`)

// Or parse custom EBNF
g, _ := grammar.ParseEBNF(myGrammar, "root")

// Create engine with model vocabulary
engine, _ := grammar.NewEngine(g, vocab)
defer engine.Close()

// Generation loop
for !engine.IsComplete() {
    logits := model.Forward(tokens)
    masked := engine.ApplyMask(logits)  // Invalid tokens → -inf
    nextToken := sample(masked)
    engine.Accept(nextToken)
}
// Output conforms to the grammar when you only sample from masked tokens and call Accept

EBNF Syntax

rule = expression .           # Rule definition (ends with .)
"literal"                      # Literal string
"a" … "z"                      # Character range (inclusive)
( a | b )                      # Grouping with alternation
[ optional ]                   # Optional (0 or 1)
{ repeated }                   # Repetition (0 or more)

Example: JSON Grammar

json = value .

value = object | array | string | number | "true" | "false" | "null" .

object = "{" ws "}" | "{" members "}" .
members = member { "," member } .
member = ws string ws ":" element .

array = "[" ws "]" | "[" elements "]" .
elements = element { "," element } .
element = ws value ws .

string = "\"" { character } "\"" .
character = unescaped | escaped .
unescaped = " " | "!" | "#" … "[" | "]" … "~" .
escaped = "\\" ( "\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" ) .

number = [ "-" ] integer [ fraction ] [ exponent ] .
integer = "0" | onenine { digit } .
fraction = "." digit { digit } .
exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } .
digit = "0" … "9" .
onenine = "1" … "9" .

ws = { " " | "\t" | "\n" | "\r" } .

Example: Custom Schema

root = "{" ws name_field "," ws age_field ws "}" .

name_field = "\"name\"" ws ":" ws string .
age_field = "\"age\"" ws ":" ws number .

string = "\"" { char } "\"" .
char = " " | "!" | "#" … "~" .

number = [ "-" ] digit { digit } .
digit = "0" … "9" .

ws = { " " | "\n" } .
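
To make concrete which strings this grammar accepts, the sketch below is a small hand-written recognizer for the `root` rule. It is only an illustration and not how x/grammar evaluates grammars. The `parser` type and `matches` function are hypothetical, and the code assumes `char` is an inclusive character range from `#` through `~`, as described under EBNF Syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// parser is a tiny cursor over the input string.
type parser struct {
	s   string
	pos int
}

// lit consumes the literal t if the input continues with it.
func (p *parser) lit(t string) bool {
	if strings.HasPrefix(p.s[p.pos:], t) {
		p.pos += len(t)
		return true
	}
	return false
}

// ws consumes zero or more spaces or newlines.
func (p *parser) ws() {
	for p.pos < len(p.s) && (p.s[p.pos] == ' ' || p.s[p.pos] == '\n') {
		p.pos++
	}
}

// str matches a quoted run of printable ASCII without '"' (no escape sequences).
func (p *parser) str() bool {
	if !p.lit(`"`) {
		return false
	}
	for p.pos < len(p.s) {
		c := p.s[p.pos]
		if c != ' ' && c != '!' && (c < '#' || c > '~') {
			break
		}
		p.pos++
	}
	return p.lit(`"`)
}

// number matches an optional minus sign followed by one or more digits.
func (p *parser) number() bool {
	p.lit("-")
	start := p.pos
	for p.pos < len(p.s) && p.s[p.pos] >= '0' && p.s[p.pos] <= '9' {
		p.pos++
	}
	return p.pos > start
}

// field matches `"key" ws ":" ws value`.
func (p *parser) field(key string, value func() bool) bool {
	if !p.lit(`"` + key + `"`) {
		return false
	}
	p.ws()
	if !p.lit(":") {
		return false
	}
	p.ws()
	return value()
}

// matches reports whether s is derivable from the root rule.
func matches(s string) bool {
	p := &parser{s: s}
	if !p.lit("{") {
		return false
	}
	p.ws()
	if !p.field("name", p.str) || !p.lit(",") {
		return false
	}
	p.ws()
	if !p.field("age", p.number) {
		return false
	}
	p.ws()
	return p.lit("}") && p.pos == len(p.s)
}

func main() {
	fmt.Println(matches(`{"name": "Ada", "age": 36}`))  // true
	fmt.Println(matches(`{"age": 36, "name": "Ada"}`))  // false: field order is fixed
	fmt.Println(matches(`{"name": "Ada" , "age": 36}`)) // false: no ws allowed before ","
}
```

The rejected cases show how strict a hand-written grammar is: field order is fixed, and whitespace is accepted only where a `ws` appears in the rule.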

JSON Schema Support

OpenAI-compatible JSON Schema support with automatic EBNF generation:

// Named userSchema so the variable does not shadow the schema package.
userSchema := `{
    "type": "object",
    "properties": {
        "user": {"$ref": "#/$defs/User"}
    },
    "required": ["user"],
    "$defs": {
        "User": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"},
                "role": {"enum": ["admin", "user", "guest"]}
            },
            "required": ["name", "email", "role"]
        }
    }
}`

g, _ := schema.Grammar(userSchema)

Supported Features

| Feature | Example |
|---|---|
| Basic types | string, integer, number, boolean, null |
| Objects | properties, required |
| Arrays | items, minItems, maxItems |
| Enums | enum: ["a", "b", "c"] |
| Constants | const: "value" |
| Union types | anyOf, oneOf, type: ["string", "null"] |
| References | $ref: "#/$defs/Name", $defs |
| Formats | date, time, date-time, email, uuid, ipv4 |
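
To give a feel for what schema-to-EBNF generation involves, the sketch below renders a string enum as a single alternation rule. It is only a guess at the shape of the output: `enumRule` is a hypothetical helper, the rule text the real generator emits may differ, and it assumes EBNF literals use Go-style string escapes as in the examples above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// enumRule renders a JSON Schema string enum as one EBNF alternation rule.
// Each alternative is the JSON text of the value (quotes included),
// wrapped again as an EBNF string literal.
func enumRule(name string, values []string) (string, error) {
	alts := make([]string, 0, len(values))
	for _, v := range values {
		js, err := json.Marshal(v) // e.g. admin becomes "admin"
		if err != nil {
			return "", err
		}
		alts = append(alts, strconv.Quote(string(js))) // "admin" becomes "\"admin\""
	}
	return name + " = " + strings.Join(alts, " | ") + " .", nil
}

func main() {
	rule, err := enumRule("role", []string{"admin", "user", "guest"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rule) // role = "\"admin\"" | "\"user\"" | "\"guest\"" .
}
```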

Benchmarks

# Run all tests
go test -tags mlx ./x/grammar/...

# Run benchmarks
go test -tags mlx ./x/grammar/ -bench=.

# Compare with llama.cpp (outputs JSON)
go run -tags mlx ./x/grammar/cmd/compare -vocab-size 128000 -iterations 500

# Compare with a more complex schema
go run -tags mlx ./x/grammar/cmd/compare \
  -gbnf x/grammar/cmd/compare/complex.gbnf \
  -schema x/grammar/cmd/compare/complex.schema.json \
  -vocab-size 128000 -iterations 500

References