
# grammar

Grammar-constrained decoding for LLM outputs using MLX.

## Performance

Performance depends on hardware, vocabulary size, the grammar, and whether the
measurement includes evaluating the MLX graph (MLX builds computations lazily).
See [Benchmarks](#benchmarks) for how to measure on your setup.

### Design choices that keep masking fast

| Technique | Impact |
|-----------|--------|
| Precomputed token analysis | Terminal matches computed once at startup |
| Mask caching by grammar state signature | Reuse masks for repeated parser states |
| Partitioned tokens | Exact matches separated from DP candidates |
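
To make the mask-caching row concrete, here is a minimal sketch of memoizing masks by a state signature. Everything in it (`maskCache`, `newMaskCache`, the string signatures, the toy vocabulary) is hypothetical and chosen for illustration. It is not the package's internal implementation, which may key and store masks differently.

```go
package main

import "fmt"

// maskCache memoizes token masks by a signature of the parser state.
// Hypothetical type for illustration; not part of the x/grammar API.
type maskCache struct {
	masks   map[string][]bool
	compute func(sig string) []bool
	misses  int // number of times compute was actually called
}

func newMaskCache(compute func(sig string) []bool) *maskCache {
	return &maskCache{masks: make(map[string][]bool), compute: compute}
}

// get returns the cached mask for sig, computing it on first use.
func (c *maskCache) get(sig string) []bool {
	if m, ok := c.masks[sig]; ok {
		return m
	}
	c.misses++
	m := c.compute(sig)
	c.masks[sig] = m
	return m
}

func main() {
	vocab := []string{"{", "}", "\"", "a"}
	// Toy rule: in state "expect-open" only "{" is valid,
	// in any other state everything except "{" is valid.
	cache := newMaskCache(func(sig string) []bool {
		mask := make([]bool, len(vocab))
		for i, tok := range vocab {
			if sig == "expect-open" {
				mask[i] = tok == "{"
			} else {
				mask[i] = tok != "{"
			}
		}
		return mask
	})

	cache.get("expect-open")
	cache.get("in-object")
	cache.get("expect-open") // repeated state: served from the cache

	fmt.Println("mask computations:", cache.misses) // prints "mask computations: 2"
}
```

The benefit grows with how often generation revisits the same parser state, for example between elements of a long array.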

### Comparison Notes

- **llama.cpp**: Decodes each token to UTF-8, checks against PDA. No caching.
- **Outlines**: FSM-based. Compilation can take 40s-10min for complex schemas. Fast after compile.
- **XGrammar**: PDA-based; precomputes the roughly 99% of tokens that are context-independent. Previous state of the art.
- **x/grammar**: Precomputed token analysis + mask caching by grammar state signature.

## Usage

```go
import (
    "log"

    "github.com/ollama/ollama/x/grammar"
    "github.com/ollama/ollama/x/grammar/schema"
)

// Build a grammar in one of three ways; each assignment replaces the previous one.

// Use built-in JSON grammar
g, err := grammar.JSONGrammar()

// Or from JSON Schema (OpenAI-compatible)
g, err = schema.Grammar(`{
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}`)

// Or parse custom EBNF
g, err = grammar.ParseEBNF(myGrammar, "root")
if err != nil {
    log.Fatal(err)
}

// Create engine with model vocabulary
engine, err := grammar.NewEngine(g, vocab)
if err != nil {
    log.Fatal(err)
}
defer engine.Close()

// Generation loop
for !engine.IsComplete() {
    logits := model.Forward(tokens)
    masked := engine.ApplyMask(logits)  // Invalid tokens → -inf
    nextToken := sample(masked)
    engine.Accept(nextToken)
}
// Output conforms to the grammar as long as you sample only from the masked
// logits and call Accept on every sampled token
```

## EBNF Syntax

```ebnf
rule = expression .           # Rule definition (ends with .)
"literal"                      # Literal string
"a" … "z"                      # Character range (inclusive)
( a | b )                      # Grouping with alternation
[ optional ]                   # Optional (0 or 1)
{ repeated }                   # Repetition (0 or more)
```

### Example: JSON Grammar

```ebnf
json = value .

value = object | array | string | number | "true" | "false" | "null" .

object = "{" ws "}" | "{" members "}" .
members = member { "," member } .
member = ws string ws ":" element .

array = "[" ws "]" | "[" elements "]" .
elements = element { "," element } .
element = ws value ws .

string = "\"" { character } "\"" .
character = unescaped | escaped .
unescaped = " " | "!" | "#" … "[" | "]" … "~" .
escaped = "\\" ( "\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" ) .

number = [ "-" ] integer [ fraction ] [ exponent ] .
integer = "0" | onenine { digit } .
fraction = "." digit { digit } .
exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } .
digit = "0" … "9" .
onenine = "1" … "9" .

ws = { " " | "\t" | "\n" | "\r" } .
```

### Example: Custom Schema

```ebnf
root = "{" ws name_field "," ws age_field ws "}" .

name_field = "\"name\"" ws ":" ws string .
age_field = "\"age\"" ws ":" ws number .

string = "\"" { char } "\"" .
char = " " | "!" | "#" … "~" .

number = [ "-" ] digit { digit } .
digit = "0" … "9" .

ws = { " " | "\n" } .
```
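
To show how the optional `[ ]` and repetition `{ }` operators read, the `number` rule above can be hand-translated into a small recognizer. This is a sketch for intuition only; `matchNumber` and `isDigit` are hypothetical names, and the package compiles grammars rather than generating code like this.

```go
package main

import "fmt"

// isDigit implements: digit = "0" … "9" .
func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// matchNumber reports whether s matches: number = [ "-" ] digit { digit } .
func matchNumber(s string) bool {
	i := 0
	if i < len(s) && s[i] == '-' { // [ "-" ]: zero or one minus sign
		i++
	}
	if i >= len(s) || !isDigit(s[i]) { // digit: exactly one is required
		return false
	}
	i++
	for i < len(s) && isDigit(s[i]) { // { digit }: zero or more further digits
		i++
	}
	return i == len(s) // the whole input must be consumed
}

func main() {
	for _, s := range []string{"42", "-7", "-", "4a", ""} {
		fmt.Printf("%q -> %v\n", s, matchNumber(s))
	}
}
```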

## JSON Schema Support

OpenAI-compatible JSON Schema support with automatic EBNF generation:

```go
// Named userSchema so the variable does not shadow the schema package.
userSchema := `{
    "type": "object",
    "properties": {
        "user": {"$ref": "#/$defs/User"}
    },
    "required": ["user"],
    "$defs": {
        "User": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "format": "email"},
                "role": {"enum": ["admin", "user", "guest"]}
            },
            "required": ["name", "email", "role"]
        }
    }
}`

g, err := schema.Grammar(userSchema)
if err != nil {
    log.Fatal(err)
}
```

### Supported Features

| Feature | Example |
|---------|---------|
| Basic types | `string`, `integer`, `number`, `boolean`, `null` |
| Objects | `properties`, `required` |
| Arrays | `items`, `minItems`, `maxItems` |
| Enums | `enum: ["a", "b", "c"]` |
| Constants | `const: "value"` |
| Union types | `anyOf`, `oneOf`, `type: ["string", "null"]` |
| References | `$ref: "#/$defs/Name"`, `$defs` |
| Formats | `date`, `time`, `date-time`, `email`, `uuid`, `ipv4` |
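
As an intuition for the automatic EBNF generation, a string `enum` can be pictured as an alternation of JSON-encoded literals. The sketch below is an assumption-laden illustration: `enumRule` is a hypothetical function, it handles string values only, and the rules the `schema` package actually emits may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// enumRule renders an EBNF rule that matches exactly the JSON encodings of values.
// Hypothetical helper for illustration; not the schema package's generator.
func enumRule(name string, values []string) (string, error) {
	if len(values) == 0 {
		return "", fmt.Errorf("enum %q has no values", name)
	}
	alts := make([]string, 0, len(values))
	for _, v := range values {
		encoded, err := json.Marshal(v) // admin becomes "admin", quotes included
		if err != nil {
			return "", err
		}
		// Quote the JSON text again so it becomes an EBNF literal: "\"admin\""
		alts = append(alts, strconv.Quote(string(encoded)))
	}
	return name + " = " + strings.Join(alts, " | ") + " .", nil
}

func main() {
	rule, err := enumRule("role", []string{"admin", "user", "guest"})
	if err != nil {
		panic(err)
	}
	fmt.Println(rule) // prints: role = "\"admin\"" | "\"user\"" | "\"guest\"" .
}
```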

## Benchmarks

```bash
# Run all tests
go test -tags mlx ./x/grammar/...

# Run benchmarks
go test -tags mlx ./x/grammar/ -bench=.

# Compare with llama.cpp (outputs JSON)
go run -tags mlx ./x/grammar/cmd/compare -vocab-size 128000 -iterations 500

# Compare with a more complex schema
go run -tags mlx ./x/grammar/cmd/compare \
  -gbnf x/grammar/cmd/compare/complex.gbnf \
  -schema x/grammar/cmd/compare/complex.schema.json \
  -vocab-size 128000 -iterations 500
```

## References

- [XGrammar Paper](https://arxiv.org/abs/2411.15100) - Flexible and Efficient Structured Generation
- [Outlines](https://github.com/dottxt-ai/outlines) - Structured Text Generation
- [JSONSchemaBench](https://arxiv.org/abs/2501.10868) - Benchmark for Structured Outputs