x/grammar: add experimental GPU accelerated constrained decoding package

2026-04-19 14:54:21 +02:00 · 2026-01-10 16:42:45 -08:00
parent 7cc2a653f2
commit e23ddd84b8
38 changed files with 5819 additions and 36 deletions
--- a/x/grammar/README.md
+++ b/x/grammar/README.md
@@ -0,0 +1,185 @@
+# grammar
+
+Grammar-constrained decoding for LLM outputs using MLX.
+
+## Performance
+
+Performance depends on hardware, vocabulary size, grammar, and whether you
+evaluate the MLX graph. See [Benchmarks](#benchmarks) for how to measure on your
+setup.
+
+### Design choices that keep masking fast
+
+| Technique | Impact |
+|-----------|--------|
+| Precomputed token analysis | Terminal matches computed once at startup |
+| Mask caching by grammar state signature | Reuse masks for repeated parser states |
+| Partitioned tokens | Exact matches separated from DP candidates |
+
+### Comparison Notes
+
+- **llama.cpp**: Decodes each token to UTF-8, checks against PDA. No caching.
+- **Outlines**: FSM-based. Compilation can take 40s-10min for complex schemas. Fast after compile.
+- **XGrammar**: PDA with 99% context-independent tokens precomputed. State-of-the-art before this.
+- **x/grammar**: Precomputed token analysis + mask caching by grammar state signature.
+
+## Usage
+
+```go
+import (
+    "github.com/ollama/ollama/x/grammar"
+    "github.com/ollama/ollama/x/grammar/schema"
+)
+
+// Use built-in JSON grammar
+g, _ := grammar.JSONGrammar()
+
+// Or from JSON Schema (OpenAI-compatible)
+g, _ := schema.Grammar(`{
+    "type": "object",
+    "properties": {
+        "name": {"type": "string"},
+        "age": {"type": "integer"}
+    },
+    "required": ["name", "age"]
+}`)
+
+// Or parse custom EBNF
+g, _ := grammar.ParseEBNF(myGrammar, "root")
+
+// Create engine with model vocabulary
+engine, _ := grammar.NewEngine(g, vocab)
+defer engine.Close()
+
+// Generation loop
+for !engine.IsComplete() {
+    logits := model.Forward(tokens)
+    masked := engine.ApplyMask(logits)  // Invalid tokens → -inf
+    nextToken := sample(masked)
+    engine.Accept(nextToken)
+}
+// Output conforms to the grammar when you only sample from masked tokens and call Accept
+```
+
+## EBNF Syntax
+
+```ebnf
+rule = expression .           # Rule definition (ends with .)
+"literal"                      # Literal string
+"a" … "z"                      # Character range (inclusive)
+( a | b )                      # Grouping with alternation
+[ optional ]                   # Optional (0 or 1)
+{ repeated }                   # Repetition (0 or more)
+```
+
+### Example: JSON Grammar
+
+```ebnf
+json = value .
+
+value = object | array | string | number | "true" | "false" | "null" .
+
+object = "{" ws "}" | "{" members "}" .
+members = member { "," member } .
+member = ws string ws ":" element .
+
+array = "[" ws "]" | "[" elements "]" .
+elements = element { "," element } .
+element = ws value ws .
+
+string = "\"" { character } "\"" .
+character = unescaped | escaped .
+unescaped = " " | "!" | "#" … "[" | "]" … "~" .
+escaped = "\\" ( "\"" | "\\" | "/" | "b" | "f" | "n" | "r" | "t" ) .
+
+number = [ "-" ] integer [ fraction ] [ exponent ] .
+integer = "0" | onenine { digit } .
+fraction = "." digit { digit } .
+exponent = ( "e" | "E" ) [ "+" | "-" ] digit { digit } .
+digit = "0" … "9" .
+onenine = "1" … "9" .
+
+ws = { " " | "\t" | "\n" | "\r" } .
+```
+
+### Example: Custom Schema
+
+```ebnf
+root = "{" ws name_field "," ws age_field ws "}" .
+
+name_field = "\"name\"" ws ":" ws string .
+age_field = "\"age\"" ws ":" ws number .
+
+string = "\"" { char } "\"" .
+char = " " | "!" | "#" … "~" .
+
+number = [ "-" ] digit { digit } .
+digit = "0" … "9" .
+
+ws = { " " | "\n" } .
+```
+
+## JSON Schema Support
+
+OpenAI-compatible JSON Schema support with automatic EBNF generation:
+
+```go
+schema := `{
+    "type": "object",
+    "properties": {
+        "user": {"$ref": "#/$defs/User"}
+    },
+    "required": ["user"],
+    "$defs": {
+        "User": {
+            "type": "object",
+            "properties": {
+                "name": {"type": "string"},
+                "email": {"type": "string", "format": "email"},
+                "role": {"enum": ["admin", "user", "guest"]}
+            },
+            "required": ["name", "email", "role"]
+        }
+    }
+}`
+
+grammar, _ := schema.Grammar(schema)
+```
+
+### Supported Features
+
+| Feature | Example |
+|---------|---------|
+| Basic types | `string`, `integer`, `number`, `boolean`, `null` |
+| Objects | `properties`, `required` |
+| Arrays | `items`, `minItems`, `maxItems` |
+| Enums | `enum: ["a", "b", "c"]` |
+| Constants | `const: "value"` |
+| Union types | `anyOf`, `oneOf`, `type: ["string", "null"]` |
+| References | `$ref: "#/$defs/Name"`, `$defs` |
+| Formats | `date`, `time`, `date-time`, `email`, `uuid`, `ipv4` |
+
+## Benchmarks
+
+```bash
+# Run all tests
+go test -tags mlx ./x/grammar/...
+
+# Run benchmarks
+go test -tags mlx ./x/grammar/ -bench=.
+
+# Compare with llama.cpp (outputs JSON)
+go run -tags mlx ./x/grammar/cmd/compare -vocab-size 128000 -iterations 500
+
+# Compare with a more complex schema
+go run -tags mlx ./x/grammar/cmd/compare \
+  -gbnf x/grammar/cmd/compare/complex.gbnf \
+  -schema x/grammar/cmd/compare/complex.schema.json \
+  -vocab-size 128000 -iterations 500
+```
+
+## References
+
+- [XGrammar Paper](https://arxiv.org/abs/2411.15100) - Flexible and Efficient Structured Generation
+- [Outlines](https://github.com/dottxt-ai/outlines) - Structured Text Generation
+- [JSONSchemaBench](https://arxiv.org/abs/2501.10868) - Benchmark for Structured Outputs