cmd: add eval command for lightweight model evals

2026-04-18 22:54:13 +02:00 · 2025-11-28 19:38:13 -05:00
parent 412954c452
commit d96fb7deb3
4 changed files with 596 additions and 0 deletions
--- a/cmd/eval/README.md
+++ b/cmd/eval/README.md
@@ -0,0 +1,50 @@
+# eval
+
+Evaluation tool for testing Ollama models.
+
+## Usage
+
+Run all tests:
+
+```bash
+go run . -model llama3.2:latest
+```
+
+Run a specific suite:
+
+```bash
+go run . -model llama3.2:latest -suite tool-calling-basic -v
+```
+
+List available suites:
+
+```bash
+go run . -list
+```
+
+## Adding Tests
+
+Edit `suites.go` to add new test suites. Each test needs:
+
+- `Name`: test identifier
+- `Prompt`: what to send to the model
+- `Check`: function to validate the response
+
+Example:
+
+```go
+{
+    Name:   "my-test",
+    Prompt: "What is 2+2?",
+    Check:  Contains("4"),
+}
+```
+
+Available check functions:
+
+- `HasResponse()` - response is non-empty
+- `Contains(s)` - response contains substring
+- `CallsTool(name)` - model called the named tool
+- `NoTools()` - model called no tools
+- `MinTools(n)` - model called at least n tools
+- `All(checks...)` - all checks pass