Mirror of https://github.com/ollama/ollama.git, synced 2026-04-17 15:53:27 +02:00
bench: add prompt calibration, context size flag, and NumCtx reporting (#15158)
- Add a --num-ctx flag to set the context size, and report NumCtx in the model info header.
- Calibrate the tokens-per-word ratio during warmup using actual tokenization metrics from the model, replacing the fixed 1.3 heuristic. This produces more accurate prompt token counts for --prompt-tokens.
- Add fetchContextLength() to query the running model's context length via /api/ps.
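The calibration idea can be sketched as follows. This is a hypothetical illustration, not the commit's actual code: the function names (`calibrateRatio`, `wordsForTokens`) and the exact warmup numbers are assumptions; the commit only states that the ratio is derived from the model's real tokenization metrics during warmup instead of a fixed 1.3.

```go
package main

import "fmt"

// calibrateRatio derives a tokens-per-word ratio from a warmup run:
// warmupWords is the word count of the warmup prompt we sent, and
// reportedTokens is the prompt token count the model reported for it.
// (Hypothetical sketch; names and fallback behavior are assumptions.)
func calibrateRatio(warmupWords, reportedTokens int) float64 {
	if warmupWords <= 0 || reportedTokens <= 0 {
		return 1.3 // fall back to the old fixed heuristic
	}
	return float64(reportedTokens) / float64(warmupWords)
}

// wordsForTokens inverts the calibrated ratio to size a prompt that
// should tokenize to roughly targetTokens tokens (for --prompt-tokens).
func wordsForTokens(targetTokens int, ratio float64) int {
	return int(float64(targetTokens)/ratio + 0.5) // round to nearest word
}

func main() {
	// e.g. a 100-word warmup prompt that the model tokenized to 137 tokens
	ratio := calibrateRatio(100, 137)
	fmt.Printf("ratio=%.2f words=%d\n", ratio, wordsForTokens(512, ratio))
}
```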