Mirror of https://github.com/ollama/ollama.git, synced 2026-04-17 15:53:27 +02:00
bench: add prompt calibration, context size flag, and NumCtx reporting (#15158)
- Add a --num-ctx flag to set the context size, and report NumCtx in the model info header.
- Calibrate the tokens-per-word ratio during warmup using actual tokenization metrics from the model, replacing the fixed 1.3 heuristic. This produces more accurate prompt token counts for --prompt-tokens.
- Add fetchContextLength() to query the running model's context length via /api/ps.
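The calibration idea can be sketched as follows. This is a hypothetical illustration, not the commit's actual code: the function names (`calibrateRatio`, `wordsForTokens`) and the exact warmup numbers are assumptions; the commit only states that the ratio is derived from the model's real tokenization metrics during warmup instead of a fixed 1.3.

```go
package main

import "fmt"

// calibrateRatio derives a tokens-per-word ratio from a warmup run:
// warmupWords is the word count of the warmup prompt we sent, and
// reportedTokens is the prompt token count the model reported for it.
// (Hypothetical sketch; names and fallback behavior are assumptions.)
func calibrateRatio(warmupWords, reportedTokens int) float64 {
	if warmupWords <= 0 || reportedTokens <= 0 {
		return 1.3 // fall back to the old fixed heuristic
	}
	return float64(reportedTokens) / float64(warmupWords)
}

// wordsForTokens inverts the calibrated ratio to size a prompt that
// should tokenize to roughly targetTokens tokens (for --prompt-tokens).
func wordsForTokens(targetTokens int, ratio float64) int {
	return int(float64(targetTokens)/ratio + 0.5) // round to nearest word
}

func main() {
	// e.g. a 100-word warmup prompt that the model tokenized to 137 tokens
	ratio := calibrateRatio(100, 137)
	fmt.Printf("ratio=%.2f words=%d\n", ratio, wordsForTokens(512, ratio))
}
```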