bench: add prompt calibration, context size flag, and NumCtx reporting (#15158)

Add --num-ctx flag to set context size, and report NumCtx in model info
header. Calibrate tokens-per-word ratio during warmup using actual
tokenization metrics from the model, replacing the fixed 1.3 heuristic.
This produces more accurate prompt token counts for --prompt-tokens.
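
The diff is not available here, so the following is only a minimal Go sketch of how such a calibration could work, not the code from this commit. It assumes the warmup response reports a prompt token count (Ollama responses expose this as `prompt_eval_count`). The names `calibrateTokensPerWord` and `wordsForTokens` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultTokensPerWord is the fixed heuristic named in the commit message.
const defaultTokensPerWord = 1.3

// calibrateTokensPerWord (hypothetical helper) derives a tokens-per-word
// ratio from the warmup prompt and the prompt token count the model
// reported for it. It falls back to the fixed heuristic when either
// input is unusable.
func calibrateTokensPerWord(warmupPrompt string, promptEvalCount int) float64 {
	words := len(strings.Fields(warmupPrompt))
	if words == 0 || promptEvalCount <= 0 {
		return defaultTokensPerWord
	}
	return float64(promptEvalCount) / float64(words)
}

// wordsForTokens (hypothetical helper) estimates how many words a
// generated prompt needs in order to reach roughly targetTokens tokens.
func wordsForTokens(targetTokens int, tokensPerWord float64) int {
	if targetTokens <= 0 {
		return 0
	}
	if tokensPerWord <= 0 {
		tokensPerWord = defaultTokensPerWord
	}
	n := int(float64(targetTokens) / tokensPerWord)
	if n < 1 {
		n = 1
	}
	return n
}

func main() {
	// Assumed warmup result: a 4-word prompt tokenized to 6 tokens.
	ratio := calibrateTokensPerWord("one two three four", 6)
	fmt.Println(ratio, wordsForTokens(300, ratio))
}
```

The reported prompt count may include chat-template tokens, so a real calibration would probably want a warmup prompt long enough for that overhead to be negligible.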

Also add fetchContextLength() to query the running model's context length via /api/ps.
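
As an illustration only, a lookup of this kind might look like the Go sketch below. It assumes each entry in the `/api/ps` response carries a `context_length` field alongside `name` and `model`. The names `psResponse`, `contextLengthFromPS`, and `fetchContextLengthSketch` are hypothetical and are not the signatures from this commit.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// psResponse models only the fields this sketch needs; the
// context_length field name is an assumption about the /api/ps payload.
type psResponse struct {
	Models []struct {
		Name          string `json:"name"`
		Model         string `json:"model"`
		ContextLength int    `json:"context_length"`
	} `json:"models"`
}

// contextLengthFromPS (hypothetical helper) extracts the context length
// of the named running model from a raw /api/ps response body.
func contextLengthFromPS(body []byte, model string) (int, error) {
	var ps psResponse
	if err := json.Unmarshal(body, &ps); err != nil {
		return 0, fmt.Errorf("decode /api/ps response: %w", err)
	}
	for _, m := range ps.Models {
		if m.Name == model || m.Model == model {
			return m.ContextLength, nil
		}
	}
	return 0, fmt.Errorf("model %q is not running", model)
}

// fetchContextLengthSketch (hypothetical) performs the HTTP request and
// delegates parsing to contextLengthFromPS.
func fetchContextLengthSketch(baseURL, model string) (int, error) {
	resp, err := http.Get(baseURL + "/api/ps")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("GET /api/ps: %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, err
	}
	return contextLengthFromPS(body, model)
}

func main() {
	// Sample payload with a made-up model name; no server is contacted.
	sample := []byte(`{"models":[{"name":"example:latest","model":"example:latest","context_length":8192}]}`)
	n, err := contextLengthFromPS(sample, "example:latest")
	fmt.Println(n, err)
}
```

Keeping the parsing separate from the HTTP call lets the lookup be exercised against a canned payload without a running server.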
Daniel Hiltgen
2026-04-02 14:23:53 -07:00
committed by GitHub
parent de9673ac3f
commit 3536ef58f6
