* integration: improve ability to test individual models
Add OLLAMA_TEST_MODEL env var to run integration tests against a
single model.
Enhance vision tests to cover multi-turn chat with cached image tokens,
object counting, spatial reasoning, detail recognition, scene
understanding, OCR, and multi-image comparison.
Add tool calling stress tests with complex agent-style prompts, large
system messages, and multi-turn tool response handling.
* address review comments
* tests: add single-threaded history test
Also tidy up some existing tests to tolerate more variation in model
output.
* test: add support for testing specific architectures