Mirror of https://github.com/ollama/ollama.git, synced 2026-04-23 09:15:44 +02:00
Implement dual-limit tool output truncation to prevent context overflow:

- 4k tokens (~16k chars) for local models on local servers
- 10k tokens (~40k chars) for cloud models or remote servers

This preserves the smaller context windows of local models while still allowing larger tool outputs for cloud services.