Mirror of https://github.com/ollama/ollama.git (synced 2026-04-18 01:54:17 +02:00)
Currently, context length is unbounded: the cache will keep growing forever, independent of the model's trained context length. This change caps it and enforces semantics similar to most cloud services:

- Long prompts will result in an error, not truncation.
- Generation that exceeds the context length will be stopped.
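A minimal sketch of those two semantics, with illustrative names (`Cache`, `numCtx`, `AddPrompt`, `Add` are assumptions, not the actual ollama API):

```go
package main

import "fmt"

// Cache is a hypothetical token cache bounded by the model's
// trained context length (numCtx).
type Cache struct {
	numCtx int   // context length cap
	tokens []int // cached token IDs
}

// AddPrompt rejects prompts longer than the context length
// with an error rather than silently truncating them.
func (c *Cache) AddPrompt(prompt []int) error {
	if len(prompt) > c.numCtx {
		return fmt.Errorf("prompt length %d exceeds context length %d",
			len(prompt), c.numCtx)
	}
	c.tokens = append(c.tokens, prompt...)
	return nil
}

// Add appends one generated token; it returns false once the
// cache is full, signaling that generation should stop.
func (c *Cache) Add(token int) bool {
	if len(c.tokens) >= c.numCtx {
		return false // cap reached: stop generation
	}
	c.tokens = append(c.tokens, token)
	return true
}

func main() {
	c := &Cache{numCtx: 4}

	// Too-long prompt: error, not truncation.
	if err := c.AddPrompt([]int{1, 2, 3, 4, 5}); err != nil {
		fmt.Println("prompt rejected:", err)
	}

	// A prompt that fits, then generation until the cap stops it.
	_ = c.AddPrompt([]int{1, 2})
	for tok := 10; c.Add(tok); tok++ {
	}
	fmt.Println("tokens cached:", len(c.tokens))
}
```

With a cap of 4 and a 2-token prompt, generation adds tokens until the cache holds 4, then `Add` returns false and the loop stops.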