Mirror of https://github.com/ollama/ollama.git (synced 2026-04-18 01:54:17 +02:00)
Currently, context length is unbounded: the cache will keep growing forever, independent of the model's trained context length. This change caps it and enforces semantics similar to most cloud services:

- Long prompts will result in an error, not truncation.
- Generation that exceeds the context length will be stopped.
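A minimal sketch of those two semantics, with illustrative names (`Cache`, `numCtx`, `AddPrompt`, `Add` are assumptions, not the actual ollama API):

```go
package main

import "fmt"

// Cache is a hypothetical token cache bounded by the model's
// trained context length (numCtx).
type Cache struct {
	numCtx int   // context length cap
	tokens []int // cached token IDs
}

// AddPrompt rejects prompts longer than the context length
// with an error rather than silently truncating them.
func (c *Cache) AddPrompt(prompt []int) error {
	if len(prompt) > c.numCtx {
		return fmt.Errorf("prompt length %d exceeds context length %d",
			len(prompt), c.numCtx)
	}
	c.tokens = append(c.tokens, prompt...)
	return nil
}

// Add appends one generated token; it returns false once the
// cache is full, signaling that generation should stop.
func (c *Cache) Add(token int) bool {
	if len(c.tokens) >= c.numCtx {
		return false // cap reached: stop generation
	}
	c.tokens = append(c.tokens, token)
	return true
}

func main() {
	c := &Cache{numCtx: 4}

	// Too-long prompt: error, not truncation.
	if err := c.AddPrompt([]int{1, 2, 3, 4, 5}); err != nil {
		fmt.Println("prompt rejected:", err)
	}

	// A prompt that fits, then generation until the cap stops it.
	_ = c.AddPrompt([]int{1, 2})
	for tok := 10; c.Add(tok); tok++ {
	}
	fmt.Println("tokens cached:", len(c.tokens))
}
```

With a cap of 4 and a 2-token prompt, generation adds tokens until the cache holds 4, then `Add` returns false and the loop stops.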