Mirror of https://github.com/ollama/ollama.git, synced 2026-04-17 15:53:27 +02:00.
The scheduler coordinates prefill and decode across concurrent requests. It draws from a fixed pool of sequence slots, so the cache does not have to be rebuilt during normal operation. When a new request arrives, decoding of the existing sequences is paused and the new request is prefilled inline. All active sequences then resume together in a single batched forward pass. Cache state is materialized before each of these transitions, so every phase sees a consistent cache.
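The following is a minimal Go sketch of the scheduling loop described above, using a toy model. It is not ollama's implementation. All names (`Scheduler`, `Sequence`, `Step`, `materialize`, `forwardBatch`) are hypothetical. The "cache" is just a token slice, the "model" returns the cache length as the next token, and `materialize` stands in for forcing deferred cache writes to complete before the scheduler changes phase.

```go
package main

import "fmt"

// Sequence is a hypothetical stand-in for one in-flight request.
type Sequence struct {
	ID      int
	Prompt  []int // prompt tokens consumed by prefill
	MaxNew  int   // number of tokens to decode before finishing
	Output  []int // tokens produced by decode
	cache   []int // committed cache contents (toy: token history)
	pending []int // writes not yet materialized into cache
}

// Scheduler holds a fixed pool of slots allocated once; admitting and
// retiring requests reuses slots instead of rebuilding the cache.
type Scheduler struct {
	slots []*Sequence // nil means the slot is free
	queue []*Sequence // requests waiting for a slot
	Trace []string    // phase transitions, recorded for illustration
}

func NewScheduler(numSlots int) *Scheduler {
	return &Scheduler{slots: make([]*Sequence, numSlots)}
}

func (s *Scheduler) Submit(seq *Sequence) { s.queue = append(s.queue, seq) }

// materialize flushes pending writes for every active slot so the next
// phase sees a consistent cache (hypothetical stand-in for forcing
// lazily evaluated cache updates).
func (s *Scheduler) materialize() {
	for _, seq := range s.slots {
		if seq != nil {
			seq.cache = append(seq.cache, seq.pending...)
			seq.pending = seq.pending[:0]
		}
	}
}

// forwardBatch is a toy model: one call produces the next token for every
// sequence in the batch. Here the "next token" is just the cache length.
func forwardBatch(batch []*Sequence) []int {
	next := make([]int, len(batch))
	for i, seq := range batch {
		next[i] = len(seq.cache)
	}
	return next
}

// Step admits queued requests into free slots, prefilling each inline while
// decode is paused, then runs one batched decode step over all active slots.
func (s *Scheduler) Step() {
	for i := range s.slots {
		if s.slots[i] != nil || len(s.queue) == 0 {
			continue
		}
		s.materialize() // settle existing sequences before pausing them
		seq := s.queue[0]
		s.queue = s.queue[1:]
		seq.pending = append(seq.pending, seq.Prompt...) // toy prefill
		s.slots[i] = seq
		s.Trace = append(s.Trace, fmt.Sprintf("prefill %d", seq.ID))
	}

	s.materialize() // prefill results must be visible to decode
	var batch []*Sequence
	var idx []int
	for i, seq := range s.slots {
		if seq != nil {
			batch = append(batch, seq)
			idx = append(idx, i)
		}
	}
	if len(batch) == 0 {
		return
	}
	next := forwardBatch(batch) // single batched forward pass
	s.Trace = append(s.Trace, fmt.Sprintf("decode x%d", len(batch)))
	for j, seq := range batch {
		seq.pending = append(seq.pending, next[j])
		seq.Output = append(seq.Output, next[j])
		if len(seq.Output) >= seq.MaxNew {
			s.slots[idx[j]] = nil // retire: the slot is reused, not reallocated
		}
	}
}

func main() {
	s := NewScheduler(2)
	s.Submit(&Sequence{ID: 1, Prompt: []int{10, 11, 12}, MaxNew: 3})
	s.Step()
	s.Submit(&Sequence{ID: 2, Prompt: []int{20}, MaxNew: 2}) // arrives mid-decode
	for i := 0; i < 3; i++ {
		s.Step()
	}
	fmt.Println(s.Trace)
}
```

Running `main` prints `[prefill 1 decode x1 prefill 2 decode x2 decode x2]`. The second request is prefilled between decode steps, and both sequences then advance in one batched call. The slot pool is a fixed-size slice of nil-able pointers, so finishing a request clears an index rather than resizing any cache.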