Jesse Gross 2beb5445a4 mlxrunner: replace TextGenerationPipeline with scheduler
The scheduler manages prefill and decode for concurrent requests. A
fixed pool of sequence slots avoids cache rebuilds during normal
operation. New requests are prefilled inline while decoding of existing
sequences is paused; all active sequences then resume in a single
batched forward pass. Cache state is materialized before transitions to
ensure consistency.
2026-04-03 20:03:32 -07:00
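
The scheduling loop described in the commit message can be illustrated with a toy model. The following is a minimal sketch under stated assumptions, not the mlxrunner implementation: the names `Sequence`, `Scheduler`, and `toy_next_token` are hypothetical, the "cache" is a plain list of token ids standing in for real cache state, and the "batched forward pass" is a list comprehension over the active slots. It only shows the control flow: a fixed pool of slots, inline prefill of new requests while decode is paused, and one decode step shared by all active sequences.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Sequence:
    """One request. All fields are illustrative, not taken from mlxrunner."""
    prompt: list[int]
    max_new_tokens: int
    cache: list[int] = field(default_factory=list)   # stand-in for cache state
    output: list[int] = field(default_factory=list)


def toy_next_token(cache: list[int]) -> int:
    # Hypothetical stand-in for a model forward pass: deterministic in the cache.
    return (sum(cache) + len(cache)) % 100


class Scheduler:
    def __init__(self, num_slots: int) -> None:
        # The slot pool is allocated once and never resized.
        self.slots: list[Optional[Sequence]] = [None] * num_slots
        self.pending: deque[Sequence] = deque()
        self.finished: list[Sequence] = []

    def submit(self, seq: Sequence) -> None:
        self.pending.append(seq)

    def _prefill(self, seq: Sequence) -> None:
        # Build the sequence's cache from its whole prompt before it joins
        # the decode batch. No decode runs while this executes.
        seq.cache = list(seq.prompt)

    def step(self) -> int:
        # 1. Admit pending requests into free slots and prefill them inline.
        for i, slot in enumerate(self.slots):
            if slot is None and self.pending:
                seq = self.pending.popleft()
                self._prefill(seq)
                self.slots[i] = seq

        # 2. One decode step for every active sequence (the "batched" pass).
        active = [(i, s) for i, s in enumerate(self.slots) if s is not None]
        tokens = [toy_next_token(s.cache) for _, s in active]

        # 3. Commit the new tokens and free the slots of finished sequences.
        for (i, s), tok in zip(active, tokens):
            s.cache.append(tok)
            s.output.append(tok)
            if len(s.output) >= s.max_new_tokens:
                self.finished.append(s)
                self.slots[i] = None
        return len(active)

    def run(self) -> int:
        """Step until no work remains; returns the number of steps taken."""
        steps = 0
        while self.pending or any(s is not None for s in self.slots):
            self.step()
            steps += 1
        return steps


if __name__ == "__main__":
    sched = Scheduler(num_slots=2)
    for prompt, n in ([1, 2, 3], 2), ([4, 5], 3), ([6], 1):
        sched.submit(Sequence(prompt=prompt, max_new_tokens=n))
    print("steps:", sched.run())
    for seq in sched.finished:
        print(seq.prompt, "->", seq.output)
```

With two slots and three requests, the third request waits in the pending queue until a slot frees up, is prefilled at the start of that step, and then decodes alongside the sequence that is still running. A real scheduler would also need to handle cancellation, stop tokens, and actual tensor-backed cache state, none of which this sketch attempts.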