ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-17 15:53:27 +02:00

Files

Jesse Gross 2beb5445a4 mlxrunner: replace TextGenerationPipeline with scheduler

The scheduler manages prefill and decode for concurrent requests. A
fixed pool of sequence slots avoids cache rebuilds during normal
operation. New requests prefill inline while existing sequences' decode
is paused, then all active sequences resume in a single batched forward
pass. Cache state is materialized before transitions to ensure
consistency.

2026-04-03 20:03:32 -07:00
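
The scheduling described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the mlxrunner code: `Scheduler`, `sequence`, `Admit`, and `Step` are hypothetical names, the "forward pass" is a placeholder that emits the previous token plus one, and cache materialization is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// sequence is a hypothetical stand-in for per-request state.
// A real runner would hold KV-cache tensors here instead of token IDs.
type sequence struct {
	id        int
	tokens    []int // prompt followed by generated tokens
	remaining int   // decode steps left before the slot is released
}

// Scheduler owns a fixed pool of slots; a nil entry is a free slot.
// The pool is never resized, so admitting a request does not rebuild it.
type Scheduler struct {
	slots []*sequence
}

var (
	ErrNoFreeSlot = errors.New("no free sequence slot")
	ErrBadRequest = errors.New("prompt must be non-empty and maxNew positive")
)

// NewScheduler allocates the slot pool once, up front.
func NewScheduler(n int) *Scheduler {
	return &Scheduler{slots: make([]*sequence, n)}
}

// Admit "prefills" a new request inline and returns its slot index.
// It runs between Step calls, so decode for existing sequences is
// effectively paused while the prefill happens.
func (s *Scheduler) Admit(id int, prompt []int, maxNew int) (int, error) {
	if len(prompt) == 0 || maxNew < 1 {
		return -1, ErrBadRequest
	}
	for i, seq := range s.slots {
		if seq == nil {
			s.slots[i] = &sequence{
				id:        id,
				tokens:    append([]int(nil), prompt...),
				remaining: maxNew,
			}
			return i, nil
		}
	}
	return -1, ErrNoFreeSlot
}

// Step advances every active sequence by one token in a single pass
// and returns how many sequences were in the batch.
func (s *Scheduler) Step() int {
	active := 0
	for i, seq := range s.slots {
		if seq == nil {
			continue
		}
		// Placeholder for the model: next token is last token + 1.
		next := seq.tokens[len(seq.tokens)-1] + 1
		seq.tokens = append(seq.tokens, next)
		seq.remaining--
		active++
		if seq.remaining == 0 {
			s.slots[i] = nil // release the slot for reuse
		}
	}
	return active
}

func main() {
	s := NewScheduler(2)
	s.Admit(1, []int{10}, 2)
	fmt.Println("batch size:", s.Step())
	s.Admit(2, []int{20}, 1) // prefill while request 1 is paused
	fmt.Println("batch size:", s.Step())
}
```

In this sketch a request arriving mid-generation simply takes the next free slot and joins the following `Step`, which is the property the commit message attributes to the fixed slot pool.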

agent

x/cmd: enable web search and web fetch with flag (#13690)

2026-01-12 13:59:40 -08:00

cmd

Reapply "don't require pulling stubs for cloud models" again (#14608)

2026-03-06 14:27:47 -08:00

create

mlx: fix vision capability + min version (#15106)

2026-03-27 17:09:28 -07:00

imagegen

ci: fix windows cgo compiler error (#15046)