ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-17 15:53:27 +02:00

Files

Jesse Gross f474a632ab mlxrunner: tokenize prompts in request handler goroutines

Move tokenization out of the single GPU processing goroutine and
into each request's HTTP handler goroutine. This allows the next
request's prompt to be tokenized on the CPU while the current
request is executing on the GPU.

2026-04-03 16:34:22 -07:00
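
The commit message above describes a pipelining pattern: each request's handler goroutine does the CPU-side tokenization, and one goroutine owns the GPU. The following is a minimal sketch of that pattern, not the actual mlxrunner code. The names `request`, `tokenize`, `gpuLoop`, `handle`, and `run` are hypothetical, the tokenizer is a whitespace-splitting stand-in, and the "GPU" work is simulated.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// request is a hypothetical unit of work handed to the GPU goroutine.
// The prompt has already been tokenized by the time it is enqueued.
type request struct {
	tokens []int
	reply  chan string
}

// tokenize is a stand-in for a real tokenizer (assumption): it emits one
// token per whitespace-separated word, using the word length as the token.
func tokenize(prompt string) []int {
	words := strings.Fields(prompt)
	tokens := make([]int, len(words))
	for i, w := range words {
		tokens[i] = len(w)
	}
	return tokens
}

// gpuLoop is the single goroutine that owns the (simulated) GPU.
// It never tokenizes; it only consumes ready-made token slices.
func gpuLoop(queue <-chan request) {
	for req := range queue {
		req.reply <- fmt.Sprintf("processed %d tokens", len(req.tokens))
	}
}

// handle plays the role of one HTTP handler goroutine: it tokenizes on
// the CPU first, then blocks until the GPU goroutine accepts the request.
func handle(queue chan<- request, prompt string) string {
	tokens := tokenize(prompt) // may overlap with GPU work for another request
	reply := make(chan string, 1)
	queue <- request{tokens: tokens, reply: reply}
	return <-reply
}

// run starts one handler goroutine per prompt and returns the results
// in the same order as the prompts.
func run(prompts []string) []string {
	queue := make(chan request)
	done := make(chan struct{})
	go func() {
		gpuLoop(queue)
		close(done)
	}()

	results := make([]string, len(prompts))
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			results[i] = handle(queue, p)
		}(i, p)
	}
	wg.Wait()
	close(queue) // no more requests; lets gpuLoop return
	<-done
	return results
}

func main() {
	for _, r := range run([]string{"hello world", "why is the sky blue"}) {
		fmt.Println(r)
	}
}
```

Because `tokenize` runs before the send on `queue`, a handler can prepare its tokens while `gpuLoop` is still busy with an earlier request. Only the hand-off itself is serialized.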

agent

x/cmd: enable web search and web fetch with flag (#13690)

2026-01-12 13:59:40 -08:00

cmd

Reapply "don't require pulling stubs for cloud models" again (#14608)

2026-03-06 14:27:47 -08:00

create

mlx: fix vision capability + min version (#15106)

2026-03-27 17:09:28 -07:00

imagegen

ci: fix windows cgo compiler error (#15046)

2026-03-24 16:45:36 -07:00

mlxrunner

mlxrunner: tokenize prompts in request handler goroutines

2026-04-03 16:34:22 -07:00

models

mlx: add mxfp4/mxfp8/nvfp4 importing (#15015)

2026-03-24 13:45:44 -07:00

server

mlx: fix vision capability + min version (#15106)

2026-03-27 17:09:28 -07:00

tokenizer

mlx: quantized embeddings, fast SwiGLU, and runtime fixes (#14884)

2026-03-17 11:21:38 -07:00

tools

add ability to disable cloud (#14221)

2026-02-12 15:47:00 -08:00