ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-17 15:53:27 +02:00

Files

Jesse Gross bb0c58e134 ggml: skip cublasGemmBatchedEx during graph reservation

cublasGemmBatchedEx fails during graph capture when pool allocations
return fake pointers. This is triggered when NUM_PARALLEL is greater
than 1 for models like gemma4 that use batched matmuls. Skip it
during reservation since the memory tracking is already handled by
the pool allocations.

Fixes #15249
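
The idea in the commit message above can be illustrated with a minimal, CPU-only Go sketch. It is not the actual ggml or cuBLAS code: the names `pool`, `allocBatch`, and `batchedMatMul` are hypothetical, and a nil slice stands in for the "fake pointer" returned during reservation. The sketch assumes the reservation pass only needs allocation sizes, so the batched multiply can be skipped once the pool has recorded them.

```go
package main

import "fmt"

// pool is a hypothetical stand-in for a device memory pool.
type pool struct {
	reserving bool   // true while only measuring memory requirements
	used      uint64 // total bytes requested so far
}

// alloc records the request size. In reservation mode it returns nil,
// which plays the role of a fake pointer that must never be dereferenced.
func (p *pool) alloc(n int) []float32 {
	p.used += uint64(n) * 4 // 4 bytes per float32
	if p.reserving {
		return nil
	}
	return make([]float32, n)
}

// allocBatch allocates one buffer of n elements per batch entry.
func (p *pool) allocBatch(batch, n int) [][]float32 {
	bufs := make([][]float32, batch)
	for i := range bufs {
		bufs[i] = p.alloc(n)
	}
	return bufs
}

// batchedMatMul multiplies each m×k matrix in a by the matching k×n matrix
// in b (row-major). During reservation the output is still allocated so the
// pool sees its size, but the multiply itself is skipped.
func batchedMatMul(p *pool, a, b [][]float32, m, k, n int) [][]float32 {
	out := p.allocBatch(len(a), m*n)
	if p.reserving {
		return out // memory already tracked; inputs are fake, so do no work
	}
	for i := range a {
		for r := 0; r < m; r++ {
			for c := 0; c < n; c++ {
				var sum float32
				for x := 0; x < k; x++ {
					sum += a[i][r*k+x] * b[i][x*n+c]
				}
				out[i][r*n+c] = sum
			}
		}
	}
	return out
}

func main() {
	const batch, m, k, n = 2, 2, 3, 2

	// Reservation pass: sizes are tracked, no real buffers exist.
	r := &pool{reserving: true}
	batchedMatMul(r, r.allocBatch(batch, m*k), r.allocBatch(batch, k*n), m, k, n)

	// Real pass: same allocation pattern, real buffers, real compute.
	p := &pool{}
	a := p.allocBatch(batch, m*k)
	b := p.allocBatch(batch, k*n)
	batchedMatMul(p, a, b, m, k, n)

	fmt.Println("reserved bytes:", r.used, "actual bytes:", p.used)
}
```

In this sketch both passes report the same byte count, which is the property the reservation pass depends on: skipping the compute does not change what the pool records.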

2026-04-03 12:41:09 -07:00

backend

ggml: skip cublasGemmBatchedEx during graph reservation

2026-04-03 12:41:09 -07:00

fix: qwen2.5 vl rope (#13486)

2025-12-15 17:30:33 -08:00

backend.go

Add support for gemma4 (#15214)

2026-04-02 11:33:33 -07:00

device.go

flash attn: add auto mode for llama engine (#13052)

2025-12-12 13:27:19 -08:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00