ollama-ollama/llm/server.go
Daniel Hiltgen dde09129d1 gemma4: Disable FA on older GPUs where it doesn't work (#15403)
GPUs with a CUDA compute capability older than 7.5 lack the support needed to enable flash attention for this model.
2026-04-07 14:54:25 -07:00
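
A minimal sketch of the gating logic the commit message describes, written in Go to match the file shown. The `CudaComputeCapability` type, the `supportsFlashAttention` helper, and the major/minor comparison are assumptions for illustration only, not ollama's actual API in llm/server.go; only the 7.5 threshold comes from the commit itself.

```go
// Hypothetical sketch: disable flash attention (FA) on GPUs whose CUDA
// compute capability is older than 7.5, per the commit description.
// Names and types here are illustrative, not ollama's real implementation.
package main

import "fmt"

// CudaComputeCapability is an assumed representation of a GPU's CUDA
// compute capability (e.g. 7.5 -> Major=7, Minor=5).
type CudaComputeCapability struct {
	Major int
	Minor int
}

// supportsFlashAttention reports whether flash attention can be enabled:
// per the commit, anything older than compute capability 7.5 cannot.
func supportsFlashAttention(cc CudaComputeCapability) bool {
	if cc.Major != 7 {
		return cc.Major > 7
	}
	return cc.Minor >= 5
}

func main() {
	// Example capabilities: Pascal (6.1), Volta (7.0), Turing (7.5), Ampere (8.6).
	for _, cc := range []CudaComputeCapability{{6, 1}, {7, 0}, {7, 5}, {8, 6}} {
		fmt.Printf("compute capability %d.%d: flash attention enabled=%v\n",
			cc.Major, cc.Minor, supportsFlashAttention(cc))
	}
}
```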
