Mirror of https://github.com/ollama/ollama.git (synced 2026-04-17 15:53:27 +02:00)
gemma4: enable flash attention (#15378)
Backport GGML kernels so that flash attention can be enabled for the gemma 4 model on Metal and CUDA.
@@ -890,6 +890,7 @@ func (f GGML) FlashAttention() bool {
 	return slices.Contains([]string{
 		"bert",
 		"gemma3",
+		"gemma4",
 		"glm4moelite",
 		"glmocr",
 		"gptoss", "gpt-oss",
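The hunk extends a simple architecture allow-list: `FlashAttention()` reports support by checking the model's architecture string against a fixed slice. A minimal sketch of that pattern is below; the function name `supportsFlashAttention` and the standalone shape are illustrative assumptions, not Ollama's actual API (the real check is a method on `GGML`).

```go
package main

import (
	"fmt"
	"slices"
)

// supportsFlashAttention sketches the allow-list pattern from the hunk above:
// an architecture gets flash attention only if it appears in the list.
// Hypothetical free function; in Ollama this is a method on the GGML type.
func supportsFlashAttention(arch string) bool {
	return slices.Contains([]string{
		"bert",
		"gemma3",
		"gemma4", // the architecture this commit adds
		"glm4moelite",
		"glmocr",
		"gptoss", "gpt-oss",
	}, arch)
}

func main() {
	fmt.Println(supportsFlashAttention("gemma4")) // true after this change
	fmt.Println(supportsFlashAttention("mamba"))  // false: not in the list
}
```

`slices.Contains` (standard library since Go 1.21) keeps the check a one-liner, so enabling a new architecture is a one-line diff like the one above.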