ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-17 21:54:08 +02:00

Files

Daniel Hiltgen 356c0b8e34 gemma4: add audio support with USM conformer encoder

Add audio encoding for Gemma 4 using the USM conformer architecture:
- Converter: audio tensor mapping, SSCP/conformer/embedder name replacements,
  softplus repacker for per_dim_scale, F32 enforcement for conv weights
- GGML backend: Conv1DDW and PadExt tensor ops
- Audio encoder: SSCP Conv2D, 12 conformer blocks (FFW + block-local
  attention with relative position embeddings + LightConv1d + FFW),
  output projection, audio-to-text embedding projector
- Audio preprocessing: WAV decode, mel spectrogram, FFT (pure Go)
- Model wiring: WAV detection, audio token handling, unified PostTokenize

Correctly transcribes "why is the sky blue" from test audio.

2026-04-01 15:24:17 -07:00

backend

gemma4: add audio support with USM conformer encoder

2026-04-01 15:24:17 -07:00

fix: qwen2.5 vl rope (#13486)

2025-12-15 17:30:33 -08:00

backend.go

gemma4: add audio support with USM conformer encoder

2026-04-01 15:24:17 -07:00

device.go

flash attn: add auto mode for llama engine (#13052)

2025-12-12 13:27:19 -08:00

path.go

cpu: always ensure LibOllamaPath included (#12890)

2025-10-31 14:37:29 -07:00