Daniel Hiltgen
356c0b8e34
gemma4: add audio support with USM conformer encoder

Add audio encoding for Gemma 4 using the USM conformer architecture:
- Converter: audio tensor mapping, SSCP/conformer/embedder name replacements,
  softplus repacker for per_dim_scale, F32 enforcement for conv weights
- GGML backend: Conv1DDW and PadExt tensor ops
- Audio encoder: SSCP Conv2D, 12 conformer blocks (FFW + block-local
  attention with relative position embeddings + LightConv1d + FFW),
  output projection, audio-to-text embedding projector
- Audio preprocessing: WAV decode, mel spectrogram, FFT (pure Go)
- Model wiring: WAV detection, audio token handling, unified PostTokenize

Correctly transcribes "why is the sky blue" from test audio.
2026-04-01 15:24:17 -07:00