Files
ollama/llama
jmorganca a23a5e76f3 llama/compat: fix gemma4a per-block norm tensor mapping
The previous mapping renamed Ollama's `layer_pre_norm` to upstream's
`attn_post_norm`, but in Ollama's gemma4 audio code `layer_pre_norm` is
actually the FINAL block output norm (used at the end of the conformer
block), while `ln2` (named AttnPostNorm in Ollama's Go) is the
post-attention norm.

Upstream gemma4a uses `attn_post_norm` for post-attention and `ln_2`
for the final block norm, so the correct mapping is:

  ln1            → attn_pre_norm    (pre-attention norm)
  ln2            → attn_post_norm   (post-attention norm)
  layer_pre_norm → ln2              (final block output norm)
  linear_pos     → attn_k_rel       (relative-position K projection)

Also scope the per-block renames to `a.blk.*` (not vision blocks, which
have their own ln1/ln2 that must stay as-is). Order matters: ln2 →
attn_post_norm must run before layer_pre_norm → ln2; otherwise the
tensors freshly renamed to ln2 would be swept on to attn_post_norm,
colliding with the real post-attention norms. A sketch follows.
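
A minimal sketch of the ordered, scoped rename, assuming tensor names
of the form `a.blk.<i>.<name>.weight`; `renameTensor` and the rule
table are illustrative, not the actual llama/compat code:

  package main

  import (
      "fmt"
      "strings"
  )

  // Ordered rename rules for gemma4a audio blocks. The ln2 rule must
  // precede the layer_pre_norm rule: rules apply to each name in
  // sequence, so if layer_pre_norm were renamed to ln2 first, the ln2
  // rule would then sweep it on to attn_post_norm as well.
  var rules = []struct{ from, to string }{
      {"ln1", "attn_pre_norm"},        // pre-attention norm
      {"ln2", "attn_post_norm"},       // post-attention norm
      {"layer_pre_norm", "ln2"},       // final block output norm
      {"linear_pos", "attn_k_rel"},    // relative-position K projection
  }

  func renameTensor(name string) string {
      // Scope to audio blocks; vision blocks have their own ln1/ln2
      // that must keep their names.
      if !strings.HasPrefix(name, "a.blk.") {
          return name
      }
      for _, r := range rules {
          name = strings.Replace(name, "."+r.from+".", "."+r.to+".", 1)
      }
      return name
  }

  func main() {
      for _, n := range []string{
          "a.blk.0.ln1.weight",            // -> attn_pre_norm
          "a.blk.0.ln2.weight",            // -> attn_post_norm
          "a.blk.0.layer_pre_norm.weight", // -> ln2
          "v.blk.0.ln2.weight",            // unchanged: vision block
      } {
          fmt.Println(n, "->", renameTensor(n))
      }
  }

Swapping the ln2 and layer_pre_norm rules in this table reproduces the
collision: layer_pre_norm would land on ln2 and then be renamed again
to attn_post_norm by the later pass.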

Verified end-to-end via `ollama run gemma4:e2b` with audio input: the
model now produces a transcription that matches the upstream-converted
reference mmproj.
2026-04-20 09:30:26 -07:00