GLM-OCR ships a vision tower whose Ollama-format tensor names already
mostly match upstream's PROJECTOR_TYPE_GLM4V expectations:
  v.blk.X.attn_qkv / attn_out / attn_q_norm / attn_k_norm ✓
  v.blk.X.ln1 / ln2 / ffn_{gate,up,down} ✓
  mm.model.fc / mm.up / mm.gate / mm.down / mm.post_norm / mm.patch_merger ✓
Two mismatches to fix (see the sketch after this list):
* Patch-embed temporal pair: Ollama uses `v.patch_embd_0.weight` /
  `v.patch_embd_1.weight` (underscore-suffixed); upstream expects
  `v.patch_embd.weight` (unsuffixed) + `v.patch_embd.weight.1`
  (via TN_PATCH_EMBD_1). Both are exact one-to-one renames.
* Promote `v.patch_embd.weight{,.1}` to F32 for Metal's IM2COL path
  (same fix as gemma3 / mistral3 / deepseek-ocr).
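
A minimal sketch of both fixes, assuming a name→name rename pass over
the tensor list plus an F32 promotion set; the helper names here are
illustrative, not the actual hook points in the conversion path:

```go
package main

import "fmt"

// patchEmbdRenames maps the Ollama-format temporal pair to upstream's
// PROJECTOR_TYPE_GLM4V names (exact renames, no pattern matching).
var patchEmbdRenames = map[string]string{
	"v.patch_embd_0.weight": "v.patch_embd.weight",   // unsuffixed base
	"v.patch_embd_1.weight": "v.patch_embd.weight.1", // TN_PATCH_EMBD_1
}

// needsF32 marks tensors stored as F32 so Metal's IM2COL path can
// consume them (same promotion as gemma3 / mistral3 / deepseek-ocr).
var needsF32 = map[string]bool{
	"v.patch_embd.weight":   true,
	"v.patch_embd.weight.1": true,
}

// fixTensorName applies the rename and reports whether the (renamed)
// tensor must be promoted to F32.
func fixTensorName(name string) (string, bool) {
	if to, ok := patchEmbdRenames[name]; ok {
		name = to
	}
	return name, needsF32[name]
}

func main() {
	for _, n := range []string{"v.patch_embd_0.weight", "v.blk.0.attn_qkv.weight"} {
		out, f32 := fixTensorName(n)
		fmt.Printf("%s -> %s (f32=%v)\n", n, out, f32)
	}
}
```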
KV synthesis (sketched below): rewrite the arch to `clip` with
`projector_type=glm4v`; copy `glmocr.vision.*` → `clip.vision.*`,
passing spatial_merge_size straight through and renaming
out_hidden_size → projection_dim, intermediate_size →
feed_forward_length, and layer_norm_rms_epsilon →
attention.layer_norm_epsilon; copy through Ollama's `image_mean` /
`image_std` arrays; set `clip.use_silu=true`.
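
A hedged sketch of that synthesis, assuming both KV stores are plain
`map[string]any`; the helper name and exact hook point are
illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// synthesizeClipKV builds the clip-format KV from the glmocr KV.
func synthesizeClipKV(src map[string]any) map[string]any {
	out := map[string]any{
		"general.architecture": "clip", // arch rewrite
		"clip.projector_type":  "glm4v",
		"clip.use_silu":        true,
	}
	// Keys whose names change when moved under clip.vision.*
	renamed := map[string]string{
		"glmocr.vision.out_hidden_size":        "clip.vision.projection_dim",
		"glmocr.vision.intermediate_size":      "clip.vision.feed_forward_length",
		"glmocr.vision.layer_norm_rms_epsilon": "clip.vision.attention.layer_norm_epsilon",
	}
	for k, v := range src {
		if to, ok := renamed[k]; ok {
			out[to] = v
			continue
		}
		// Everything else under glmocr.vision.* (incl. spatial_merge_size
		// and, assuming they live under the same prefix, the image_mean /
		// image_std arrays) copies straight across.
		if rest, ok := strings.CutPrefix(k, "glmocr.vision."); ok {
			out["clip.vision."+rest] = v
		}
	}
	return out
}

func main() {
	kv := synthesizeClipKV(map[string]any{
		"glmocr.vision.spatial_merge_size": uint32(2),
		"glmocr.vision.out_hidden_size":    uint32(4096),
	})
	fmt.Println(kv["clip.vision.spatial_merge_size"], kv["clip.vision.projection_dim"])
}
```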
Adds `glmocr` to the Go-side `compatClipArches` allowlist.
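
The allowlist change itself, assuming `compatClipArches` is a plain
set of architecture names as the name suggests (the real declaration
may differ; other entries elided):

```go
package convertsketch

// compatClipArches gates which source architectures may reuse the
// clip mmproj path.
var compatClipArches = map[string]bool{
	"glmocr": true, // added by this change
}
```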
Verified: with `--mmproj` pointing at the same blob, the mmproj loads
cleanly, with all glm4v hparams set and patch_embd promoted to F32.
End-to-end testing goes through `ollama run glm-ocr` (which supplies
the proper chat template via its Modelfile), i.e. the user-facing flow.