GLM-OCR ships a vision tower whose Ollama-format tensor names already
mostly match upstream's PROJECTOR_TYPE_GLM4V expectations:
  v.blk.X.attn_qkv / attn_out / attn_q_norm / attn_k_norm ✓
  v.blk.X.ln1 / ln2 / ffn_{gate,up,down} ✓
  mm.model.fc / mm.up / mm.gate / mm.down / mm.post_norm / mm.patch_merger ✓
Two mismatches to fix (see the sketch after this list):
* Patch-embed temporal pair: Ollama uses `v.patch_embd_0.weight` /
  `v.patch_embd_1.weight` (underscore-suffixed); upstream expects
  `v.patch_embd.weight` (unsuffixed) + `v.patch_embd.weight.1`
  (via TN_PATCH_EMBD_1). Both are exact one-to-one renames.
* Promote `v.patch_embd.weight{,.1}` to F32 for Metal's IM2COL path
  (same fix as gemma3 / mistral3 / deepseek-ocr).
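
A minimal sketch of both fixes, assuming a name→name rename pass over
the tensor list plus an F32 promotion set; the helper names here are
illustrative, not the actual hook points in the conversion path:

```go
package main

import "fmt"

// patchEmbdRenames maps the Ollama-format temporal pair to upstream's
// PROJECTOR_TYPE_GLM4V names (exact renames, no pattern matching).
var patchEmbdRenames = map[string]string{
	"v.patch_embd_0.weight": "v.patch_embd.weight",   // unsuffixed base
	"v.patch_embd_1.weight": "v.patch_embd.weight.1", // TN_PATCH_EMBD_1
}

// needsF32 marks tensors stored as F32 so Metal's IM2COL path can
// consume them (same promotion as gemma3 / mistral3 / deepseek-ocr).
var needsF32 = map[string]bool{
	"v.patch_embd.weight":   true,
	"v.patch_embd.weight.1": true,
}

// fixTensorName applies the rename and reports whether the (renamed)
// tensor must be promoted to F32.
func fixTensorName(name string) (string, bool) {
	if to, ok := patchEmbdRenames[name]; ok {
		name = to
	}
	return name, needsF32[name]
}

func main() {
	for _, n := range []string{"v.patch_embd_0.weight", "v.blk.0.attn_qkv.weight"} {
		out, f32 := fixTensorName(n)
		fmt.Printf("%s -> %s (f32=%v)\n", n, out, f32)
	}
}
```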
KV synthesis (sketched below): rewrite the arch to `clip` with
`projector_type=glm4v`; copy `glmocr.vision.*` → `clip.vision.*`,
passing spatial_merge_size straight through and renaming
out_hidden_size → projection_dim, intermediate_size →
feed_forward_length, and layer_norm_rms_epsilon →
attention.layer_norm_epsilon; copy through Ollama's `image_mean` /
`image_std` arrays; set `clip.use_silu=true`.
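
A hedged sketch of that synthesis, assuming both KV stores are plain
`map[string]any`; the helper name and exact hook point are
illustrative only:

```go
package main

import (
	"fmt"
	"strings"
)

// synthesizeClipKV builds the clip-format KV from the glmocr KV.
func synthesizeClipKV(src map[string]any) map[string]any {
	out := map[string]any{
		"general.architecture": "clip", // arch rewrite
		"clip.projector_type":  "glm4v",
		"clip.use_silu":        true,
	}
	// Keys whose names change when moved under clip.vision.*
	renamed := map[string]string{
		"glmocr.vision.out_hidden_size":        "clip.vision.projection_dim",
		"glmocr.vision.intermediate_size":      "clip.vision.feed_forward_length",
		"glmocr.vision.layer_norm_rms_epsilon": "clip.vision.attention.layer_norm_epsilon",
	}
	for k, v := range src {
		if to, ok := renamed[k]; ok {
			out[to] = v
			continue
		}
		// Everything else under glmocr.vision.* (incl. spatial_merge_size
		// and, assuming they live under the same prefix, the image_mean /
		// image_std arrays) copies straight across.
		if rest, ok := strings.CutPrefix(k, "glmocr.vision."); ok {
			out["clip.vision."+rest] = v
		}
	}
	return out
}

func main() {
	kv := synthesizeClipKV(map[string]any{
		"glmocr.vision.spatial_merge_size": uint32(2),
		"glmocr.vision.out_hidden_size":    uint32(4096),
	})
	fmt.Println(kv["clip.vision.spatial_merge_size"], kv["clip.vision.projection_dim"])
}
```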
Adds `glmocr` to the Go-side `compatClipArches` allowlist.
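
The allowlist change itself, assuming `compatClipArches` is a plain
set of architecture names as the name suggests (the real declaration
may differ; other entries elided):

```go
package convertsketch

// compatClipArches gates which source architectures may reuse the
// clip mmproj path.
var compatClipArches = map[string]bool{
	"glmocr": true, // added by this change
}
```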
Verified: with `--mmproj` pointing at the same blob, the mmproj loads
cleanly, with all glm4v hparams set and patch_embd promoted to F32.
End-to-end testing goes through `ollama run glm-ocr` (which supplies
the proper chat template via its Modelfile), i.e. the user-facing flow.