Adds support for two QwenVL families that ship as monolithic vision+text
Ollama GGUFs.
qwen2.5-vl (text + clip):
* Text: arch translation qwen25vl → qwen2vl, KV prefix copy, and
mrope_section (3 elements) → rope.dimension_sections (4 elements, padded
with 0); see the KV sketch after this list.
* Clip: tensor renames (v.merger.* → v.post_ln/mm.0/mm.2,
v.patch_embd_{0,1} → v.patch_embd.weight{,.1}), use_silu, derive
n_wa_pattern from fullatt_block_indexes[0]+1, image_size=560 default,
standard CLIP image_mean/std, and F32 promotion of patch_embd for Metal.
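
A rough sketch of the text-side KV translation described above. The exact
key names and the map-of-int-arrays representation are assumptions for
illustration, not the actual converter code:

```cpp
// Sketch only: copy qwen25vl.* KV pairs to the qwen2vl.* prefix and pad the
// 3-element mrope_section into 4-element rope.dimension_sections.
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using kv_arrays = std::map<std::string, std::vector<int32_t>>;

static kv_arrays translate_qwen25vl_text_kv(const kv_arrays & src) {
    kv_arrays dst;
    const std::string old_prefix = "qwen25vl.";
    for (const auto & [key, val] : src) {
        if (key.rfind(old_prefix, 0) == 0) {
            dst["qwen2vl." + key.substr(old_prefix.size())] = val; // prefix copy
        }
    }
    // mrope_section ships with 3 entries; the qwen2vl arch wants 4, padded with 0.
    auto sections = src.at(old_prefix + "rope.mrope_section"); // assumed key name
    sections.resize(4, 0);
    dst["qwen2vl.rope.dimension_sections"] = sections;
    return dst;
}
```
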
qwen3-vl (text + clip):
* Text: inject qwen3vl.rope.dimension_sections=[24,20,20,0] (the
Qwen3-VL-8B HF default; head_dim=128, so the sections sum to 64 =
head_dim/2) and qwen3vl.n_deepstack_layers derived from the length of
the deepstack_visual_indexes array.
* Clip: per-block QKV merge (Ollama stores separate Q/K/V; upstream
qwen3vl_merger graph reads combined attn_qkv); deepstack remap
v.deepstack_merger.X.* → v.deepstack.{indexes[X]}.* (with
linear_fc{1,2} → fc{1,2}; see the rename sketch after this list);
merger renames matching upstream LLaVA
proj layout (v.post_ln + mm.0/mm.2); patch_embed split (16x16x2
Conv3D → two 16x16 Conv2Ds, F16→F32); per-block substring renames
(norm1/2 → ln1/2, mlp.linear_fc1/2 → ffn_up/down); F32 promotion of
position_embd.
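
The deepstack and per-block renames could look roughly like the following
sketch; the function names are invented and the substring matching is
simplified relative to what the converter actually does:

```cpp
// Sketch only. indexes = deepstack_visual_indexes from the model metadata.
#include <string>
#include <vector>

static void replace_all(std::string & s, const std::string & from, const std::string & to) {
    for (size_t pos = 0; (pos = s.find(from, pos)) != std::string::npos; pos += to.size()) {
        s.replace(pos, from.size(), to);
    }
}

static std::string remap_qwen3vl_vision(std::string name, const std::vector<int> & indexes) {
    const std::string prefix = "v.deepstack_merger.";
    size_t dot;
    if (name.rfind(prefix, 0) == 0 &&
        (dot = name.find('.', prefix.size())) != std::string::npos) {
        // v.deepstack_merger.X.rest -> v.deepstack.{indexes[X]}.rest
        const int x = std::stoi(name.substr(prefix.size(), dot - prefix.size()));
        name = "v.deepstack." + std::to_string(indexes.at(x)) + name.substr(dot);
        replace_all(name, "linear_fc1", "fc1");
        replace_all(name, "linear_fc2", "fc2");
        return name;
    }
    // per-block substring renames on the vision tower
    replace_all(name, "mlp.linear_fc1", "ffn_up");
    replace_all(name, "mlp.linear_fc2", "ffn_down");
    replace_all(name, "norm1", "ln1");
    replace_all(name, "norm2", "ln2");
    return name;
}
```
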
The qwen3-vl QKV merge needed a new util helper
(register_concat_load_to_f32) because the Ollama blob mixes types within
a single block (F16 Q/K plus Q8_0 V), and the existing byte-concat
register_concat_load only works when all sources share a type. The new
helper dequantizes each source to F32 via ggml_get_type_traits->to_float
and concatenates the results; the caller sets the destination tensor
type to F32.
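
A compilable sketch of the dequantize-and-concatenate step only;
everything except the ggml calls is a placeholder, and the real helper
presumably wires this into the loader's deferred-load registration:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

#include "ggml.h"

struct src_view {
    ggml_type    type;    // e.g. GGML_TYPE_F16 for Q/K, GGML_TYPE_Q8_0 for V
    const void * data;    // raw source bytes from the Ollama blob
    int64_t      n_elem;  // element count (a multiple of the type's block size)
};

// Concatenate sources of mixed types into a single F32 buffer.
static std::vector<float> concat_sources_to_f32(const std::vector<src_view> & srcs) {
    std::vector<float> out;
    for (const auto & s : srcs) {
        std::vector<float> tmp(s.n_elem);
        const ggml_type_traits * traits = ggml_get_type_traits(s.type);
        if (traits->to_float) {
            traits->to_float(s.data, tmp.data(), s.n_elem); // F16/Q8_0/... -> F32
        } else {
            // types without a to_float routine (plain F32) are copied through
            std::copy_n(static_cast<const float *>(s.data), s.n_elem, tmp.begin());
        }
        out.insert(out.end(), tmp.begin(), tmp.end());
    }
    return out; // caller stores this as an F32 destination tensor
}
```
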
Verified end-to-end via `ollama run`:
* qwen2.5vl: image of NYT moon-landing front page → correct caption.
* qwen3-vl: same image → "New York Times front page from July 21,
1969, headlines the Apollo 11 moon landing with 'MEN WALK ON MOON'
and details of astronauts collecting rocks and planting a flag."
Also added qwen25vl + qwen3vl to the compatClipArches allowlist in
llm/llama_server.go so the auto-mmproj path activates for these
monolithic blobs (same mechanism as gemma3/4/qwen35moe/etc.).