jmorganca db0c745308 llama/compat: add qwen35moe vision (clip) support
Extends the compat layer with the vision side for Ollama's monolithic
qwen3.5 blobs. All changes in llama/compat/ — no new upstream patch edits.

New generic infra (reused by gemma3's existing promotion):
  - LoadOp registry (g_loadops). Any dest tensor whose name is registered
    gets its bytes produced by a closure instead of being read straight
    from disk. maybe_load_tensor consults it.
  - promote_tensor_to_f32(meta, ctx, name) now captures the source offset
    at registration time and becomes a LoadOp. Gemma3 already migrated.
  - register_concat_load(meta, dest, {srcs...}) captures the file offsets
    of N source tensors and registers a LoadOp that concatenates them.
    Assumes sources concatenate along their slowest ggml axis — which in
    C order means the dest bytes are src[0] || src[1] || ... .
  - set_tensor_shape / set_tensor_type helpers for in-place edits.
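The LoadOp mechanism can be sketched roughly as below. This is a minimal, self-contained illustration with hypothetical simplified types (a "file" is just a byte buffer and sources are (offset, size) views); the real code works against gguf metadata and real file I/O.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A LoadOp produces the dest tensor's bytes from the file instead of a
// raw read at the tensor's recorded offset.
using LoadOp = std::function<std::vector<uint8_t>(const std::vector<uint8_t> &file)>;
static std::map<std::string, LoadOp> g_loadops;

struct SrcView { size_t offset, size; };

// register_concat_load: capture the file offsets of N sources at
// registration time; the dest's bytes are src[0] || src[1] || ...,
// i.e. concatenation along the slowest ggml axis.
static void register_concat_load(const std::string &dest, std::vector<SrcView> srcs) {
    g_loadops[dest] = [srcs](const std::vector<uint8_t> &file) {
        std::vector<uint8_t> out;
        for (const auto &s : srcs)
            out.insert(out.end(), file.begin() + s.offset,
                                  file.begin() + s.offset + s.size);
        return out;
    };
}

// maybe_load_tensor consults the registry before falling back to a
// plain read of (offset, size) from the file.
static std::vector<uint8_t> maybe_load_tensor(const std::string &name,
                                              const std::vector<uint8_t> &file,
                                              size_t offset, size_t size) {
    auto it = g_loadops.find(name);
    if (it != g_loadops.end()) return it->second(file);
    return std::vector<uint8_t>(file.begin() + offset,
                                file.begin() + offset + size);
}
```

The QKV merge below is then just a registration of a three-source concat (q, k, v views) under the attn_qkv name.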

qwen35moe clip handler (handle_qwen35moe_clip):
  - Detection reuses detect_ollama_qwen35moe; additionally requires
    embedded v.* tensors so we don't fire for text-only files.
  - KV synth: clip.vision.* from qwen35moe.vision.* + sensible defaults
    (feed_forward_length=4304, image_size=768, layer_norm_epsilon=1e-6,
    is_deepstack_layers=false[27], image_mean/std=[0.5,0.5,0.5]).
  - Arch rewrite: general.architecture=clip, projector_type=qwen3vl_merger.
  - QKV merge per block (27x): captures q/k/v file offsets, registers a
    concat LoadOp, renames attn_q -> attn_qkv and widens its shape from
    [hidden, hidden] to [hidden, 3*hidden].
  - patch_embed split: source [16,16,2,3456] F16 -> two dests
    [16,16,3,1152] F32, permuting (c_out*3+c_in) packed_c back into
    separate c_in/c_out dims. Matches upstream convert_hf's
    Qwen3VLVisionModel.modify_tensors split.
  - Tensor renames (substring-matched): pos_embed -> position_embd,
    merger.norm -> post_ln, merger.linear_fc1/2 -> mm.0/mm.2,
    mlp.linear_fc1/2 -> ffn_up/ffn_down, norm1/2 -> ln1/ln2.
  - F16 -> F32 promote for v.position_embd.weight.
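The patch_embed split's index permutation is the fiddly part, so here is a sketch under stated assumptions: plain float buffers stand in for the F16 source and F32 dests (the conversion is elided), and the ggml layout (ne[0] fastest) is flattened by hand. Names and the helper itself are illustrative, not the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source is [16, 16, 2, 3456] with the channel dim packed as
// packed_c = c_out*3 + c_in; each temporal slice t in {0, 1} becomes
// one [16, 16, 3, 1152] dest with separate c_in/c_out dims.
constexpr int KX = 16, KY = 16, T = 2, C_IN = 3, C_OUT = 1152;

static void split_patch_embed(const std::vector<float> &src,
                              std::vector<float> &dst0,   // -> v.patch_embd.weight
                              std::vector<float> &dst1) { // -> v.patch_embd.weight.1
    dst0.resize((size_t)KX * KY * C_IN * C_OUT);
    dst1.resize((size_t)KX * KY * C_IN * C_OUT);
    for (int co = 0; co < C_OUT; co++)
        for (int ci = 0; ci < C_IN; ci++)
            for (int y = 0; y < KY; y++)
                for (int x = 0; x < KX; x++) {
                    int packed_c = co * 3 + ci;  // packed channel index in the source
                    size_t di = x + KX * (y + (size_t)KY * (ci + (size_t)C_IN * co));
                    size_t s0 = x + KX * (y + (size_t)KY * (0 + (size_t)T * packed_c));
                    size_t s1 = x + KX * (y + (size_t)KY * (1 + (size_t)T * packed_c));
                    dst0[di] = src[s0];  // temporal slice 0
                    dst1[di] = src[s1];  // temporal slice 1
                }
}
```

In the actual handler this runs inside a LoadOp, so the permuted F32 bytes are produced on demand when clip requests either dest.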

Ctx-pool trick for the sibling tensor:
  clip.cpp sizes its ggml_context for exactly the gguf's tensor count
  (+1), so calling ggml_new_tensor to add v.patch_embd.weight.1 would
  overflow the pool. Since
  v.blk.0.attn_k.weight is orphaned after the QKV merge (clip only
  requests the merged attn_qkv), steal that slot: rename it to
  v.patch_embd.weight.1 and reshape to [16,16,3,1152] F32. Its original
  file offset is ignored; the LoadOp we register overrides the read.
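The slot steal is a pure metadata edit, sketched below with a hypothetical simplified tensor-meta record in place of the real gguf structures (field names and the helper are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct TensorMeta {
    std::string name;
    std::vector<int64_t> shape;  // ggml order, ne[0] first
    bool is_f32;                 // stand-in for the ggml type field
};

// After the QKV merge, clip only requests the merged attn_qkv, so the
// v.blk.0.attn_k.weight entry is orphaned. Repurposing its slot keeps
// the gguf tensor count (and thus clip.cpp's ggml_context pool size)
// unchanged. Its original file offset becomes irrelevant because a
// LoadOp supplies the bytes.
static bool steal_slot_for_patch_embed(std::vector<TensorMeta> &tensors) {
    for (auto &t : tensors) {
        if (t.name == "v.blk.0.attn_k.weight") {
            t.name   = "v.patch_embd.weight.1";
            t.shape  = {16, 16, 3, 1152};
            t.is_f32 = true;
            return true;
        }
    }
    return false;  // nothing to steal; caller would have to fail detection
}
```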

Go side: adds qwen35moe to the auto-mmproj arch allowlist. ollama now
passes the monolithic blob as both --model and --mmproj for qwen3.5.

Verified end-to-end: ollama run qwen3.5:35b-a3b-q4_K_M with an image
correctly describes the image ("screenshot of a chat interface...
'open the browser, open never gonna give you up on youtube'..."). Text
inference still works on the same blob.
2026-04-20 09:29:34 -07:00

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as Claude Code, OpenClaw, OpenCode, Codex, Copilot, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Copilot CLI, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.

Python

pip install ollama

from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama

import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers
