Extends the compat layer with the vision side for Ollama's monolithic
qwen3.5 blobs. All changes in llama/compat/ — no new upstream patch edits.
New generic infra (reused by gemma3's existing promotion):
- LoadOp registry (g_loadops). Any dest tensor whose name is registered
gets its bytes produced by a closure instead of being read straight
from disk. maybe_load_tensor consults it.
- promote_tensor_to_f32(meta, ctx, name) now captures the source offset
at registration time and becomes a LoadOp. Gemma3 already migrated.
- register_concat_load(meta, dest, {srcs...}) captures the file offsets
of N source tensors and registers a LoadOp that concatenates them.
Assumes sources concatenate along their slowest ggml axis — which in
C order means the dest bytes are src[0] || src[1] || ... .
- set_tensor_shape / set_tensor_type helpers for in-place edits.
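A rough C++ sketch of this registry, assuming the names the notes use (g_loadops, register_concat_load, maybe_load_tensor); the byte-vector "file" and (offset, length) pairs are stand-ins for the real gguf file handling, not the actual compat code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A LoadOp produces a tensor's bytes via a closure instead of a raw file read.
using LoadOp = std::function<std::vector<uint8_t>()>;
static std::map<std::string, LoadOp> g_loadops;

// Stand-in for "read n bytes at a captured file offset".
static std::vector<uint8_t> read_at(const std::vector<uint8_t> &file,
                                    size_t off, size_t n) {
    return std::vector<uint8_t>(file.begin() + off, file.begin() + off + n);
}

// Captures the (offset, length) of N source tensors at registration time.
// Since the sources concatenate along their slowest ggml axis, the dest
// bytes in C order are simply src[0] || src[1] || ... appended in order.
static void register_concat_load(const std::vector<uint8_t> &file,
                                 const std::string &dest,
                                 std::vector<std::pair<size_t, size_t>> srcs) {
    g_loadops[dest] = [&file, srcs]() {
        std::vector<uint8_t> out;
        for (const auto &s : srcs) {
            auto part = read_at(file, s.first, s.second);
            out.insert(out.end(), part.begin(), part.end());
        }
        return out;
    };
}

// Consults the registry first; falls back to the plain on-disk read.
static std::vector<uint8_t> maybe_load_tensor(const std::vector<uint8_t> &file,
                                              const std::string &name,
                                              size_t off, size_t n) {
    auto it = g_loadops.find(name);
    if (it != g_loadops.end()) return it->second();
    return read_at(file, off, n);
}
```

The point of the closure design is that the original file offsets are captured once at registration, so later metadata edits (renames, reshapes) cannot desynchronize the read.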
qwen35moe clip handler (handle_qwen35moe_clip):
- Detection reuses detect_ollama_qwen35moe; additionally requires
embedded v.* tensors so we don't fire for text-only files.
- KV synth: clip.vision.* from qwen35moe.vision.* + sensible defaults
(feed_forward_length=4304, image_size=768, layer_norm_epsilon=1e-6,
is_deepstack_layers=false[27], image_mean/std=[0.5,0.5,0.5]).
- Arch rewrite: general.architecture=clip, projector_type=qwen3vl_merger.
- QKV merge per block (27x): captures q/k/v file offsets, registers a
concat LoadOp, renames attn_q -> attn_qkv and widens its shape from
[hidden, hidden] to [hidden, 3*hidden].
- patch_embed split: source [16,16,2,3456] F16 -> two dests
[16,16,3,1152] F32, permuting (c_out*3+c_in) packed_c back into
separate c_in/c_out dims. Matches upstream convert_hf's
Qwen3VLVisionModel.modify_tensors split.
- Tensor renames (substring-matched): pos_embed -> position_embd,
merger.norm -> post_ln, merger.linear_fc1/2 -> mm.0/mm.2,
mlp.linear_fc1/2 -> ffn_up/ffn_down, norm1/2 -> ln1/ln2.
- F16 -> F32 promote for v.position_embd.weight.
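The patch_embed index permutation above can be sketched as follows, assuming ggml's C-order layout (index = ((i3*ne2 + i2)*ne1 + i1)*ne0 + i0, with ne0 fastest) and eliding the F16 -> F32 conversion; split_patch_embed and its parameters are illustrative, not the actual compat code:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Extract temporal slice t of src [nx, ny, 2, 3*c_out_n] into a dest of
// shape [nx, ny, 3, c_out_n], unpacking packed_c = c_out*3 + c_in back
// into separate c_in / c_out dims (x is the fastest axis, ne3 the slowest).
static std::vector<float> split_patch_embed(const std::vector<float> &src,
                                            size_t nx, size_t ny,
                                            size_t c_out_n, size_t t) {
    std::vector<float> dst(nx * ny * 3 * c_out_n);
    for (size_t c_out = 0; c_out < c_out_n; c_out++)
        for (size_t c_in = 0; c_in < 3; c_in++)
            for (size_t y = 0; y < ny; y++)
                for (size_t x = 0; x < nx; x++) {
                    size_t packed_c = c_out * 3 + c_in;            // source ne3
                    size_t s = ((packed_c * 2 + t) * ny + y) * nx + x;
                    size_t d = ((c_out * 3 + c_in) * ny + y) * nx + x;
                    dst[d] = src[s];
                }
    return dst;
}
```

Running it once with t=0 and once with t=1 yields the two [16,16,3,1152] dests; with this packing convention the unpack reduces to de-interleaving the temporal axis.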
Ctx-pool trick for the sibling tensor:
clip.cpp sizes its ggml_context for exactly the gguf's tensor count (+1),
so calling ggml_new_tensor to add v.patch_embd.weight.1 would overflow the
pool. Since v.blk.0.attn_k.weight is orphaned after the QKV merge (clip only
requests the merged attn_qkv), steal that slot: rename it to
v.patch_embd.weight.1 and reshape it to [16,16,3,1152] F32. Its original
file offset is ignored; the LoadOp we register overrides the read.
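The slot-steal is just a metadata rename plus in-place shape/type edits; a minimal sketch with a hypothetical steal_slot helper (the real code edits gguf metadata via set_tensor_shape / set_tensor_type, and the type codes here are illustrative):

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Minimal tensor metadata, as a gguf-style table might hold it.
struct TensorMeta {
    std::array<int64_t, 4> shape;
    int type; // 0 = F32, 1 = F16 (illustrative codes)
};

// Repurpose an orphaned tensor entry: rename it and overwrite its shape and
// type in place, so the total tensor count (and thus the ggml_context
// budget sized from it) is unchanged.
static bool steal_slot(std::map<std::string, TensorMeta> &meta,
                       const std::string &from, const std::string &to,
                       std::array<int64_t, 4> new_shape, int new_type) {
    auto it = meta.find(from);
    if (it == meta.end()) return false;
    TensorMeta m = it->second;
    m.shape = new_shape;
    m.type = new_type;
    meta.erase(it);
    meta.emplace(to, m);
    return true;
}
```

Because the entry's stale file offset would now read garbage, pairing the steal with a registered LoadOp that overrides the read is what makes the trick safe.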
Go side: adds qwen35moe to the auto-mmproj arch allowlist. ollama now
passes the monolithic blob as both --model and --mmproj for qwen3.5.
Verified end-to-end: ollama run qwen3.5:35b-a3b-q4_K_M with an image
correctly describes the image ("screenshot of a chat interface...
'open the browser, open never gonna give you up on youtube'..."). Text
inference still works on the same blob.
Ollama
Start building with open models.
Download
macOS
curl -fsSL https://ollama.com/install.sh | sh
Windows
irm https://ollama.com/install.ps1 | iex
Linux
curl -fsSL https://ollama.com/install.sh | sh
Docker
The official Ollama Docker image ollama/ollama is available on Docker Hub.
Libraries
Community
Get started
ollama
You'll be prompted to run a model or connect Ollama to your existing agents or applications such as Claude Code, OpenClaw, OpenCode, Codex, Copilot, and more.
Coding
To launch a specific integration:
ollama launch claude
Supported integrations include Claude Code, Codex, Copilot CLI, Droid, and OpenCode.
AI assistant
Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:
ollama launch openclaw
Chat with a model
Run and chat with Gemma 3:
ollama run gemma3
See ollama.com/library for the full list.
See the quickstart guide for more details.
REST API
Ollama has a REST API for running and managing models.
curl http://localhost:11434/api/chat -d '{
"model": "gemma3",
"messages": [{
"role": "user",
"content": "Why is the sky blue?"
}],
"stream": false
}'
See the API documentation for all endpoints.
Python
pip install ollama
from ollama import chat
response = chat(model='gemma3', messages=[
{
'role': 'user',
'content': 'Why is the sky blue?',
},
])
print(response.message.content)
JavaScript
npm i ollama
import ollama from "ollama";
const response = await ollama.chat({
model: "gemma3",
messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);
Supported backends
- llama.cpp project founded by Georgi Gerganov.
Documentation
Community Integrations
Want to add your project? Open a pull request.
Chat Interfaces
Web
- Open WebUI - Extensible, self-hosted AI interface
- Onyx - Connected AI workspace
- LibreChat - Enhanced ChatGPT clone with multi-provider support
- Lobe Chat - Modern chat framework with plugin ecosystem (docs)
- NextChat - Cross-platform ChatGPT UI (docs)
- Perplexica - AI-powered search engine, open-source Perplexity alternative
- big-AGI - AI suite for professionals
- Lollms WebUI - Multi-model web interface
- ChatOllama - Chatbot with knowledge bases
- Bionic GPT - On-premise AI platform
- Chatbot UI - ChatGPT-style web interface
- Hollama - Minimal web interface
- Chatbox - Desktop and web AI client
- chat - Chat web app for teams
- Ollama RAG Chatbot - Chat with multiple PDFs using RAG
- Tkinter-based client - Python desktop client
Desktop
- Dify.AI - LLM app development platform
- AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
- Maid - Cross-platform mobile and desktop client
- Witsy - AI desktop app for Mac, Windows, and Linux
- Cherry Studio - Multi-provider desktop client
- Ollama App - Multi-platform client for desktop and mobile
- PyGPT - AI desktop assistant for Linux, Windows, and Mac
- Alpaca - GTK4 client for Linux and macOS
- SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
- Enchanted - Native macOS and iOS client
- RWKV-Runner - Multi-model desktop runner
- Ollama Grid Search - Evaluate and compare models
- macai - macOS client for Ollama and ChatGPT
- AI Studio - Multi-provider desktop IDE
- Reins - Parameter tuning and reasoning model support
- ConfiChat - Privacy-focused with optional encryption
- LLocal.in - Electron desktop client
- MindMac - AI chat client for Mac
- Msty - Multi-model desktop client
- BoltAI for Mac - AI chat client for Mac
- IntelliBar - AI-powered assistant for macOS
- Kerlig AI - AI writing assistant for macOS
- Hillnote - Markdown-first AI workspace
- Perfect Memory AI - Productivity AI personalized by screen and meeting history
Mobile
- Ollama Android Chat - One-click Ollama on Android
SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.
Code Editors & Development
- Cline - VS Code extension for multi-file/whole-repo coding
- Continue - Open-source AI code assistant for any IDE
- Void - Open source AI code editor, Cursor alternative
- Copilot for Obsidian - AI assistant for Obsidian
- twinny - Copilot and Copilot chat alternative
- gptel Emacs client - LLM client for Emacs
- Ollama Copilot - Use Ollama as GitHub Copilot
- Obsidian Local GPT - Local AI for Obsidian
- Ellama Emacs client - LLM tool for Emacs
- orbiton - Config-free text editor with Ollama tab completion
- AI ST Completion - Sublime Text 4 AI assistant
- VT Code - Rust-based terminal coding agent with Tree-sitter
- QodeAssist - AI coding assistant for Qt Creator
- AI Toolkit for VS Code - Microsoft-official VS Code extension
- Open Interpreter - Natural language interface for computers
Libraries & SDKs
- LiteLLM - Unified API for 100+ LLM providers
- Semantic Kernel - Microsoft AI orchestration SDK
- LangChain4j - Java LangChain (example)
- LangChainGo - Go LangChain (example)
- Spring AI - Spring framework AI support (docs)
- LangChain and LangChain.js with example
- Ollama for Ruby - Ruby LLM library
- any-llm - Unified LLM interface by Mozilla
- OllamaSharp for .NET - .NET SDK
- LangChainRust - Rust LangChain (example)
- Agents-Flex for Java - Java agent framework (example)
- Elixir LangChain - Elixir LangChain
- Ollama-rs for Rust - Rust SDK
- LangChain for .NET - .NET LangChain (example)
- chromem-go - Go vector database with Ollama embeddings (example)
- LangChainDart - Dart LangChain
- LlmTornado - Unified C# interface for multiple inference APIs
- Ollama4j for Java - Java SDK
- Ollama for Laravel - Laravel integration
- Ollama for Swift - Swift SDK
- LlamaIndex and LlamaIndexTS - Data framework for LLM apps
- Haystack - AI pipeline framework
- Firebase Genkit - Google AI framework
- Ollama-hpp for C++ - C++ SDK
- PromptingTools.jl - Julia LLM toolkit (example)
- Ollama for R - rollama - R SDK
- Portkey - AI gateway
- Testcontainers - Container-based testing
- LLPhant - PHP AI framework
Frameworks & Agents
- AutoGPT - Autonomous AI agent platform
- crewAI - Multi-agent orchestration framework
- Strands Agents - Model-driven agent building by AWS
- Cheshire Cat - AI assistant framework
- any-agent - Unified agent framework interface by Mozilla
- Stakpak - Open source DevOps agent
- Hexabot - Conversational AI builder
- Neuro SAN - Multi-agent orchestration (docs)
RAG & Knowledge Bases
- RAGFlow - RAG engine based on deep document understanding
- R2R - Open-source RAG engine
- MaxKB - Ready-to-use RAG chatbot
- Minima - On-premises or fully local RAG
- Chipper - AI interface with Haystack RAG
- ARGO - RAG and deep research on Mac/Windows/Linux
- Archyve - RAG-enabling document library
- Casibase - AI knowledge base with RAG and SSO
- BrainSoup - Native client with RAG and multi-agent automation
Bots & Messaging
- LangBot - Multi-platform messaging bots with agents and RAG
- AstrBot - Multi-platform chatbot with RAG and plugins
- Discord-Ollama Chat Bot - TypeScript Discord bot
- Ollama Telegram Bot - Telegram bot
- LLM Telegram Bot - Telegram bot for roleplay
Terminal & CLI
- aichat - All-in-one LLM CLI with Shell Assistant, RAG, and AI tools
- oterm - Terminal client for Ollama
- gollama - Go-based model manager for Ollama
- tlm - Local shell copilot
- tenere - TUI for LLMs
- ParLlama - TUI for Ollama
- llm-ollama - Plugin for Datasette's LLM CLI
- ShellOracle - Shell command suggestions
- LLM-X - Progressive web app for LLMs
- cmdh - Natural language to shell commands
- VT - Minimal multimodal AI chat app
Productivity & Apps
- AppFlowy - AI collaborative workspace, self-hostable Notion alternative
- Screenpipe - 24/7 screen and mic recording with AI-powered search
- Vibe - Transcribe and analyze meetings
- Page Assist - Chrome extension for AI-powered browsing
- NativeMind - Private, on-device browser AI assistant
- Ollama Fortress - Security proxy for Ollama
- 1Panel - Web-based Linux server management
- Writeopia - Text editor with Ollama integration
- QA-Pilot - GitHub code repository understanding
- Raycast extension - Ollama in Raycast
- Painting Droid - Painting app with AI integrations
- Serene Pub - AI roleplaying app
- Mayan EDMS - Document management with Ollama workflows
- TagSpaces - File management with AI tagging
Observability & Monitoring
- Opik - Debug, evaluate, and monitor LLM applications
- OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
- Lunary - LLM observability with analytics and PII masking
- Langfuse - Open source LLM observability
- HoneyHive - AI observability and evaluation for agents
- MLflow Tracing - Open source LLM observability
Database & Embeddings
- pgai - PostgreSQL as a vector database (guide)
- MindsDB - Connect Ollama with 200+ data platforms
- chromem-go - Embeddable vector database for Go (example)
- Kangaroo - AI-powered SQL client
Infrastructure & Deployment
Cloud
- Google Cloud
- Fly.io
- Koyeb
- Harbor - Containerized LLM toolkit with Ollama as default backend