jmorganca db0c745308 llama/compat: add qwen35moe vision (clip) support
Extends the compat layer with the vision side for Ollama's monolithic
qwen3.5 blobs. All changes in llama/compat/ — no new upstream patch edits.

New generic infra (reused by gemma3's existing promotion):
  - LoadOp registry (g_loadops). Any dest tensor whose name is registered
    gets its bytes produced by a closure instead of being read straight
    from disk. maybe_load_tensor consults it.
  - promote_tensor_to_f32(meta, ctx, name) now captures the source offset
    at registration time and becomes a LoadOp. Gemma3 already migrated.
  - register_concat_load(meta, dest, {srcs...}) captures the file offsets
    of N source tensors and registers a LoadOp that concatenates them.
    Assumes sources concatenate along their slowest ggml axis — which in
    C order means the dest bytes are src[0] || src[1] || ... .
  - set_tensor_shape / set_tensor_type helpers for in-place edits.
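The LoadOp mechanism can be sketched roughly as below. This is a minimal, self-contained illustration with hypothetical simplified types (a "file" is just a byte buffer and sources are (offset, size) views); the real code works against gguf metadata and real file I/O.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A LoadOp produces the dest tensor's bytes from the file instead of a
// raw read at the tensor's recorded offset.
using LoadOp = std::function<std::vector<uint8_t>(const std::vector<uint8_t> &file)>;
static std::map<std::string, LoadOp> g_loadops;

struct SrcView { size_t offset, size; };

// register_concat_load: capture the file offsets of N sources at
// registration time; the dest's bytes are src[0] || src[1] || ...,
// i.e. concatenation along the slowest ggml axis.
static void register_concat_load(const std::string &dest, std::vector<SrcView> srcs) {
    g_loadops[dest] = [srcs](const std::vector<uint8_t> &file) {
        std::vector<uint8_t> out;
        for (const auto &s : srcs)
            out.insert(out.end(), file.begin() + s.offset,
                                  file.begin() + s.offset + s.size);
        return out;
    };
}

// maybe_load_tensor consults the registry before falling back to a
// plain read of (offset, size) from the file.
static std::vector<uint8_t> maybe_load_tensor(const std::string &name,
                                              const std::vector<uint8_t> &file,
                                              size_t offset, size_t size) {
    auto it = g_loadops.find(name);
    if (it != g_loadops.end()) return it->second(file);
    return std::vector<uint8_t>(file.begin() + offset,
                                file.begin() + offset + size);
}
```

The QKV merge below is then just a registration of a three-source concat (q, k, v views) under the attn_qkv name.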

qwen35moe clip handler (handle_qwen35moe_clip):
  - Detection reuses detect_ollama_qwen35moe; additionally requires
    embedded v.* tensors so we don't fire for text-only files.
  - KV synth: clip.vision.* from qwen35moe.vision.* + sensible defaults
    (feed_forward_length=4304, image_size=768, layer_norm_epsilon=1e-6,
    is_deepstack_layers=false[27], image_mean/std=[0.5,0.5,0.5]).
  - Arch rewrite: general.architecture=clip, projector_type=qwen3vl_merger.
  - QKV merge per block (27x): captures q/k/v file offsets, registers a
    concat LoadOp, renames attn_q -> attn_qkv and widens its shape from
    [hidden, hidden] to [hidden, 3*hidden].
  - patch_embed split: source [16,16,2,3456] F16 -> two dests
    [16,16,3,1152] F32, permuting (c_out*3+c_in) packed_c back into
    separate c_in/c_out dims. Matches upstream convert_hf's
    Qwen3VLVisionModel.modify_tensors split.
  - Tensor renames (substring-matched): pos_embed -> position_embd,
    merger.norm -> post_ln, merger.linear_fc1/2 -> mm.0/mm.2,
    mlp.linear_fc1/2 -> ffn_up/ffn_down, norm1/2 -> ln1/ln2.
  - F16 -> F32 promote for v.position_embd.weight.
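The patch_embed split's index permutation is the fiddly part, so here is a sketch under stated assumptions: plain float buffers stand in for the F16 source and F32 dests (the conversion is elided), and the ggml layout (ne[0] fastest) is flattened by hand. Names and the helper itself are illustrative, not the actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source is [16, 16, 2, 3456] with the channel dim packed as
// packed_c = c_out*3 + c_in; each temporal slice t in {0, 1} becomes
// one [16, 16, 3, 1152] dest with separate c_in/c_out dims.
constexpr int KX = 16, KY = 16, T = 2, C_IN = 3, C_OUT = 1152;

static void split_patch_embed(const std::vector<float> &src,
                              std::vector<float> &dst0,   // -> v.patch_embd.weight
                              std::vector<float> &dst1) { // -> v.patch_embd.weight.1
    dst0.resize((size_t)KX * KY * C_IN * C_OUT);
    dst1.resize((size_t)KX * KY * C_IN * C_OUT);
    for (int co = 0; co < C_OUT; co++)
        for (int ci = 0; ci < C_IN; ci++)
            for (int y = 0; y < KY; y++)
                for (int x = 0; x < KX; x++) {
                    int packed_c = co * 3 + ci;  // packed channel index in the source
                    size_t di = x + KX * (y + (size_t)KY * (ci + (size_t)C_IN * co));
                    size_t s0 = x + KX * (y + (size_t)KY * (0 + (size_t)T * packed_c));
                    size_t s1 = x + KX * (y + (size_t)KY * (1 + (size_t)T * packed_c));
                    dst0[di] = src[s0];  // temporal slice 0
                    dst1[di] = src[s1];  // temporal slice 1
                }
}
```

In the actual handler this runs inside a LoadOp, so the permuted F32 bytes are produced on demand when clip requests either dest.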

Ctx-pool trick for the sibling tensor:
  clip.cpp sizes its ggml_context for exactly the gguf's tensor count
  (+1), so calling ggml_new_tensor to add v.patch_embd.weight.1 would
  overflow the pool. Since
  v.blk.0.attn_k.weight is orphaned after the QKV merge (clip only
  requests the merged attn_qkv), steal that slot: rename it to
  v.patch_embd.weight.1 and reshape to [16,16,3,1152] F32. Its original
  file offset is ignored; the LoadOp we register overrides the read.
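The slot steal is a pure metadata edit, sketched below with a hypothetical simplified tensor-meta record in place of the real gguf structures (field names and the helper are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct TensorMeta {
    std::string name;
    std::vector<int64_t> shape;  // ggml order, ne[0] first
    bool is_f32;                 // stand-in for the ggml type field
};

// After the QKV merge, clip only requests the merged attn_qkv, so the
// v.blk.0.attn_k.weight entry is orphaned. Repurposing its slot keeps
// the gguf tensor count (and thus clip.cpp's ggml_context pool size)
// unchanged. Its original file offset becomes irrelevant because a
// LoadOp supplies the bytes.
static bool steal_slot_for_patch_embed(std::vector<TensorMeta> &tensors) {
    for (auto &t : tensors) {
        if (t.name == "v.blk.0.attn_k.weight") {
            t.name   = "v.patch_embd.weight.1";
            t.shape  = {16, 16, 3, 1152};
            t.is_f32 = true;
            return true;
        }
    }
    return false;  // nothing to steal; caller would have to fail detection
}
```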

Go side: adds qwen35moe to the auto-mmproj arch allowlist. ollama now
passes the monolithic blob as both --model and --mmproj for qwen3.5.

Verified end-to-end: ollama run qwen3.5:35b-a3b-q4_K_M with an image
correctly describes the image ("screenshot of a chat interface...
'open the browser, open never gonna give you up on youtube'..."). Text
inference still works on the same blob.
2026-04-20 09:29:34 -07:00

Ollama

Start building with open models.

Download

macOS

curl -fsSL https://ollama.com/install.sh | sh

or download manually

Windows

irm https://ollama.com/install.ps1 | iex

or download manually

Linux

curl -fsSL https://ollama.com/install.sh | sh

Manual install instructions

Docker

The official Ollama Docker image ollama/ollama is available on Docker Hub.

Libraries

Community

Get started

ollama

You'll be prompted to run a model or connect Ollama to your existing agents or applications such as Claude Code, OpenClaw, OpenCode, Codex, Copilot, and more.

Coding

To launch a specific integration:

ollama launch claude

Supported integrations include Claude Code, Codex, Copilot CLI, Droid, and OpenCode.

AI assistant

Use OpenClaw to turn Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, Discord, and more:

ollama launch openclaw

Chat with a model

Run and chat with Gemma 3:

ollama run gemma3

See ollama.com/library for the full list.

See the quickstart guide for more details.

REST API

Ollama has a REST API for running and managing models.

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [{
    "role": "user",
    "content": "Why is the sky blue?"
  }],
  "stream": false
}'

See the API documentation for all endpoints.

Python

pip install ollama

from ollama import chat

response = chat(model='gemma3', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response.message.content)

JavaScript

npm i ollama

import ollama from "ollama";

const response = await ollama.chat({
  model: "gemma3",
  messages: [{ role: "user", content: "Why is the sky blue?" }],
});
console.log(response.message.content);

Supported backends

  • llama.cpp project founded by Georgi Gerganov.

Documentation

Community Integrations

Want to add your project? Open a pull request.

Chat Interfaces

Web

Desktop

  • Dify.AI - LLM app development platform
  • AnythingLLM - All-in-one AI app for Mac, Windows, and Linux
  • Maid - Cross-platform mobile and desktop client
  • Witsy - AI desktop app for Mac, Windows, and Linux
  • Cherry Studio - Multi-provider desktop client
  • Ollama App - Multi-platform client for desktop and mobile
  • PyGPT - AI desktop assistant for Linux, Windows, and Mac
  • Alpaca - GTK4 client for Linux and macOS
  • SwiftChat - Cross-platform including iOS, Android, and Apple Vision Pro
  • Enchanted - Native macOS and iOS client
  • RWKV-Runner - Multi-model desktop runner
  • Ollama Grid Search - Evaluate and compare models
  • macai - macOS client for Ollama and ChatGPT
  • AI Studio - Multi-provider desktop IDE
  • Reins - Parameter tuning and reasoning model support
  • ConfiChat - Privacy-focused with optional encryption
  • LLocal.in - Electron desktop client
  • MindMac - AI chat client for Mac
  • Msty - Multi-model desktop client
  • BoltAI for Mac - AI chat client for Mac
  • IntelliBar - AI-powered assistant for macOS
  • Kerlig AI - AI writing assistant for macOS
  • Hillnote - Markdown-first AI workspace
  • Perfect Memory AI - Productivity AI personalized by screen and meeting history

Mobile

SwiftChat, Enchanted, Maid, Ollama App, Reins, and ConfiChat listed above also support mobile platforms.

Code Editors & Development

Libraries & SDKs

Frameworks & Agents

RAG & Knowledge Bases

  • RAGFlow - RAG engine based on deep document understanding
  • R2R - Open-source RAG engine
  • MaxKB - Ready-to-use RAG chatbot
  • Minima - On-premises or fully local RAG
  • Chipper - AI interface with Haystack RAG
  • ARGO - RAG and deep research on Mac/Windows/Linux
  • Archyve - RAG-enabling document library
  • Casibase - AI knowledge base with RAG and SSO
  • BrainSoup - Native client with RAG and multi-agent automation

Bots & Messaging

Terminal & CLI

Productivity & Apps

Observability & Monitoring

  • Opik - Debug, evaluate, and monitor LLM applications
  • OpenLIT - OpenTelemetry-native monitoring for Ollama and GPUs
  • Lunary - LLM observability with analytics and PII masking
  • Langfuse - Open source LLM observability
  • HoneyHive - AI observability and evaluation for agents
  • MLflow Tracing - Open source LLM observability

Database & Embeddings

Infrastructure & Deployment

Cloud

Package Managers
