ollama

Mirror of https://github.com/ollama/ollama.git (synced 2026-04-25 02:06:11 +02:00)

Commit Graph

Author: jmorganca
SHA1:   25223160d8

llama/compat: add in-memory shim so llama-server can load Ollama-format GGUFs

GGUFs shipped by older Ollama builds diverge slightly from upstream
llama.cpp in architecture names, KV keys, and tensor names, and, for vision
models, in file layout: text and vision weights live in one monolithic
file. This adds a self-contained compat layer that translates those files
in memory at load time, so ~/.ollama/models/blobs/* can be served by
upstream llama-server with no re-conversion and no re-download.
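The following sketch illustrates the "translate in memory at load time" idea. It renames metadata keys and tensor-name prefixes in a loaded table before the loader consumes them, and nothing is written back to disk. Every name in it (`Gguf`, `translate_in_memory`, and any rename pairs a caller passes) is hypothetical and chosen for illustration. It is not the gguf/llama.cpp API, and the real per-arch mappings live in llama-ollama-compat.cpp.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory view of a GGUF file: metadata KVs plus tensor names.
// NOT the real gguf/llama.cpp API; a stand-in for this sketch only.
struct Gguf {
    std::map<std::string, std::string> kv;
    std::vector<std::string> tensor_names;
};

// Rename a metadata key in place if present; the value is preserved.
static void rename_key(Gguf & g, const std::string & from, const std::string & to) {
    auto it = g.kv.find(from);
    if (it == g.kv.end()) return;
    std::string value = std::move(it->second);
    g.kv.erase(it);
    g.kv[to] = std::move(value);
}

// Replace a tensor-name prefix (e.g. an Ollama-specific one) with another prefix.
static void rename_tensor_prefix(Gguf & g, const std::string & from, const std::string & to) {
    for (auto & name : g.tensor_names) {
        if (name.compare(0, from.size(), from) == 0) {
            name = to + name.substr(from.size());
        }
    }
}

// Apply caller-supplied (hypothetical) rename tables to the in-memory copy only.
void translate_in_memory(Gguf & g,
                         const std::vector<std::pair<std::string, std::string>> & key_renames,
                         const std::vector<std::pair<std::string, std::string>> & prefix_renames) {
    for (const auto & [from, to] : key_renames) rename_key(g, from, to);
    for (const auto & [from, to] : prefix_renames) rename_tensor_prefix(g, from, to);
}
```

Keeping the renames as plain data tables matches the commit's description of the layer as data-driven. The on-disk blob is never modified, which is what lets the same file keep working with older Ollama builds.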

Structure:
  llama/compat/
    llama-ollama-compat.{h,cpp}   — the shim (Ollama-owned, ~500 LOC)
    upstream-edits.patch          — ~48 lines of call-site hooks in 6 upstream files
    compat.cmake                  — include()-able CMake fragment
    README.md                     — what/why/how-to-regen

Integration: llama/server/CMakeLists.txt includes compat.cmake and passes
OLLAMA_LLAMA_CPP_COMPAT_PATCH_COMMAND to FetchContent_Declare via
PATCH_COMMAND. When OLLAMA_LLAMA_CPP_SOURCE is set (dev mode), the patch is
skipped so the developer's tree stays untouched.

Currently handles gemma3 (text + vision). The pattern is data-driven:
supporting another architecture means adding a new handle_<arch>()
function plus one dispatch line. See README for the per-arch checklist.
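The commit describes the per-architecture pattern only in outline: a handle_<arch>() function plus one dispatch line. The sketch below shows what such a dispatch could look like. It assumes a lookup table keyed by the GGUF `general.architecture` value. `Model`, `handle_gemma3`, and `dispatch_compat` are hypothetical names, not the shim's actual code.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for whatever state the shim rewrites.
struct Model {
    std::string arch;         // value of the GGUF "general.architecture" key
    bool translated = false;  // set once an arch handler has run
};

// One handler per architecture; a real body would apply gemma3-specific fixes.
static void handle_gemma3(Model & m) {
    m.translated = true;  // placeholder for the arch-specific KV/tensor renames
}

// Returns true if a handler ran; unknown architectures pass through untouched.
bool dispatch_compat(Model & m) {
    // Adding an architecture = writing handle_<arch>() and adding one entry here.
    static const std::map<std::string, std::function<void(Model &)>> handlers = {
        {"gemma3", handle_gemma3},
    };
    auto it = handlers.find(m.arch);
    if (it == handlers.end()) return false;
    it->second(m);
    return true;
}
```

Letting unknown architectures fall through unchanged means files that already match upstream load exactly as they would without the shim.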

Verified end-to-end: `llama-server --model BLOB --mmproj BLOB` with an
Ollama gemma3:latest blob answers both text prompts ("Paris") and vision
prompts (correct image descriptions).

Date: 2026-04-20 09:29:34 -07:00
