Commit Graph

4810 Commits

Author SHA1 Message Date
Inforithmics
f2842defcb Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-06 11:39:24 +02:00
Inforithmics
2acedf1756 update patch 2025-10-06 10:01:28 +02:00
Inforithmics
e9828e6b11 Return pci Properties 2025-10-06 09:53:51 +02:00
Inforithmics
fd648506c1 return integrated in vulkan backend 2025-10-05 21:13:21 +02:00
Inforithmics
37206cdf32 remove debug code 2025-10-05 20:56:21 +02:00
Inforithmics
d02a08aa7c return Library Name 2025-10-05 20:55:28 +02:00
Inforithmics
66d1033610 fixed patch number 2025-10-05 20:41:05 +02:00
Inforithmics
3f38cdb590 Revert "return Vulkan for vulkan library"
This reverts commit 690461a12f.
2025-10-05 20:38:07 +02:00
Inforithmics
690461a12f return Vulkan for vulkan library 2025-10-05 20:29:38 +02:00
Inforithmics
218e57974f print out unknown library 2025-10-05 17:04:12 +02:00
Inforithmics
cafdb5c0d6 improve case 2025-10-05 16:46:55 +02:00
Inforithmics
d5a2462c8e handle igpu as gpu 2025-10-05 16:20:10 +02:00
Inforithmics
908b31814d fixed vulkan casing 2025-10-05 11:01:26 +02:00
Inforithmics
6bef63b0f9 fix format 2025-10-04 21:45:06 +02:00
Inforithmics
f8551bc631 merge fixes 2025-10-04 21:28:15 +02:00
Daniel Hiltgen
292767afb4 CI: fix win arm build (#12502)
Resolve a subtle ErrorAction stickiness difference between the x86 and arm builder setup
v0.12.4-rc6
2025-10-04 11:46:45 -07:00
Inforithmics
8ad169403b update build windows script 2025-10-04 19:25:34 +02:00
Inforithmics
4803e57c9b Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-04 19:14:12 +02:00
Inforithmics
93d7126ce5 sync llama.cpp vulkan code 2025-10-04 19:02:57 +02:00
Inforithmics
163f62fcb6 fix vulkan gpu id patch 2025-10-04 18:56:38 +02:00
Daniel Hiltgen
ae5e0f0889 CI: replace clang compiler for windows (#12495) v0.12.4-rc5 2025-10-04 09:18:42 -07:00
Inforithmics
96e562f982 fixed build 2025-10-04 16:35:04 +02:00
Inforithmics
9ac9f3a952 fixed formatting 2025-10-04 16:32:39 +02:00
Inforithmics
b2aba4ea83 fixed build 2025-10-04 16:26:03 +02:00
Inforithmics
06528d66aa fixing build 2025-10-04 16:22:55 +02:00
Inforithmics
75f65bcdbf merge fixes 2025-10-04 16:11:34 +02:00
Inforithmics
1e46db8748 fixed build 2025-10-04 15:44:23 +02:00
Inforithmics
c4d8c75e54 merge fixes 2025-10-04 15:27:52 +02:00
Inforithmics
294b179688 merge fixes 2025-10-04 15:20:33 +02:00
Inforithmics
f567cc59d4 fix build 2025-10-04 15:08:18 +02:00
Inforithmics
e6c28916e1 Merge branch 'vulkanV3' into VulkanV3Update 2025-10-04 14:59:30 +02:00
Inforithmics
ac6ba7d44b Merge remote-tracking branch 'upstream/main' into VulkanV3Update 2025-10-04 14:53:59 +02:00
Jesse Gross
19e6796eac llm: Support KV cache quantization with gpt-oss
With the new version of GGML in #12245, KV cache quantization
no longer causes a fallback to CPU.
2025-10-03 16:31:58 -07:00
Grace
33801c1597 Fixed Deepseek2 adding nil tensor error 2025-10-03 14:20:06 -07:00
Daniel Hiltgen
e4340667e3 Workaround broken NVIDIA iGPU free VRAM data (#12490)
The CUDA APIs for reporting free VRAM are of little use on NVIDIA iGPU
systems: they return only the kernel's actual free memory and ignore
buff/cache allocations, which on a typical system quickly fill up
most of the free system memory. As a result, we incorrectly conclude
there's very little memory available for GPU allocations.
2025-10-03 12:17:21 -07:00
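On Linux, a more realistic estimate of allocatable memory for an iGPU comes from the `MemAvailable` field of `/proc/meminfo`, which accounts for reclaimable buff/cache, unlike `MemFree`. A minimal sketch of that idea; the helper name and its use here are illustrative, not the actual patch:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// memAvailableKB extracts the MemAvailable value (in kB) from
// /proc/meminfo-style text. Unlike MemFree, MemAvailable includes
// reclaimable buff/cache, so it better reflects what a unified-memory
// iGPU could actually allocate.
func memAvailableKB(meminfo string) (uint64, bool) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemAvailable:" {
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			return kb, err == nil
		}
	}
	return 0, false
}

func main() {
	// On a real system this would be read from /proc/meminfo.
	sample := "MemTotal: 16384000 kB\nMemFree: 512000 kB\nMemAvailable: 9216000 kB\n"
	kb, ok := memAvailableKB(sample)
	fmt.Println(kb, ok) // MemAvailable is far larger than MemFree here
}
```

In the sample, buff/cache makes `MemFree` look nearly exhausted while `MemAvailable` shows most of the memory is still reclaimable.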
Patrick Devine
2fa1e92a99 test: add template error test (#12489) 2025-10-03 12:05:34 -07:00
Daniel Hiltgen
07e36761c3 ci: place rocm windows in correct runner dir (#12487) v0.12.4-rc4 2025-10-03 07:28:40 -07:00
Daniel Hiltgen
c29fb007c0 CI: temporarily disable clang install (#12486)
This will likely yield builds that have problems with unicode characters,
but at least we can start testing the release while we look for an
alternate clang compiler for Windows or MinGW ships a fixed version.
v0.12.4-rc3
2025-10-02 20:31:18 -07:00
Daniel Hiltgen
730ed6e9e1 ci: fix windows build (#12485) v0.12.4-rc2 2025-10-02 19:16:01 -07:00
Daniel Hiltgen
dc06601677 ci: fix windows build (#12484) v0.12.4-rc1 2025-10-02 18:59:26 -07:00
Patrick Devine
1ed2881ef0 templates: fix crash in improperly defined templates (#12483) 2025-10-02 17:25:55 -07:00
Jesse Gross
0bda72892c llm: Enable flash attention by default for qwen3 and qwen3moe v0.12.4-rc0 2025-10-02 17:04:10 -07:00
Daniel Hiltgen
55ca827267 AMD: block running on unsupported gfx900/gfx906 (#12481) 2025-10-02 16:53:05 -07:00
Daniel Hiltgen
c68f367ef6 Update GGML to b6646 (#12245)
Notable EOLs with this change:
- MacOS v12 and v13 are no longer supported (v14+ required)
- AMD gfx900 and gfx906 are no longer supported
2025-10-02 14:47:10 -07:00
Jesse Gross
fdb109469f llm: Allow overriding flash attention setting
As we automatically enable flash attention for more models, there
are likely some cases where we get it wrong. This allows setting
OLLAMA_FLASH_ATTENTION=0 to disable it, even for models that usually
have flash attention.
2025-10-02 12:07:20 -07:00
Daniel Hiltgen
05a43e078a fix panic on bootstrapDevices (#12475)
Wrong index variable was used.
2025-10-01 17:39:29 -07:00
Daniel Hiltgen
bc8909fb38 Use runners for GPU discovery (#12090)
This revamps how we discover GPUs in the system by leveraging the Ollama
runner.  This should eliminate inconsistency between our GPU discovery and the
runner's capabilities at runtime, particularly for cases where we try to filter
out unsupported GPUs.  Now the runner does that implicitly based on the actual
device list.  In some cases free VRAM reporting can be unreliable, which can
lead to scheduling mistakes, so this also includes a patch to leverage more
reliable VRAM reporting libraries if available.

Automatic workarounds have been removed as only one GPU leveraged this, which
is now documented. This GPU will soon fall off the support matrix with the next
ROCm bump.

Additional cleanup of the scheduler and discovery packages can be done in the
future once we have switched on the new memory management code, and removed
support for the llama runner.
2025-10-01 15:12:32 -07:00
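The shape of runner-driven discovery is that the scheduler consumes whatever device list the runner reports rather than filtering GPUs itself, so unsupported devices simply never appear. A hypothetical sketch, assuming a JSON device report; the struct and field names are invented for illustration and are not Ollama's actual wire format:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// runnerDevice is a hypothetical device record a runner subprocess
// might report back to the scheduler. Unsupported GPUs are filtered
// implicitly: the runner just omits them from its list.
type runnerDevice struct {
	ID       string `json:"id"`
	Library  string `json:"library"`
	FreeVRAM uint64 `json:"free_vram"`
}

// parseDevices decodes the runner's device report.
func parseDevices(payload []byte) ([]runnerDevice, error) {
	var devs []runnerDevice
	if err := json.Unmarshal(payload, &devs); err != nil {
		return nil, err
	}
	return devs, nil
}

func main() {
	// In the real flow this payload would come from the runner process.
	payload := []byte(`[{"id":"GPU-0","library":"vulkan","free_vram":8589934592}]`)
	devs, err := parseDevices(payload)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(devs), devs[0].Library) // 1 vulkan
}
```

The benefit of this split is that discovery and execution can never disagree: the same process that will run inference is the one that enumerated the devices.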
Devon Rifkin
6b50f2b9cd Merge pull request #12461 from ollama/drifkin/qwen3-coder-tweaks
qwen3-coder: fix tool definition type rendering
2025-09-30 19:47:44 -07:00
Michael Yang
35ac4eb12c fix keep alive
this reference to keep alive was missed in #12041, so chat has a
different behaviour than generate
2025-09-30 17:22:28 -07:00
Jesse Gross
3d0b1734c0 ggml: Preallocate CUDA pool memory
The GGML CUDA backend allocates additional memory for intermediate
results during calculation. This memory isn't currently allocated
during worst case graph reservation and therefore not included in
scheduling. This means that as these buffers potentially grow
with context length, we could crash.

This extends the memory allocation system down a layer, from the GGML
graph to the CUDA layer, preallocating the worst case memory there
as well.

Fixes #11753
2025-09-30 15:04:43 -07:00
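The underlying pattern is worst-case reservation: size a pool during a reservation pass so runtime requests never have to grow it mid-inference. A generic Go sketch of that pattern, assuming nothing about GGML's actual C++ pool implementation:

```go
package main

import "fmt"

// pool grows only during an up-front reservation pass. After that,
// alloc hands out slices of the reserved buffer and never reallocates,
// so allocation can't fail or grow memory use mid-inference.
type pool struct{ buf []byte }

// reserve grows the pool to cover the worst case seen so far.
// Called while reserving the worst-case graph, before any real work.
func (p *pool) reserve(n int) {
	if n > len(p.buf) {
		p.buf = make([]byte, n)
	}
}

// alloc returns a view of the reserved buffer. A request beyond the
// reservation indicates the worst-case estimate was wrong.
func (p *pool) alloc(n int) []byte {
	if n > len(p.buf) {
		panic("request exceeds reserved worst case")
	}
	return p.buf[:n]
}

func main() {
	var p pool
	p.reserve(1 << 20) // worst-case graph reservation
	_ = p.alloc(1024)  // runtime allocations stay within the reservation
	fmt.Println(len(p.buf))
}
```

Because the pool is sized before scheduling decisions are made, the memory it holds can be accounted for when placing models, rather than appearing as surprise growth at long context lengths.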