597 Commits

Author SHA1 Message Date
Josh Yan
64405525b4 clean up 2024-07-16 16:40:38 -07:00
Josh Yan
dea2204b82 rmv comments 2024-07-16 16:37:50 -07:00
Josh Yan
6ee22d5080 clean 2024-07-16 16:35:15 -07:00
Josh Yan
703ecccc6b clean 2024-07-16 14:17:44 -07:00
Josh Yan
873f334783 IT WORKS 2024-07-16 14:12:07 -07:00
Josh Yan
fa49bfc0bd FIXED TESTS 2024-07-16 12:14:10 -07:00
Josh Yan
fc1b3ee9bf test 2024-07-16 11:21:13 -07:00
Josh Yan
25be20949c test 2024-07-15 15:08:24 -07:00
Josh Yan
903e9df46f test 2024-07-15 11:46:49 -07:00
Josh Yan
40c0f9612e unneccesary 2024-07-14 18:41:16 -07:00
Josh Yan
15a0215203 running 2024-07-12 16:49:57 -07:00
Josh Yan
faa3c937cf writeto 2024-07-12 15:37:27 -07:00
Josh Yan
cf57246aba write 2024-07-12 12:59:51 -07:00
Josh Yan
6fafe4f753 gguf 2024-07-12 12:58:00 -07:00
Josh Yan
d7c8d4f3f4 ggufwritekv 2024-07-12 12:25:13 -07:00
Josh Yan
3d0fd31f0e TensorWriter 2024-07-12 12:18:46 -07:00
Josh Yan
e75fb73839 types 2024-07-12 09:42:10 -07:00
Josh Yan
2fdebffc8d sawp 2024-07-11 18:18:26 -07:00
Josh Yan
29ecfe493b write 2024-07-11 17:56:51 -07:00
Josh
10e768826c fix: quant err message (#5616) 2024-07-11 17:24:29 -07:00
Jeffrey Morgan
c4cf8ad559 llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small

* update log

* Instrument swap free space

On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models

* use `systemSwapFreeMemory` in check

---------

Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
Jeffrey Morgan
791650ddef sched: only error when over-allocating system memory (#5626) 2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81 llm: dont link cuda with compat libs (#5621) 2024-07-10 20:01:52 -07:00
Michael Yang
37a570f962 Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb chatglm graph 2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8 remove GGML_CUDA_FORCE_MMQ=on from build (#5588) 2024-07-10 13:17:13 -07:00
Daniel Hiltgen
b50c818623 Merge pull request #5607 from dhiltgen/win_rocm_v6
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
1f50356e8e Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec Remove duplicate merge glitch 2024-07-10 09:01:33 -07:00
Daniel Hiltgen
2d1e3c3229 Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
Daniel Hiltgen
b51e3b63ac Statically link c++ and thread lib
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
Michael Yang
9bbddc37a7 Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Daniel Hiltgen
0bacb30007 Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
Jeffrey Morgan
53da2c6965 llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535) 2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94 llm: allow gemma 2 to context shift (#5534) 2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955 Update llama.cpp submodule to a8db2a9c (#5530) 2024-07-07 13:03:09 -04:00
Jeffrey Morgan
0e09c380fc llm: print caching notices in debug only (#5533) 2024-07-07 12:38:04 -04:00
Jeffrey Morgan
4607c70641 llm: add -DBUILD_SHARED_LIBS=off to common cpu cmake flags (#5520) 2024-07-06 18:58:16 -04:00
jmorganca
a08f20d910 release: remove unwanted mingw dll.a files 2024-07-06 15:21:15 -04:00
jmorganca
6cea036027 Revert "llm: only statically link libstdc++"
This reverts commit 5796bfc401.
2024-07-06 15:10:48 -04:00
jmorganca
5796bfc401 llm: only statically link libstdc++ 2024-07-06 14:06:20 -04:00
jmorganca
f1a379aa56 llm: statically link pthread and stdc++ dependencies in windows build 2024-07-06 12:54:02 -04:00
jmorganca
9ae146993e llm: add GGML_STATIC flag to windows static lib 2024-07-06 03:27:05 -04:00
Jeffrey Morgan
e0348d3fe8 llm: add COMMON_DARWIN_DEFS to arm static build (#5513) 2024-07-05 22:42:42 -04:00
Jeffrey Morgan
2cc854f8cb llm: fix missing dylibs by restoring old build behavior on Linux and macOS (#5511)
* Revert "fix cmake build (#5505)"

This reverts commit 4fd5f3526a.

* llm: fix missing dylibs by restoring old build behavior

* crlf -> lf
2024-07-05 21:48:31 -04:00
Jeffrey Morgan
5304b765b2 llm: put back old include dir (#5507)
* llm: put back old include dir

* llm: update link paths for old submodule commits
2024-07-05 19:34:21 -04:00
Jeffrey Morgan
4fd5f3526a fix cmake build (#5505) 2024-07-05 19:07:01 -04:00
Michael Yang
ac7a842e55 fix model reloading
ensure runtime model changes (template, system prompt, messages,
options) are captured on model updates without needing to reload the
server
2024-07-05 13:17:25 -07:00
Jeffrey Morgan
78fb33dd07 fix typo in cgo directives in llm.go (#5501) 2024-07-05 15:18:36 -04:00
Jeffrey Morgan
8f8e736b13 update llama.cpp submodule to d7fd29f (#5475) 2024-07-05 13:25:58 -04:00