Josh Yan
fa49bfc0bd
FIXED TESTS
2024-07-16 12:14:10 -07:00
Josh Yan
fc1b3ee9bf
test
2024-07-16 11:21:13 -07:00
Josh Yan
25be20949c
test
2024-07-15 15:08:24 -07:00
Josh Yan
903e9df46f
test
2024-07-15 11:46:49 -07:00
Josh Yan
40c0f9612e
unnecessary
2024-07-14 18:41:16 -07:00
Josh Yan
15a0215203
running
2024-07-12 16:49:57 -07:00
Josh Yan
faa3c937cf
writeto
2024-07-12 15:37:27 -07:00
Josh Yan
cf57246aba
write
2024-07-12 12:59:51 -07:00
Josh Yan
6fafe4f753
gguf
2024-07-12 12:58:00 -07:00
Josh Yan
d7c8d4f3f4
ggufwritekv
2024-07-12 12:25:13 -07:00
Josh Yan
3d0fd31f0e
TensorWriter
2024-07-12 12:18:46 -07:00
Josh Yan
554f3bdc0e
interface
2024-07-12 10:02:58 -07:00
Josh Yan
e75fb73839
types
2024-07-12 09:42:10 -07:00
Josh Yan
2fdebffc8d
swap
2024-07-11 18:18:26 -07:00
Josh Yan
29ecfe493b
write
2024-07-11 17:56:51 -07:00
Michael Yang
47353f5ee4
Merge pull request #5639 from ollama/mxyng/unaggregated-system
2024-07-11 17:48:50 -07:00
Josh
10e768826c
fix: quant err message (#5616)
2024-07-11 17:24:29 -07:00
Michael Yang
5056bb9c01
rename aggregate to contents
2024-07-11 17:00:26 -07:00
Jeffrey Morgan
c4cf8ad559
llm: avoid loading model if system memory is too small (#5637)
* llm: avoid loading model if system memory is too small
* update log
* Instrument swap free space
On linux and windows, expose how much swap space is available
so we can take that into consideration when scheduling models
* use `systemSwapFreeMemory` in check
---------
Co-authored-by: Daniel Hiltgen <daniel@ollama.com>
2024-07-11 16:42:57 -07:00
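The "Instrument swap free space" step above can be sketched in Go (ollama's language). On Linux, swap availability comes from the `SwapFree` field of `/proc/meminfo`; the parser below is a minimal illustration with a hypothetical helper name, not the actual function from the PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseSwapFree extracts the SwapFree value (in bytes) from
// /proc/meminfo-style content. Illustrative helper; the real
// scheduler code may be structured differently.
func parseSwapFree(meminfo string) uint64 {
	scanner := bufio.NewScanner(strings.NewReader(meminfo))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "SwapFree:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["SwapFree:", "4194304", "kB"]
		if len(fields) < 2 {
			return 0
		}
		kb, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return 0
		}
		return kb * 1024 // /proc/meminfo reports kB
	}
	return 0
}

func main() {
	sample := "MemTotal: 16000000 kB\nSwapTotal: 8000000 kB\nSwapFree: 4194304 kB\n"
	fmt.Println(parseSwapFree(sample)) // 4294967296
}
```

The scheduler can then add this figure to free system memory before deciding whether a model fits.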
Michael Yang
57ec6901eb
revert embedded templates to use prompt/response
This reverts commit 19753c18c0.
For compat. Messages will be added at a later date.
2024-07-11 14:49:35 -07:00
Michael Yang
e64f9ebb44
do not automatically aggregate system messages
2024-07-11 14:49:35 -07:00
Jeffrey Morgan
791650ddef
sched: only error when over-allocating system memory (#5626)
v0.2.2-rc2
2024-07-11 00:53:12 -07:00
Jeffrey Morgan
efbf41ed81
llm: don't link cuda with compat libs (#5621)
v0.2.2-rc1
2024-07-10 20:01:52 -07:00
Michael Yang
cf15589851
Merge pull request #5620 from ollama/mxyng/templates
update embedded templates
2024-07-10 17:16:24 -07:00
Michael Yang
19753c18c0
update embedded templates
2024-07-10 17:03:08 -07:00
Michael Yang
41be28096a
add system prompt to first legacy template
2024-07-10 17:03:08 -07:00
Michael Yang
37a570f962
Merge pull request #5612 from ollama/mxyng/mem
chatglm graph
2024-07-10 14:18:33 -07:00
Michael Yang
5a739ff4cb
chatglm graph
2024-07-10 13:43:47 -07:00
Jeffrey Morgan
4e262eb2a8
remove GGML_CUDA_FORCE_MMQ=on from build (#5588)
2024-07-10 13:17:13 -07:00
Daniel Hiltgen
4cfcbc328f
Merge pull request #5124 from dhiltgen/amd_windows
Wire up windows AMD driver reporting
2024-07-10 12:50:23 -07:00
Daniel Hiltgen
79292ff3e0
Merge pull request #5555 from dhiltgen/msvc_deps
Bundle missing CRT libraries
2024-07-10 12:50:02 -07:00
Daniel Hiltgen
8ea500441d
Merge pull request #5580 from dhiltgen/cuda_overhead
Detect CUDA OS overhead
2024-07-10 12:47:31 -07:00
Daniel Hiltgen
b50c818623
Merge pull request #5607 from dhiltgen/win_rocm_v6
Bump ROCm on windows to 6.1.2
2024-07-10 12:47:10 -07:00
Daniel Hiltgen
b99e750b62
Merge pull request #5605 from dhiltgen/merge_glitch
Remove duplicate merge glitch
2024-07-10 11:47:08 -07:00
Daniel Hiltgen
1f50356e8e
Bump ROCm on windows to 6.1.2
This also adjusts our algorithm to favor our bundled ROCm.
I've confirmed VRAM reporting still doesn't work properly so we
can't yet enable concurrency by default.
2024-07-10 11:01:22 -07:00
Daniel Hiltgen
22c81f62ec
Remove duplicate merge glitch
2024-07-10 09:01:33 -07:00
Daniel Hiltgen
2d1e3c3229
Merge pull request #5503 from dhiltgen/dual_rocm
Workaround broken ROCm p2p copy
2024-07-09 15:44:16 -07:00
royjhan
4918fae535
OpenAI v1/completions: allow stop token list (#5551)
* stop token parsing fix
* add stop test
2024-07-09 14:01:26 -07:00
royjhan
0aff67877e
separate request tests (#5578)
2024-07-09 13:48:31 -07:00
Daniel Hiltgen
f6f759fc5f
Detect CUDA OS Overhead
This adds logic to detect skew between the driver and
management library which can be attributed to OS overhead
and records that so we can adjust subsequent management
library free VRAM updates and avoid OOM scenarios.
2024-07-09 12:21:50 -07:00
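The overhead-detection logic this commit describes, capturing the skew between the driver's and the management library's free-VRAM reports once, then applying it to later readings, can be sketched as follows. Type and method names here are assumptions for illustration, not the identifiers used in the PR.

```go
package main

import "fmt"

// gpuInfo holds the one-time skew observed between the CUDA driver's
// free-VRAM report and the management library's report. Field and
// method names are illustrative.
type gpuInfo struct {
	osOverhead uint64
}

// recordOverhead attributes the management-library-vs-driver skew to
// OS overhead that the management library cannot see.
func (g *gpuInfo) recordOverhead(driverFree, mgmtFree uint64) {
	if mgmtFree > driverFree {
		g.osOverhead = mgmtFree - driverFree
	}
}

// adjustedFree discounts subsequent management-library readings by the
// recorded overhead so the scheduler does not over-commit VRAM.
func (g *gpuInfo) adjustedFree(mgmtFree uint64) uint64 {
	if mgmtFree < g.osOverhead {
		return 0
	}
	return mgmtFree - g.osOverhead
}

func main() {
	var g gpuInfo
	g.recordOverhead(7_000_000_000, 7_500_000_000) // driver sees 500MB less
	fmt.Println(g.adjustedFree(6_000_000_000))     // 5500000000
}
```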
Daniel Hiltgen
9544a57ee4
Merge pull request #5579 from dhiltgen/win_static_deps
Statically link c++ and thread lib on windows
2024-07-09 12:21:13 -07:00
Daniel Hiltgen
b51e3b63ac
Statically link c++ and thread lib
This makes sure we statically link the c++ and thread library on windows
to avoid unnecessary runtime dependencies on non-standard DLLs
2024-07-09 11:34:30 -07:00
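For a MinGW-style toolchain, statically linking the C++ and thread runtimes typically comes down to linker flags like the following. This is a hedged sketch of the general technique, the ollama build scripts may wire these through CMake or CGO flags instead, and the exact flag set in the PR may differ.

```shell
# Link libstdc++ and libgcc into the binary rather than depending on
# their DLLs at runtime, and force a static winpthread before
# returning to dynamic linking for system libraries.
g++ runner.cpp -o runner.exe \
  -static-libstdc++ -static-libgcc \
  -Wl,-Bstatic -lpthread -Wl,-Bdynamic
```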
Michael Yang
6bbbc50f10
Merge pull request #5440 from ollama/mxyng/messages-templates
update named templates
2024-07-09 09:36:32 -07:00
Michael Yang
9bbddc37a7
Merge pull request #5126 from ollama/mxyng/messages
update message processing
2024-07-09 09:20:44 -07:00
Jeffrey Morgan
e4ff73297d
server: fix model reloads when setting OLLAMA_NUM_PARALLEL (#5560)
* server: fix unneeded model reloads when setting `OLLAMA_NUM_PARALLEL`
* remove whitespace change
* undo some changes
v0.2.1
2024-07-08 22:32:15 -07:00
Daniel Hiltgen
b44320db13
Bundle missing CRT libraries
Some users are experiencing runner startup errors due
to not having these msvc redist libraries on their host
2024-07-08 18:24:21 -07:00
Daniel Hiltgen
0bacb30007
Workaround broken ROCm p2p copy
Enable the build flag for llama.cpp to use CPU copy for multi-GPU scenarios.
2024-07-08 09:40:52 -07:00
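The build flag mentioned above would be passed to llama.cpp's CMake configure step, roughly as below. The option name (`GGML_CUDA_NO_PEER_COPY`, spelled `LLAMA_CUDA_NO_PEER_COPY` in older trees) is my best recollection of llama.cpp's switch for routing inter-GPU transfers through host memory instead of peer-to-peer copies; verify it against the submodule revision in use.

```shell
# Configure the HIP/ROCm backend with peer-to-peer copies disabled,
# so multi-GPU transfers stage through the CPU and avoid the broken
# ROCm p2p path.
cmake -B build -DGGML_HIPBLAS=ON -DGGML_CUDA_NO_PEER_COPY=ON
cmake --build build
```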
Jeffrey Morgan
53da2c6965
llm: remove ambiguous comment when putting upper limit on predictions to avoid infinite generation (#5535)
v0.2.0
2024-07-07 14:32:05 -04:00
Jeffrey Morgan
d8def1ff94
llm: allow gemma 2 to context shift (#5534)
v0.1.49-rc14
2024-07-07 13:41:51 -04:00
Jeffrey Morgan
571dc61955
Update llama.cpp submodule to a8db2a9c (#5530)
2024-07-07 13:03:09 -04:00