ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-26 18:55:53 +02:00

Author	SHA1	Message	Date
pufferffish	b6554e9b8c	fix vulkan handle releasing	2024-06-15 21:11:07 +01:00
DSLstandard	b958cd2848	remove cap_get_bound check	2024-06-15 20:19:19 +08:00
KOISHI KOMEIJI FROM TOUHOU 11	e3f9ca4009	fix check_perfmon len	2024-06-15 20:13:15 +08:00
pufferffish	38466f1821	fix build	2024-06-15 12:06:43 +01:00
pufferffish	18f3f960b0	update gpu.go	2024-06-15 12:05:01 +01:00
pufferffish	e77ea68e11	Merge branch 'refs/heads/main' into vulkan # Conflicts: # gpu/gpu.go	2024-06-15 12:01:36 +01:00
pufferffish	11c55fab81	fix total memory monitor	2024-06-15 10:58:12 +01:00
pufferffish	257364cb3c	fix free memory monitor	2024-06-15 10:52:34 +01:00
pufferffish	e4e8a5d25a	fix compilation	2024-06-15 09:44:10 +01:00
pufferffish	724fac470f	fix segfault	2024-06-15 08:05:48 +01:00
pufferffish	24c8840037	it builds	2024-06-15 07:49:28 +01:00
pufferffish	93c4d69daa	add support in gen_linux.sh	2024-06-15 05:42:59 +01:00
pufferffish	9c6b049567	add support in gpu.go	2024-06-15 05:27:14 +01:00
Daniel Hiltgen	532db58311	Merge pull request #4972 from jayson-cloude/main fix: "Skip searching for network devices"	2024-06-14 17:04:40 -07:00
Daniel Hiltgen	45cacbaf05	Merge pull request #4517 from dhiltgen/gpu_incremental Enhanced GPU discovery and multi-gpu support with concurrency	2024-06-14 15:35:00 -07:00
Daniel Hiltgen	17df6520c8	Remove mmap related output calc logic	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	6f351bf586	review comments and coverage	2024-06-14 14:55:50 -07:00
Daniel Hiltgen	ff4f0cbd1d	Prevent multiple concurrent loads on the same gpus While models are loading, the VRAM metrics are dynamic, so try to load on a GPU that doesn't have a model actively loading, or wait to avoid races that lead to OOMs	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	fc37c192ae	Refine CPU load behavior with system memory visibility	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	434dfe30c5	Reintroduce nvidia nvml library for windows This library will give us the most reliable free VRAM reporting on windows to enable concurrent model scheduling.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	4e2b7e181d	Refactor intel gpu discovery	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	48702dd149	Harden unload for empty runners	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	68dfc6236a	refined test timing adjust timing on some tests so they don't timeout on small/slow GPUs	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	5e8ff556cb	Support forced spreading for multi GPU Our default behavior today is to try to fit into a single GPU if possible. Some users would prefer the old behavior of always spreading across multiple GPUs even if the model can fit into one. This exposes that tunable behavior.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	6fd04ca922	Improve multi-gpu handling at the limit Still not complete, needs some refinement to our prediction to understand the discrete GPUs available space so we can see how many layers fit in each one since we can't split one layer across multiple GPUs we can't treat free space as one logical block	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	206797bda4	Fix concurrency integration test to work locally This worked remotely but wound up trying to spawn multiple servers locally which doesn't work	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	43ed358f9a	Refine GPU discovery to bootstrap once Now that we call the GPU discovery routines many times to update memory, this splits initial discovery from free memory updating.	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	b32ebb4f29	Use DRM driver for VRAM info for amd The amdgpu drivers free VRAM reporting omits some other apps, so leverage the upstream DRM driver which keeps better tabs on things	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	fb9cdfa723	Fix server.cpp for the new cuda build macros	2024-06-14 14:51:40 -07:00
Daniel Hiltgen	efac488675	Revert "Limit GPU lib search for now (#4777)" This reverts commit `476fb8e892`.	2024-06-14 14:51:40 -07:00
Jeffrey Morgan	6b800aa7b7	openai: do not set temperature to 0 when setting seed (#5045)	2024-06-14 13:43:56 -07:00
pufferffish	f46b4a6fa2	implement the vulkan C backend	2024-06-14 19:56:35 +01:00
Jeffrey Morgan	dd7c9ebeaf	server: longer timeout in `TestRequests` (#5046)	2024-06-14 09:48:25 -07:00
Patrick Devine	4dc7fb9525	update 40xx gpu compat matrix (#5036)	2024-06-13 17:10:33 -07:00
Daniel Hiltgen	c39761c552	Merge pull request #5032 from dhiltgen/actually_skip Actually skip PhysX on windows (tag: v0.1.44)	2024-06-13 13:26:09 -07:00
Daniel Hiltgen	aac367636d	Actually skip PhysX on windows	2024-06-13 13:17:19 -07:00
Michael Yang	15a687ae4b	Merge pull request #5031 from ollama/mxyng/fix-multibyte-utf16 fix: multibyte utf16	2024-06-13 13:14:55 -07:00
Michael Yang	d528e1af75	fix utf16 for multibyte runes	2024-06-13 13:07:42 -07:00
Michael Yang	cd234ce22c	parser: add test for multibyte runes	2024-06-13 13:07:42 -07:00
Patrick Devine	94618b2365	add OLLAMA_MODELS to envconfig (#5029)	2024-06-13 12:52:03 -07:00
Jeffrey Morgan	1fd236d177	server: remove jwt decoding error (#5027)	2024-06-13 11:21:15 -07:00
Michael Yang	e87fc7200d	Merge pull request #5025 from ollama/mxyng/revert-parser-scan Revert "proper utf16 support"	2024-06-13 10:31:25 -07:00
Michael Yang	20b9f8e6f4	Revert "proper utf16 support" This reverts commit `66ab48772f`. this change broke utf-8 scanning of multi-byte runes	2024-06-13 10:22:16 -07:00
Patrick Devine	c69bc19e46	move OLLAMA_HOST to envconfig (#5009)	2024-06-12 18:48:16 -04:00
Michael Yang	bba5d177aa	Merge pull request #5004 from ollama/mxyng/fix-templates fix: multiple templates when creating from model	2024-06-12 14:39:29 -07:00
Michael Yang	c16f8af911	fix: multiple templates when creating from model multiple templates may appear in a model if a model is created from another model that 1) has an autodetected template and 2) defines a custom template	2024-06-12 13:35:49 -07:00
Michael Yang	217f60c3d9	Merge pull request #4987 from ollama/mxyng/revert-byte-order Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" (tag: v0.1.43)	2024-06-11 16:04:20 -07:00
Michael Yang	7bdcd1da94	Revert "Merge pull request #4938 from ollama/mxyng/fix-byte-order" This reverts commit `f5f245cc15`, reversing changes made to `94d37fdcae`. this change broke gguf v2 which is incorrectly detected as big endian	2024-06-11 15:56:17 -07:00
Jeffrey Morgan	ead259d877	llm: fix seed value not being applied to requests (#4986)	2024-06-11 14:24:41 -07:00
James Montgomery	2ff45d571d	Add Ollama-hpp to Community Libraries in README. (#4983)	2024-06-11 11:15:05 -07:00

2940 Commits