ollama-ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-17 15:53:27 +02:00

Author	SHA1	Message	Date
Jeffrey Morgan	54e05172a0	Revert "runner: add token history sampling parameters to ollama runner (#14537)" (#14776) This reverts commit `86513cb697`.	2026-03-10 21:07:52 -07:00
Jeffrey Morgan	86513cb697	runner: add token history sampling parameters to ollama runner (#14537)	2026-03-01 19:16:07 -08:00
Michael Yang	f1373193dc	move tokenizers to separate package (#13825)	2026-02-05 17:44:11 -08:00
Michael Yang	a40d427bce	multi-regexp pretokenizer (#12325)	2025-09-23 13:21:47 -07:00
Michael Yang	54055a6dae	fix test	2025-04-25 16:59:01 -07:00
Parth Sareen	a53d744b01	llama: remove model loading for grammar (#10096)	2025-04-24 11:51:19 -07:00
Parth Sareen	42a14f7f63	sample: add error handling for empty logits (#9740)	2025-03-20 11:11:18 -07:00
Jeffrey Morgan	e093db92c4	sample: temporarily use grammars for constrained generation in new engine (#9586)	2025-03-10 16:17:39 +01:00
Parth Sareen	0682dae027	sample: improve ollama engine sampler performance (#9374) This change brings in various interface cleanups and greatly improves the performance of the sampler. Tested with llama3.2 on a local machine. Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled. Without topK, performance is ~110 tokens/s.	2025-03-07 12:37:48 -08:00
Parth Sareen	c245b0406f	sample: remove transforms from greedy sampling (#9377)	2025-02-27 15:44:53 -08:00
Parth Sareen	0b7e1676eb	sample: add sampling package for new engine (#8410)	2025-02-24 17:19:01 -08:00