Jeffrey Morgan
54e05172a0
Revert "runner: add token history sampling parameters to ollama runner ( #14537 )" ( #14776 )
This reverts commit 86513cb697.
2026-03-10 21:07:52 -07:00
Jeffrey Morgan
86513cb697
runner: add token history sampling parameters to ollama runner ( #14537 )
2026-03-01 19:16:07 -08:00
Michael Yang
f1373193dc
move tokenizers to separate package ( #13825 )
2026-02-05 17:44:11 -08:00
Michael Yang
a40d427bce
multi-regexp pretokenizer ( #12325 )
2025-09-23 13:21:47 -07:00
Michael Yang
54055a6dae
fix test
2025-04-25 16:59:01 -07:00
Parth Sareen
a53d744b01
llama: remove model loading for grammar ( #10096 )
2025-04-24 11:51:19 -07:00
Parth Sareen
42a14f7f63
sample: add error handling for empty logits ( #9740 )
2025-03-20 11:11:18 -07:00
Jeffrey Morgan
e093db92c4
sample: temporarily use grammars for constrained generation in new engine ( #9586 )
2025-03-10 16:17:39 +01:00
Parth Sareen
0682dae027
sample: improve ollama engine sampler performance ( #9374 )
This change brings in various interface cleanups and greatly improves the performance of the sampler.
Tested with llama3.2 on a local machine.
Improves performance from ~70 tokens/s to 135 tokens/s with topK(40) enabled.
Without topK, performance is ~110 tokens/s.
2025-03-07 12:37:48 -08:00
Parth Sareen
c245b0406f
sample: remove transforms from greedy sampling ( #9377 )
2025-02-27 15:44:53 -08:00
Parth Sareen
0b7e1676eb
sample: add sampling package for new engine ( #8410 )
2025-02-24 17:19:01 -08:00