mirror of
https://github.com/ollama/ollama.git
synced 2026-04-17 15:53:27 +02:00
Move tokenization out of the single GPU processing goroutine and into each request's HTTP handler goroutine. This allows the next request's prompt to be tokenized on the CPU while the current request is executing on the GPU.
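
The pattern described above can be illustrated with a minimal sketch. It is not the actual ollama code: the names `job`, `tokenize`, `gpuWorker`, `handle`, and `run` are hypothetical, the tokenizer is a stand-in that maps each whitespace-separated word to its length, and the "GPU" step just sums the token IDs. The point it shows is that each handler goroutine tokenizes on its own before queueing, so the single worker goroutine only ever receives ready-made tokens.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// job carries an already-tokenized prompt to the GPU worker (hypothetical type).
type job struct {
	tokens []int
	result chan int
}

// tokenize is a stand-in for a real tokenizer: one "token ID" per
// whitespace-separated word, equal to the word's length in bytes.
func tokenize(prompt string) []int {
	words := strings.Fields(prompt)
	tokens := make([]int, len(words))
	for i, w := range words {
		tokens[i] = len(w)
	}
	return tokens
}

// gpuWorker models the single goroutine that owns the GPU. It never
// tokenizes; it only consumes jobs whose tokens are already prepared.
func gpuWorker(jobs <-chan job) {
	for j := range jobs {
		sum := 0 // placeholder for the real GPU computation
		for _, t := range j.tokens {
			sum += t
		}
		j.result <- sum
	}
}

// handle models one request's HTTP handler goroutine: it tokenizes on the
// CPU first, then queues the job and waits for the worker's result.
func handle(prompt string, jobs chan<- job) int {
	tokens := tokenize(prompt) // can overlap with the worker's current job
	result := make(chan int, 1)
	jobs <- job{tokens: tokens, result: result}
	return <-result
}

// run starts one worker and one handler goroutine per prompt, and returns
// the results in the same order as the prompts.
func run(prompts []string) []int {
	jobs := make(chan job)
	go gpuWorker(jobs)

	out := make([]int, len(prompts))
	var wg sync.WaitGroup
	for i, p := range prompts {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			out[i] = handle(p, jobs)
		}(i, p)
	}
	wg.Wait()
	close(jobs) // lets gpuWorker's range loop exit
	return out
}

func main() {
	fmt.Println(run([]string{"hello world", "move tokenization to the handler"}))
}
```

Because the `jobs` channel is unbuffered, a handler that has finished tokenizing blocks at the send until the worker is free, which is the overlap the change is after: CPU tokenization for the next request proceeds while the worker is busy with the current one.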