mirror of https://github.com/ollama/ollama.git, synced 2026-04-21 08:15:42 +02:00
Our default behavior today is to try to fit the model onto a single GPU when possible. Some users prefer the previous behavior of always spreading the load across multiple GPUs, even when the model fits on one. This change exposes that behavior as a tunable.
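As a sketch of how the tunable would be used, assuming it follows the server's existing environment-variable convention (the `OLLAMA_SCHED_SPREAD` name below is an assumption; check the merged change for the actual setting):

```shell
# Assumed env var: OLLAMA_SCHED_SPREAD. Set it before starting the
# server to opt back into spreading the model across all GPUs, even
# when it would fit on a single one.
OLLAMA_SCHED_SPREAD=1 ollama serve
```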