ollama

mirror of https://github.com/ollama/ollama.git synced 2026-04-21 16:25:42 +02:00

Author	SHA1	Message	Date
Patrick Devine	e790dc435b	mlx: int4 groupsize 64 (#14682) Change affine 4bit integers to use groupsize 64	2026-03-06 16:39:47 -08:00
Patrick Devine	e9f6ea232f	Add qwen3.5-next-moe support to MLX runner and models (#14417) This change adds support for qwen3.5-next-moe models (qwen3-next/qwen3.5-next/qwen3-coder) to the MLX runner. It also: * introduces recurrent cache support and related MLX ops * updates pipeline/runner integration and adds tests * properly quantizes stacked expert tensors * adds a Gated Delta Metal kernel for fast SSM inference * adds new MLX calls for Conv1d, DepthwiseConv1d, Contiguous, Exp, Log, SoftmaxAxis	2026-03-03 16:39:22 -08:00
Patrick Devine	a0407d07fa	safetensors quantization for mlx (#14184) This change includes: - changes to the safetensors metadata format - changes to the create command to properly create the blobs with the new format - changes to load the new format - fixes ollama show to properly show each tensor	2026-02-10 11:29:17 -08:00
Patrick Devine	d8cc798c2b	glm 4.7 flash support on experimental engine (#13838)	2026-02-02 15:22:11 -08:00
Patrick Devine	a077d996e3	Fix `create` and `show` commands for experimental models (#13741) * x: make `ollama create --experimental` import from safetensors This change allows pulling in safetensors models into the new experimental model format, and also fixes the `ollama show` command to be able to correctly display the model information. * gofumpt the linter * gofumpt the linter again * validate the model name	2026-01-16 14:31:55 -08:00

5 Commits