Daniel Hiltgen
539741199e
mlx: perf improvements (#14768)
* mlx: perf improvements
Fix nn.go to call mlx_fast_layer_norm instead of implementing layer norm
manually as six separate ops (mean, subtract, variance, rsqrt, multiply, add)
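The six-op sequence being replaced reads, in NumPy (an illustrative sketch, not the Go code; mlx_fast_layer_norm fuses these into a single kernel):

```python
import numpy as np

def layer_norm_manual(x, weight, bias, eps=1e-5):
    """The six separate ops the fused kernel replaces."""
    mean = x.mean(axis=-1, keepdims=True)               # 1. mean
    centered = x - mean                                 # 2. subtract
    var = (centered ** 2).mean(axis=-1, keepdims=True)  # 3. variance
    inv_std = 1.0 / np.sqrt(var + eps)                  # 4. rsqrt
    normed = centered * inv_std                         # 5. multiply
    return normed * weight + bias                       # 6. scale and add bias
```

Each intermediate here is a separate tensor (and, on a GPU backend, a separate kernel launch); the fused call computes the same result in one pass.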
Fix llama.go and gemma3.go to remove the RepeatKV step that tiled the K/V
tensors up to the Q head count: scaled_dot_product_attention handles GQA
natively (it only requires n_q_heads % n_kv_heads == 0)
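Why dropping RepeatKV is safe: a NumPy sketch (illustrative only, not the Go code or the MLX kernel) showing that tiling K/V up to the Q head count and grouping Q heads over shared K/V heads produce identical attention output:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn_repeat_kv(q, k, v):
    """Old path: tile K/V heads to match Q heads, then plain MHA.
    Shapes: q (n_q, S, D), k/v (n_kv, S, D)."""
    rep = q.shape[0] // k.shape[0]        # requires n_q % n_kv == 0
    k = np.repeat(k, rep, axis=0)
    v = np.repeat(v, rep, axis=0)
    scale = 1.0 / np.sqrt(q.shape[-1])
    w = softmax(q @ k.transpose(0, 2, 1) * scale)
    return w @ v

def attn_gqa(q, k, v):
    """New path: group Q heads over shared K/V heads, no tiling."""
    n_q, n_kv = q.shape[0], k.shape[0]
    rep = n_q // n_kv
    qg = q.reshape(n_kv, rep, *q.shape[1:])           # (n_kv, rep, S, D)
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = np.einsum('grsd,gtd->grst', qg, k) * scale
    w = softmax(scores, axis=-1)
    out = np.einsum('grst,gtd->grsd', w, v)
    return out.reshape(n_q, *q.shape[1:])
```

The grouped path never materializes the tiled K/V tensors, which is where the perf win comes from.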
* review comments
2026-03-12 12:01:28 -07:00