Match the ollamarunner and OpenAI semantics: raw, full-vocab log-softmax
with the top-K ranked by probability. Skipped on the GPU when the request
doesn't ask for logprobs so decode doesn't pay for it otherwise.
* gemma4: implement Gemma 4 model for MLX (text-only runtime)
* gemma4: two MoE + SWA prefill perf fixes
Two performance optimizations in the gemma4 forward pass
1. Memoize the sliding-window prefill mask across layers.
2. Softmax only over the selected experts in Router.Forward.
* review comments