Jeffrey Morgan
|
3490e9590b
|
model/qwen3next: avoid crash in DeltaNet when offloading (#14541)
Co-authored-by: Yossi Ovadia <jabadia@gmail.com>
|
2026-03-01 18:44:04 -08:00 |
|
Jeffrey Morgan
|
8da09b1e7e
|
qwen3next: add compatibility with imported GGUF models (#14517)
|
2026-02-28 14:21:42 -08:00 |
|
Jeffrey Morgan
|
7f9efd53df
|
model: add support for qwen3.5-27b model (#14415)
|
2026-02-25 01:09:58 -08:00 |
|
Jeffrey Morgan
|
da70c3222e
|
model: support for qwen3.5 architecture (#14378)
|
2026-02-24 20:08:05 -08:00 |
|
Michael Yang
|
f1373193dc
|
move tokenizers to separate package (#13825)
|
2026-02-05 17:44:11 -08:00 |
|
Jeffrey Morgan
|
d25535c3f3
|
qwen3next: avoid inplace sigmoid for shared gate (#14077)
|
2026-02-04 15:50:02 -08:00 |
|
Jeffrey Morgan
|
255579aaa7
|
qwen3next: fix issue in delta net (#14075)
gDiffExp was being broadcast across the wrong axis when multiplying with k. This fix reshapes gDiffExp to [1, chunkSize, nChunks, ...].
|
2026-02-04 13:40:38 -08:00 |
|
Jeffrey Morgan
|
77eb2ca619
|
model: add qwen3-next architecture (#14051)
|
2026-02-03 23:27:21 -08:00 |
|