ollama

starred/ollama


mirror of https://github.com/ollama/ollama.git synced 2026-04-17 21:54:08 +02:00

Commit Graph

Author	SHA1	Message	Date

Daniel Hiltgen

de9673ac3f

tokenizer: add byte fallback for SentencePiece BPE encoding (#15232)

* tokenizer: add byte fallback for SentencePiece BPE encoding

When BPE merging produces tokens not in the vocabulary, fall back to
encoding each UTF-8 byte as <0xHH> byte tokens instead of silently
dropping the character. Also teach Decode to convert <0xHH> tokens
back to raw bytes.

Fixes #15229, fixes #15231

* tokenizer fixes

2026-04-02 13:04:45 -07:00
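
The byte fallback described in the commit message above can be illustrated with a minimal Go sketch. This is not the code from the commit: the function names (`byteToken`, `parseByteToken`, `encodePieces`, `decode`), the toy vocabulary, and the assumption that BPE merging has already produced a list of string pieces are all hypothetical, chosen only to show the idea. A piece missing from the vocabulary is encoded as one `<0xHH>` token per UTF-8 byte, and decoding turns those tokens back into raw bytes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// byteToken formats a byte as a SentencePiece-style byte token, e.g. 0xC3 -> "<0xC3>".
func byteToken(b byte) string {
	return fmt.Sprintf("<0x%02X>", b)
}

// parseByteToken reports whether tok has the form <0xHH> and, if so, returns the byte.
func parseByteToken(tok string) (byte, bool) {
	if len(tok) != 6 || !strings.HasPrefix(tok, "<0x") || tok[5] != '>' {
		return 0, false
	}
	v, err := strconv.ParseUint(tok[3:5], 16, 8)
	if err != nil {
		return 0, false
	}
	return byte(v), true
}

// encodePieces maps already-merged pieces to token IDs (hypothetical helper).
// A piece that is not in the vocabulary falls back to one byte token per UTF-8 byte
// instead of being dropped.
func encodePieces(pieces []string, vocab map[string]int) []int {
	var ids []int
	for _, p := range pieces {
		if id, ok := vocab[p]; ok {
			ids = append(ids, id)
			continue
		}
		for _, b := range []byte(p) {
			// Assumption: a byte whose token is also missing is skipped in this sketch.
			if id, ok := vocab[byteToken(b)]; ok {
				ids = append(ids, id)
			}
		}
	}
	return ids
}

// decode converts token IDs back to text, turning <0xHH> tokens into raw bytes.
func decode(ids []int, inv map[int]string) string {
	var sb strings.Builder
	for _, id := range ids {
		tok := inv[id]
		if b, ok := parseByteToken(tok); ok {
			sb.WriteByte(b)
		} else {
			sb.WriteString(tok)
		}
	}
	return sb.String()
}

func main() {
	// Toy vocabulary: "é" (UTF-8 bytes C3 A9) is absent, but its byte tokens exist.
	vocab := map[string]int{"hello": 0, "<0xC3>": 1, "<0xA9>": 2}
	inv := make(map[int]string, len(vocab))
	for tok, id := range vocab {
		inv[id] = tok
	}

	ids := encodePieces([]string{"hello", "é"}, vocab)
	fmt.Println(ids, decode(ids, inv)) // prints: [0 1 2] helloé
}
```

In this sketch the round trip preserves the character that a vocabulary-only lookup would have lost. Real SentencePiece handling (whitespace markers, special tokens, merge ranks) is left out on purpose.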

1 Commit