* tokenizer: add byte fallback for SentencePiece BPE encoding
When BPE merging produces a token that is not in the vocabulary, fall
back to encoding each of its UTF-8 bytes as a <0xHH> byte token instead
of silently dropping the character. Also teach Decode to convert <0xHH>
tokens back to raw bytes.
Fixes #15229, fixes #15231
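
For illustration only, a minimal Python sketch of the byte-fallback round trip described above. It is not the code in this change: the helper names (byte_fallback, decode), the set-based vocabulary, and the uppercase hex format are assumptions made for the example, and SentencePiece whitespace handling is left out.

```python
import re

# Matches SentencePiece-style byte tokens such as <0xC3>.
BYTE_TOKEN = re.compile(r"<0x([0-9A-Fa-f]{2})>")


def byte_fallback(piece: str, vocab: set[str]) -> list[str]:
    """Hypothetical helper: keep a piece that is in the vocabulary,
    otherwise emit one <0xHH> token per UTF-8 byte of the piece."""
    if piece in vocab:
        return [piece]
    return [f"<0x{b:02X}>" for b in piece.encode("utf-8")]


def decode(tokens: list[str]) -> str:
    """Hypothetical helper: turn <0xHH> tokens back into raw bytes,
    pass other tokens through as text, then decode the result as UTF-8."""
    out = bytearray()
    for tok in tokens:
        m = BYTE_TOKEN.fullmatch(tok)
        if m:
            out.append(int(m.group(1), 16))
        else:
            out.extend(tok.encode("utf-8"))
    # Incomplete byte sequences become U+FFFD rather than raising.
    return out.decode("utf-8", errors="replace")


vocab = {"a", "b"}
tokens = byte_fallback("a", vocab) + byte_fallback("é", vocab)
print(tokens)
print(decode(tokens))
```

Buffering the bytes and decoding once at the end lets a multi-byte character that was split across several <0xHH> tokens be reassembled correctly.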
* tokenizer fixes