mirror of
https://github.com/ollama/ollama.git
synced 2026-04-23 01:05:47 +02:00
Add experimental MLX backend and engine with imagegen support (#13648)
* WIP - MLX backend with gemma3
* MLX: add cmake and go tag build toggles

  To build the new MLX backend code:

    cmake --preset MLX
    cmake --build --preset MLX --parallel
    cmake --install build --component MLX
    go build -tags mlx .

  Note: the main.go entrypoint for the MLX engine will change in a follow up commit.
* add experimental image generation runtime
* add experimental image generation runtime
* MLX: wire up cuda build for linux
* MLX: get dependencies correct and dedup

  This is still too large for a unified github artifact, but is now "correct" for the mlx_cuda_v13 directory.
* fix relative link bug in dedup
* Add darwin build and readme
* add go build tag for mlx dependent code and wire up build_darwin.sh
* lint cleanup
* macos: build mlx for x86

  This will be CPU only.
* cuda build instructions and fix drift from mlx bump
* stale comment
* Delete agent helper doc
* Clean up readme.md
* Revise README for tokenizer clarity and details

  Updated README to clarify tokenizer functionality and removed correctness section.

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
85
x/imagegen/tokenizer/README.md
Normal file
@@ -0,0 +1,85 @@
# Tokenizer

Tokenizer for LLM inference supporting the BPE, SentencePiece, and WordPiece algorithms. The goal of this package is to see whether a pure Go tokenizer can be both fast and correct. It primarily supports the `imagegen` models; however, it (or parts of it) could be considered as a replacement for Ollama's tokenizer in the `model` package.

## Features

- **BPE (Byte Pair Encoding)** - GPT-2/Llama style with byte-level encoding
- **SentencePiece** - Gemma style with `▁` space handling
- **WordPiece** - BERT style with `##` continuation tokens
- **Parallel encoding** - Automatic parallelization for inputs >4KB
- **HuggingFace compatible** - Loads `tokenizer.json` directly

## Usage

```go
package main

import (
	"fmt"
	"log"

	"github.com/ollama/ollama/x/imagegen/tokenizer"
)

func main() {
	// Load from HuggingFace model directory
	tok, err := tokenizer.Load("./weights/Llama-3.2-1B")
	if err != nil {
		log.Fatal(err)
	}

	// Encode text to token IDs
	ids := tok.Encode("Hello, world!", false) // false = don't add BOS

	// Decode back to text
	text := tok.Decode(ids)
	fmt.Println(text)

	// Check special tokens (guard against an empty result before indexing)
	if len(ids) > 0 && tok.IsEOS(ids[len(ids)-1]) {
		// End of sequence
	}
}
```

## Performance

Benchmarks on Apple M3 Max:

| Input Size | Encode | Decode | Tokens |
|------------|--------|--------|--------|
| 1 KB | 14.5 MB/s | 267 MB/s | 231 |
| 10 KB | 10.9 MB/s | 321 MB/s | 2,301 |
| 100 KB | 8.9 MB/s | 311 MB/s | 23,001 |
| 1 MB | 9.6 MB/s | 321 MB/s | 230,001 |

Comparison with other implementations (10 MB input):

| Implementation | Encode Speed | Notes |
|----------------|--------------|-------|
| Engine (this) | ~10 MB/s | stdlib RE2, parallel >4KB |
| tiktoken (Rust) | ~17 MB/s | Highly optimized regex |
| Ollama (Go) | ~2-3 MB/s | regexp2 backtracking |

## Performance Opportunities

Potential optimizations not yet implemented:

| Optimization | Expected Gain | Complexity |
|--------------|---------------|------------|
| Aho-Corasick for special tokens | 2-3x for many special tokens | Medium |
| Custom regex engine (like tiktoken) | 1.5-2x | High |
| SIMD byte scanning | 1.3-1.5x for pretokenizer | Medium |
| Assembly BPE merge loop | 1.2-1.5x | High |
| Memoization for repeated substrings | Variable | Low |

The current bottleneck is the pretokenizer regex (~60% of encode time). tiktoken achieves ~17 MB/s with a hand-tuned Rust regex engine.

## Not Yet Implemented

| Feature | Used By | Notes |
|---------|---------|-------|
| Unigram tokenizer | T5, ALBERT, mBART | Different algorithm (not BPE) |
| Unicode normalizers | Some multilingual models | NFD, NFKC, lowercase, etc. |
| Custom pretokenizers | Model-specific | Beyond standard patterns |

Most HuggingFace models use BPE or SentencePiece, which are fully supported. WordPiece (BERT-style) is also supported, with the standard `[UNK]` fallback for out-of-vocabulary characters.

## Files

| File | Description |
|------|-------------|
| `tokenizer.go` | Main implementation (~1000 lines) |
| `tokenizer_test.go` | Tests and benchmarks |
| `testdata/` | Mini tokenizer for unit tests |