mirror of
https://github.com/karpathy/nanoGPT.git
synced 2026-04-20 00:23:23 +02:00
Before: length of dataset in characters: 1115394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1003854 tokens val has 111540 tokens After: length of dataset in characters: 1,115,394 all the unique characters: !$&',-.3:;?ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz vocab size: 65 train has 1,003,854 tokens val has 111,540 tokens
tiny shakespeare, character-level
Tiny shakespeare, of the good old char-rnn fame :) Treated on character-level.
After running prepare.py:
- train.bin has 1,003,854 tokens
- val.bin has 111,540 tokens