【Chris Hay】how the tokenizer for gpt-4 (tiktoken) works and why it can't reverse strings



Chris breaks down the ChatGPT (GPT-4) tokenizer and shows why large language models such as GPT, Llama 2 and Mistral struggle to reverse words. He looks at how words, programming languages, different natural languages and even Morse code are tokenized, and shows how tokenizers tend to be biased towards English and programming languages.
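
To make the reversal problem concrete, here is a minimal sketch in plain Python. It is not tiktoken: the vocabulary `TOY_VOCAB` and the greedy longest-match `tokenize` helper are hypothetical stand-ins for the real byte-pair-encoding merges, chosen only to show that a model working on multi-character tokens never sees the individual letters it is asked to reverse.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
# Real tiktoken vocabularies contain on the order of 100k byte-level entries.
TOY_VOCAB = {"l", "o", "i", "p", "oll", "ipop"}


def tokenize(text, vocab=TOY_VOCAB):
    """Greedy longest-match tokenization.

    This is a simplification of byte-pair encoding, not the real algorithm:
    at each position it takes the longest vocabulary entry that matches,
    falling back to a single character if nothing longer is known.
    """
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab or j == i + 1:
                tokens.append(piece)
                i = j
                break
    return tokens


word = "lollipop"
tokens = tokenize(word)

# Reversing the order of the tokens is not the same as reversing the letters.
token_reversed = "".join(reversed(tokens))
char_reversed = word[::-1]

print(tokens)          # ['l', 'oll', 'ipop']
print(token_reversed)  # ipopolll
print(char_reversed)   # popillol
```

With this toy vocabulary the word is split into three chunks, so reversing at the token level yields `ipopolll` rather than the character-level answer `popillol`. A real model faces the same mismatch: to reverse a word correctly it has to know the spelling hidden inside each token, which it only learns indirectly from training data.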