In the context of Large Language Models (LLMs), such as ChatGPT or GPT-4, a token is a small unit of text used by the model to process and generate language. Tokens can represent a whole word, part of a word, punctuation, or even spaces—depending on the language and the tokenization method used.
Tool: https://platform.openai.com/tokenizer
Tokenization: Before the model processes any input, a tokenizer breaks the text into tokens. With simple word-level tokenization, for instance, the sentence:
I heard a dog bark loudly at a cat
could be tokenized as: ["I", "heard", "a", "dog", "bark", "loudly", "at", "a", "cat"], with each distinct word assigned a unique token ID. The text can then be represented as a sequence of numbers, e.g., [1, 2, 3, 4, 5, 6, 7, 3, 8], where the repeated word "a" maps to the same ID (3) both times.
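The word-to-ID mapping can be sketched in a few lines of Python. This is a minimal illustration of word-level tokenization only; the helper name build_vocab and the choice to start IDs at 1 are assumptions made for this example, and production tokenizers use fixed, pre-trained vocabularies instead.

```python
def build_vocab(words):
    """Assign each distinct word the next unused ID, starting at 1."""
    vocab = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab) + 1
    return vocab


sentence = "I heard a dog bark loudly at a cat"
tokens = sentence.split()            # naive word-level tokenization
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]

print(tokens)
print(token_ids)  # the repeated word "a" gets the same ID both times
```

Nine tokens come out, but only eight distinct IDs are needed, because "a" appears twice.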
Sentence:
"I love AI."
Tokenization (GPT-style, subword-based):
["I", " love", " AI", "."]
➡️ 4 tokens
Sentence:
"ChatGPT is awesome!"
Tokens:
["Chat", "G", "PT", " is", " awesome", "!"]
➡️ 6 tokens
(Notice how "ChatGPT" is split into three tokens.)
Sentence:
"Learning artificial intelligence is fun."
Tokens:
["Learning", " artificial", " intelligence", " is", " fun", "."]
➡️ 6 tokens
Types of Tokens:
Word tokens: Each word is treated separately (“Hello”, “world”).
Subword tokens: Words are broken into smaller reusable pieces (“unbreakable” → “un”, “break”, “able”); the exact split depends on the tokenizer's learned vocabulary.
Character tokens: Individual characters (used in some models).
Punctuation tokens: Marks like “!”, “,”.
Special tokens: Markers for the beginning or end of a sequence, padding, or other control purposes.
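To make the subword idea concrete, here is a toy sketch that splits a word by greedy longest-match against a tiny vocabulary, falling back to single characters when nothing matches. Both TOY_VOCAB and subword_tokenize are hypothetical names invented for this example. GPT-style models use byte pair encoding with a learned vocabulary, which works differently, so treat this as an illustration of the concept and not of any real tokenizer.

```python
def subword_tokenize(word, vocab):
    """Greedy longest-match split; falls back to single characters."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until one is in vocab.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No vocabulary entry matched here: emit one character.
            pieces.append(word[i])
            i += 1
    return pieces


TOY_VOCAB = {"un", "break", "able", "hello"}  # hypothetical vocabulary

print(subword_tokenize("unbreakable", TOY_VOCAB))  # subword pieces
print(subword_tokenize("hello", TOY_VOCAB))        # whole-word token
print(subword_tokenize("cat", TOY_VOCAB))          # character fallback
```

The same function produces word, subword, and character tokens depending on what the vocabulary contains, which is why one tokenizer can mix all three types.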
Example in Practice:
Sentence: "Hello, world!"
Tokens: ["Hello", ",", " world", "!"] (with GPT and similar models, the space before a word is usually attached to that word's token, as in " world", while punctuation typically forms its own token).
Another example: Wayne Gretzky’s quote “You miss 100% of the shots you don’t take” is split into 11 tokens. In English, a token is roughly four characters or three-quarters of a word, but the ratio varies by language and by tokenizer.
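The two rules of thumb can be turned into a quick estimator. The sketch below is an approximation for English text only; estimate_tokens is a hypothetical helper, and an actual tokenizer is the only way to get an exact count.

```python
def estimate_tokens(text):
    """Rough English-only estimates from the two rules of thumb."""
    by_chars = len(text) / 4              # ~4 characters per token
    by_words = len(text.split()) / 0.75   # ~0.75 words per token
    return by_chars, by_words


quote = "You miss 100% of the shots you don't take"
by_chars, by_words = estimate_tokens(quote)
print(f"by characters: {by_chars:.1f}, by words: {by_words:.1f}")
```

For this quote (41 characters, 9 words) the estimates come out to about 10 and 12, on either side of the 11 tokens cited above.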
Understand Limits
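Every AI model (like the ones behind ChatGPT) has a limit on how much text it can handle at once, called the context window, and it is measured in tokens, not words.
Example: GPT-4 Turbo can handle up to ~128,000 tokens (roughly 96,000 English words at three-quarters of a word per token). The limit covers your input and the model's reply together, so if you paste too much, it won’t fit.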
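A rough pre-check can tell you whether a piece of text is likely to fit. The sketch below uses the 128,000-token figure from the example and the four-characters-per-token rule of thumb; RESERVED_FOR_REPLY and fits_in_context are assumptions made for illustration, not settings of any real tool.

```python
import math

CONTEXT_LIMIT = 128_000      # tokens; the figure used in the example above
RESERVED_FOR_REPLY = 4_000   # hypothetical budget kept free for the answer


def fits_in_context(text, limit=CONTEXT_LIMIT, reserve=RESERVED_FOR_REPLY):
    """Estimate tokens at ~4 characters each and compare to the budget."""
    estimated_tokens = math.ceil(len(text) / 4)
    return estimated_tokens <= limit - reserve


short_text = "hello " * 1_000   # about 1,500 estimated tokens
huge_text = "x" * 600_000       # about 150,000 estimated tokens

print(fits_in_context(short_text))
print(fits_in_context(huge_text))
```

Because the estimate is approximate, leaving some headroom below the limit is a sensible precaution.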
Cost Awareness
If you use AI tools that charge by tokens, your bill depends on the number of tokens (input + output).
Example: A short message = ~10 tokens; a long article = thousands of tokens.
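The billing arithmetic is simple enough to sketch. The prices below are made-up placeholders, not any provider's actual rates, and estimate_cost is a hypothetical helper; check your provider's pricing page for real numbers. Input and output tokens are priced separately here because many providers charge different rates for each.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Return the estimated cost in the same currency as the prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices in USD per one million tokens.
INPUT_PRICE = 2.50
OUTPUT_PRICE = 10.00

# A long article pasted in (1,500 tokens) plus a 500-token summary back.
cost = estimate_cost(1_500, 500, INPUT_PRICE, OUTPUT_PRICE)
print(f"Estimated cost: ${cost:.5f}")
```

A single request is cheap under these placeholder prices, but the same arithmetic scales linearly when a tool sends thousands of requests.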
Better Prompts
Knowing how tokens work helps you write concise prompts. Long, repetitive instructions = more tokens (costlier + slower).
Clear, short prompts = fewer tokens, faster responses.
Copy-Paste Planning
If you want to paste a whole PDF or long article into ChatGPT, token limits decide how much text can fit.
Sometimes you’ll need to split the text into chunks.
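One simple way to chunk text is to pack whole words into pieces that stay under a token budget. The sketch below estimates tokens at about four characters each; split_into_chunks and its parameters are hypothetical, and a real workflow would count tokens with the model's own tokenizer and might prefer to split on paragraph or sentence boundaries.

```python
def split_into_chunks(text, max_tokens, chars_per_token=4):
    """Pack whole words into chunks of at most ~max_tokens each."""
    max_chars = max_tokens * chars_per_token
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) > max_chars and current:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


article = "word " * 100
chunks = split_into_chunks(article, max_tokens=10)
print(len(chunks), "chunks; longest is", max(len(c) for c in chunks), "characters")
```

Note that a single word longer than the budget ends up as its own oversized chunk, and that runs of whitespace are collapsed to single spaces.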