How ChatGPT Understands You: Internal working of LLMs
Understanding AI can feel confusing, but it doesn’t have to be. This blog keeps things easy. No hard words, no complex diagrams, just simple explanations of how transformers work (the tech behind tools like ChatGPT and Google Translate).
If you're revising for exams, trying to learn the basics, or just curious about AI, this is for you. Everything is written like you're learning from a friend: clear, quick, and to the point.
Let’s get started!
1. Transformers
A transformer is simple to picture: you give it some input, and it transforms that input into the desired output. It was first used in Google Translate to convert one language into another. After that, researchers found they could train it for other things too (like LLMs).

If you want to go further, here is the first paper about it: Attention Is All You Need.
2. Encoder & Decoder
This is also very simple. Transformers don't understand human language directly, so researchers came up with a really good idea: encoding and decoding. You convert every word or character (a string) into numbers and give those to the transformer; the transformer returns numbers, and you convert them back into real-world language.

3. Tokenization
Tokenization is breaking text into smaller pieces like words or symbols, so computers can understand language. Imagine cutting a sentence into Lego blocks: each block (token) represents a word or punctuation mark. These tokens help machines analyze, translate, or generate text by processing one piece at a time. First we create tokens, then encodings, and then we give them to the transformer.

Example - text = “Hello! How are you?” Tokenized text = ["Hello", "!", "How", "are", "you", "?"]
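The example above can be sketched in a few lines of Python. This is a toy tokenizer based on a regular expression, not how real LLM tokenizers work (they use learned subword schemes like BPE), but it reproduces the split shown above:

```python
import re

def simple_tokenize(text):
    # Match either a run of word characters or a single
    # punctuation mark (anything that isn't a word char or space).
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("Hello! How are you?")
# → ['Hello', '!', 'How', 'are', 'you', '?']
```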
4. Vector Embeddings
Vector embedding is a technique in AI where items like words or images are represented as vectors in a multi-dimensional space. These vectors are crafted (usually by machine learning) so that similar items are near each other. This lets computers process and understand complex data based on similarities, making it a powerful tool for things like language understanding, image classification, or even recommending your next favorite movie!

Image credit - Pinecone
Imagine you want to represent different items, words like "apple," "banana," and "car" in a way that a computer can understand. Vector embedding does this by turning each item into a vector, which is just a list of numbers. These vectors act like coordinates, placing each item as a point in a multi-dimensional space (think of it like a map, but with many more directions than just north-south or east-west).
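Here is a tiny sketch of that idea. The three vectors below are made up by hand (real embeddings are learned and have hundreds or thousands of dimensions), but they show how cosine similarity places "apple" closer to "banana" than to "car":

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors:
    # close to 1 means "similar direction", close to 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy embeddings, just for illustration.
apple  = [0.9, 0.8, 0.1]
banana = [0.8, 0.9, 0.2]
car    = [0.1, 0.1, 0.9]

print(cosine_similarity(apple, banana))  # high: fruits sit near each other
print(cosine_similarity(apple, car))     # low: car sits far away
```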
5. Positional Encoding
Positional encoding is a key ingredient in transformer models, allowing them to understand the order of elements in a sequence. By adding a unique, position-specific vector (calculated with sine and cosine functions) to each element's embedding, the model gains the ability to process sequences effectively, even when looking at everything simultaneously. This is what makes transformers so powerful for tasks like language understanding, where the order of words is just as important as the words themselves.

Think of positional encoding like page numbers in a book. Imagine you have a stack of loose pages with no order. Without page numbers, you’d struggle to know which page comes first, second, or third. Positional encoding acts as those page numbers, tagging each element in the sequence so the transformer can put them in the right order, even though it’s looking at all the "pages" at once.
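The sine/cosine recipe mentioned above can be written out directly. This is a sketch of the sinusoidal formula from the original transformer paper, for a tiny 4-dimensional embedding:

```python
import math

def positional_encoding(position, d_model):
    # Sinusoidal positional encoding: even dimensions use sine,
    # odd dimensions use cosine, each at a different frequency.
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Every position gets its own unique "page number" vector.
print(positional_encoding(0, 4))  # [0.0, 1.0, 0.0, 1.0]
print(positional_encoding(1, 4))  # a different vector for position 1
```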
6. Semantic Meaning
In India, “Yaar, traffic hai!” signals a friendly, informal tone, while “Sir, traffic hai!” is formal. Semantic meaning is how AI grasps context. Transformers (like ChatGPT, Gemini) learn that “bank” could mean a riverbank or a financial institution, depending on the sentence.
7. Self-Attention
Imagine you're reading the sentence: "The cat, which is fluffy, sat on the mat." To understand the word "sat," you need to connect it to "cat" (the subject) rather than "fluffy" or "mat." Self-attention enables the model to "look" at the entire sentence at once and determine that "cat" is the most relevant word for "sat," assigning it more focus. Unlike older methods that process sequences step-by-step (like recurrent neural networks, or RNNs), self-attention considers everything at the same time, making it faster and better at handling long-range connections.
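As a rough sketch, here is scaled dot-product attention over toy 2-dimensional word vectors. Real models first project the input into separate query, key, and value matrices with learned weights; this version skips that and feeds the same vectors in three times, just to show the mechanism:

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score each key against this query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        # Softmax turns scores into attention weights (how much
        # this word "looks at" every other word).
        weights = softmax(scores)
        # Output is the weighted mix of all value vectors.
        outputs.append([sum(w * v[i] for w, v in zip(weights, values))
                        for i in range(len(values[0]))])
    return outputs

# Three "words", each a toy 2-d vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x, x, x)  # every output mixes all three words at once
```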
8. SoftMax
LLMs work by predicting the next word, and each candidate word gets a probability through softmax. Softmax is like a magic helper in AI. It takes a list of numbers (scores) and turns them into percentages that add up to 100%. These percentages show how likely each option is, helping computers make smart choices like picking your favorite ice cream or deciding what's in a picture. It's simple, powerful, and super useful!
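A minimal softmax looks like this (subtracting the maximum score first is a standard trick to avoid overflow; it doesn't change the result):

```python
import math

def softmax(scores):
    # Shift by the max for numerical stability, exponentiate,
    # then normalize so everything sums to 1 (i.e., 100%).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
# The highest score gets the biggest share, and all shares sum to 1.
```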
9. Multihead Attention
Multihead Attention is like having a team of specialists, each analyzing data from different angles, then combining their insights to make better decisions. This parallel, diversified approach is key to modern AI models’ success in understanding context and relationships in data.
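One small piece of that idea can be sketched: each head gets its own slice of the embedding, so the heads can specialize. (Real models also apply learned projections per head and run attention in each slice; this only shows the splitting step.)

```python
def split_into_heads(vector, num_heads):
    # Divide one embedding into equal slices, one per head.
    head_dim = len(vector) // num_heads
    return [vector[i * head_dim:(i + 1) * head_dim]
            for i in range(num_heads)]

embedding = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
heads = split_into_heads(embedding, 4)  # 4 specialists, 2 dimensions each
```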

10. Temperature
Temperature controls how creative your model can be. LLMs work by predicting the next word, and softmax gives each candidate word a probability. If you set the temperature to 0, the model simply picks the highest-probability word, but as you increase the temperature, the chance of picking a lower-probability word goes up.
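A toy sketch of temperature sampling: scores are divided by the temperature before softmax, so a high temperature flattens the distribution and gives lower-priority words a better chance:

```python
import math
import random

def sample_with_temperature(scores, temperature):
    if temperature == 0:
        # Greedy: always take the highest-scoring word.
        return max(range(len(scores)), key=lambda i: scores[i])
    # Dividing by temperature before softmax: values > 1 flatten
    # the distribution, values < 1 sharpen it.
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample one index according to those probabilities.
    r, cumulative = random.random(), 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(scores) - 1

scores = [3.0, 1.0, 0.5]                  # toy next-word scores
best = sample_with_temperature(scores, 0)  # always index 0 at temperature 0
```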

11. Knowledge Cutoff
Knowledge cutoff is the latest date a model’s training data includes (e.g., July 2025). Beyond this, it lacks awareness of new events, research, or trends. Answers about post-cutoff topics may be outdated or incorrect. Models can’t self-update like a frozen library unless retrained or paired with real-time data tools.
12. Vocab Size
Vocab size is the number of distinct words/tokens an AI model knows. GPT-4o's tokenizer has ~200k tokens (words and subwords), and vocab sizes keep growing with newer models.
Conclusion
AI might sound complex, but at its core, it's built on simple ideas like tokens, vectors, and attention. Understanding transformers helps you see how tools like ChatGPT work behind the scenes. Keep exploring, stay curious, and remember: learning AI doesn't need to be hard. You've already taken the first step toward mastering it!
A bit about me
Hi there! I’m Suprabhat, a curious mind who loves learning how things work and explaining them in simple ways. As a kid, I was fascinated by the internet and all its secrets. Now, I enjoy writing guides like this to help others understand our digital world. Thanks for reading, and keep exploring!
Social links
Enjoyed this article?
If you found this helpful, I'd appreciate it if you shared it with your network!