Introduction to Large Language Models
https://developers.google.com/machine-learning/crash-course/llm
LLMs: What is a Large Language Model?
An LLM is a predictive model that estimates the next “token” (a word, character, or subword) in a sequence. LLMs outperform older models (such as N-grams) because they use vastly more parameters and can process significantly more context at once.
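To make this concrete, here is a toy sketch (the vocabulary and logits are invented, not from the course): a model emits one score per vocabulary token, and softmax turns those scores into a probability distribution over the next token.

```python
import numpy as np

# Toy sketch: the vocabulary and logits here are invented, not real model
# output. An LLM assigns one score (logit) to every token in its vocabulary;
# softmax converts those scores into next-token probabilities.
vocab = ["cat", "dog", "sat", "the"]
logits = np.array([0.2, 0.1, 2.5, 0.4])  # hypothetical scores for the next token

probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax: probabilities now sum to 1

print(dict(zip(vocab, probs.round(3))))
print("most probable next token:", vocab[int(np.argmax(probs))])  # "sat"
```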

Transformer
The transformer is the most successful and widely used architecture for building LLMs, and it is the state-of-the-art architecture across language-model applications.
A full transformer consists of:
- Encoder
- Decoder

Encoder-only and decoder-only architectures also exist, as sketched below.
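As a rough illustration, this sketch builds each variant from PyTorch's stock transformer layers; the hyperparameters are arbitrary placeholders, and real LLMs add token embeddings, positional information, and an output head on top of these stacks.

```python
import torch.nn as nn

# Hyperparameters below are arbitrary placeholders, not from the course.
d_model, nhead, num_layers = 512, 8, 6

# Full transformer: an encoder paired with a decoder.
full = nn.Transformer(d_model=d_model, nhead=nhead,
                      num_encoder_layers=num_layers,
                      num_decoder_layers=num_layers)

# Encoder-only stack (BERT-style models keep just this half).
encoder_only = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead),
    num_layers=num_layers)

# Decoder-only stack (GPT-style models keep just this half).
decoder_only = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=nhead),
    num_layers=num_layers)
```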
Self-Attention
Self-attention allows the model to understand the relationship between words in a sentence, regardless of distance.
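Here is a minimal NumPy sketch of scaled dot-product self-attention, with randomly initialized projection matrices standing in for learned weights: every token produces a query, key, and value, and each output position is a weighted mix of the values from all positions, near or far.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of embeddings x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])      # relevance of every token to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ v                           # each output mixes all positions' values

rng = np.random.default_rng(0)
seq_len, d = 5, 8                                # hypothetical sizes
x = rng.normal(size=(seq_len, d))                # stand-in token embeddings
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)    # (5, 8): one attended vector per token
```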

Multi-Head Attention
LLMs stack multiple “heads” of attention. Each head focuses on a different aspect of language: one might track grammar while another tracks pronoun references. By stacking these layers, the model builds a complex, abstract understanding of the text.
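A minimal sketch of the idea, again with random matrices standing in for learned weights: each head runs its own attention independently, and the heads' outputs are concatenated (real transformers then apply a learned output projection, omitted here).

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, heads):
    """Run several independent attention heads and concatenate their outputs."""
    outputs = []
    for w_q, w_k, w_v in heads:                      # each head has its own projections
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        a = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # this head's attention pattern
        outputs.append(a @ v)
    return np.concatenate(outputs, axis=-1)          # heads concatenated side by side

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4                 # hypothetical sizes
d_head = d_model // n_heads
x = rng.normal(size=(seq_len, d_model))
heads = [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
print(multi_head_attention(x, heads).shape)          # (5, 16)
```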

How LLMs Generate Text
Functionally, LLMs are sophisticated autocomplete engines. When you ask a question, the model views it as the first part of a sequence and calculates the most probable “completion” (the answer), token by token.
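A toy illustration of this loop, with a hand-written bigram table standing in for a trained model (a real LLM conditions on the entire sequence, not just the last token): the prompt seeds the sequence, and the model keeps appending the most probable next token until it emits an end marker.

```python
# Invented bigram table standing in for a real model's next-token distribution.
next_token_probs = {
    "the":  {"sky": 0.6, "cat": 0.4},
    "sky":  {"is": 0.9, "<end>": 0.1},
    "is":   {"blue": 0.7, "grey": 0.3},
    "blue": {"<end>": 1.0},
}

tokens = ["the", "sky"]                     # the user's prompt
while tokens[-1] != "<end>":
    dist = next_token_probs[tokens[-1]]     # a real LLM conditions on the whole sequence
    tokens.append(max(dist, key=dist.get))  # greedy decoding: take the most probable token

print(" ".join(tokens[:-1]))  # "the sky is blue"
```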