Deep Reinforcement Learning

Resources:

  • "Welcome to the 🤗 Deep Reinforcement Learning Course – Hugging Face Deep RL Course," Huggingface.co, 2018. https://huggingface.co/learn/deep-rl-course/en/unit0/introduction (accessed Feb. 21, 2026).
  • "Welcome to Spinning Up in Deep RL! — Spinning Up documentation," Openai.com, 2018. https://spinningup.openai.com/en/latest/ (accessed Feb. 21, 2026).
  • "CS 285: Deep Reinforcement Learning," Berkeley.edu, 2023. https://rail.eecs.berkeley.edu/deeprlcourse/ (accessed Feb. 21, 2026).
  • A. Plaat, Deep Reinforcement Learning. Springer, 2022.

🛒 The Supermarket Analogy [Plaat, p. 25]

Imagine you have just moved to a new city, you are hungry, and you want to buy some groceries. There is an unrealistic catch: you have no map and no smartphone. After some random exploration, you find a supermarket. You carefully note the route in your notebook and return home.

What will you do next time? You could exploit your current knowledge and follow the same path – it’s guaranteed to work. Or, you could be adventurous and explore, trying to find a quicker route. This is the classic Exploration-Exploitation trade-off.

  • Agent: You
  • Environment: The city
  • State: Your current location
  • Action: Moving one block
  • Reward: The (negative) path length or travel time
  • Policy: Your decision logic for choosing the next move
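
To make the exploration-exploitation trade-off concrete, here is a minimal sketch (in Python, which the analogy itself does not specify) of an epsilon-greedy rule: mostly follow the best route you know, occasionally try a random one. The route names and travel times are hypothetical placeholders.

```python
import random

def choose_route(known_routes, all_routes, epsilon=0.1):
    """With probability epsilon explore a random route;
    otherwise exploit the fastest route found so far."""
    if not known_routes or random.random() < epsilon:
        return random.choice(all_routes)            # explore
    return min(known_routes, key=known_routes.get)  # exploit (shortest known time)

# Hypothetical example: you already know one route from your notebook,
# measured in minutes; sometimes you will still wander off and try another.
known_routes = {"main-street": 25}
all_routes = ["main-street", "park-shortcut", "river-detour"]
print(choose_route(known_routes, all_routes))
```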

What is Deep Reinforcement Learning?

At its core, Deep Reinforcement Learning (DRL) is a subfield of machine learning that combines two heavy hitters: Reinforcement Learning (RL) and Deep Learning (DL).

1. The Core Components

To understand DRL, we first have to look at the standard Reinforcement Learning loop. It’s essentially a “trial-and-error” framework where an Agent learns to make decisions.

  • Agent: The AI “player” or decision-maker.
  • Environment: The world the agent lives in (e.g., a video game, a stock market, or a robotic arm).
  • State (s): The current situation or “snapshot” of the environment.
  • Action (a): What the agent chooses to do.
  • Reward (r): The feedback (positive or negative) given to the agent based on its action.
[Diagram: The Reinforcement Learning Loop. The Agent (the AI player) sends an Action to the Environment (the world), which returns a new State and a Reward.]
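
Below is a minimal sketch of that loop in Python, assuming the Gymnasium library and its CartPole environment (neither is named above); the "agent" here is just a random policy that samples actions.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")       # Environment: the world
state, info = env.reset(seed=0)     # State: the first snapshot

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()    # Action: what the (random) agent does
    state, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                # Reward: feedback from the environment
    done = terminated or truncated

env.close()
print(f"Episode return: {total_reward}")
```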

2. Why “Deep”?

In traditional RL, we use simple tables (like a spreadsheet) to map states to the best actions. This works for Tic-Tac-Toe, but it fails in the real world. Imagine a self-driving car; the number of possible “states” (camera pixels, sensor data, speed) is infinite. We can’t fit that in a table.

This is where Deep Learning comes in. We use Neural Networks as “function approximators.” Instead of looking up a value in a table, the agent passes the state through a deep neural network to predict which action will yield the highest long-term reward.

Traditional RL

  • Uses Q-Tables
  • Hand-crafted features
  • Limited complexity

Deep RL

  • Uses Neural Networks
  • Raw data (Pixels/Sensors)
  • End-to-end learning
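
As a rough sketch of what "uses neural networks" means in practice, here is a small Q-network written with PyTorch (an assumed framework; none is named above). It maps a raw state vector to one Q-value per action, replacing the row lookup a Q-table would perform. The state and action dimensions are placeholders.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """A small feed-forward network that approximates Q(s, a) for all actions."""
    def __init__(self, state_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, num_actions),   # one Q-value per possible action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Hypothetical dimensions: a 4-dimensional state and 2 possible actions.
q_net = QNetwork(state_dim=4, num_actions=2)
state = torch.randn(1, 4)                     # a batch containing one state
q_values = q_net(state)                       # shape (1, 2)
greedy_action = q_values.argmax(dim=1).item() # pick the action with the highest Q-value
```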

3. How It Learns: The Goal

The agent’s goal isn’t just to get an immediate reward, but to maximize the cumulative reward over time, often called the Return.
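
In symbols (a standard formulation, where $r_{t+1}, r_{t+2}, \dots$ are the rewards collected after time step $t$ and $\gamma$ is the discount factor introduced below), the return is

$$G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}$$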

Because future rewards are less certain than immediate ones, we weight them with a discount factor ($\gamma$, typically between 0 and 1). The recursive relationship between the value of an action now and the value of the best action afterwards is captured by the Bellman Equation:

$$Q(s, a) = r + \gamma \max_{a'} Q(s', a')$$

In DRL, the neural network learns to estimate this $Q$ value (the “quality” of an action) for every possible scenario.

THE BELLMAN EQUATION
Q(s, a) = r + γ max_a′ Q(s′, a′)
Immediate Reward + Discounted Future Value
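
To show how this equation becomes a learning rule, here is a minimal tabular Q-learning sketch in Python; in DRL the table would be replaced by a neural network like the one above. The states, actions, and learning rate are illustrative assumptions, not values from the text.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99   # learning rate and discount factor (illustrative)
Q = defaultdict(float)     # Q[(state, action)] -> current value estimate
actions = ["left", "right"]

def q_update(s, a, r, s_next):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    target = r + gamma * best_next
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One hypothetical transition: in state 0 the agent moved "right",
# received reward 1.0, and landed in state 1.
q_update(s=0, a="right", r=1.0, s_next=1)
print(Q[(0, "right")])
```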

4. Famous Examples

The best-known showcases of this shift are DeepMind's DQN agent, which learned to play Atari games directly from pixel input, and AlphaGo, which defeated a world champion at Go. The differences are summarized below:

| Feature     | Traditional RL          | Deep RL                    |
|-------------|-------------------------|----------------------------|
| State space | Small/discrete (tables) | High-dimensional (pixels)  |
| "Brain"     | Q-tables                | Neural networks            |
| Scalability | Simple games            | Complex/real-world tasks   |
