Reinforcement Learning – Gymnasium

If you want to teach an Artificial Intelligence to play a video game, control a robot, or optimize a trading strategy, you need an environment for it to practice in. Enter Gymnasium (formerly OpenAI Gym), the standard API for single-agent Reinforcement Learning (RL).

1. The Agent-Environment Loop

Reinforcement learning is fundamentally about trial and error. We don’t program the agent with specific instructions; instead, we drop it into an environment and let it learn from the consequences of its actions.

🤖 Agent ──── Action (A_t) ────▶ 🌍 Environment
🤖 Agent ◀─── State (S_t), Reward (R_t) ──── 🌍 Environment

2. Your First RL Program

Gymnasium provides a handful of core functions that let you build out this continuous loop: make(), reset(), step(), render(), and close(). Here is what a basic implementation looks like using the classic CartPole balancing environment:

import gymnasium as gym

# 1. Initialize the environment
env = gym.make("CartPole-v1", render_mode="human")

# 2. Reset the environment to start the episode
observation, info = env.reset()

episode_over = False
total_reward = 0

while not episode_over:
    # 3. Choose a random action (0 = Left, 1 = Right)
    action = env.action_space.sample()  

    # 4. Take the step and observe the results
    observation, reward, terminated, truncated, info = env.step(action)

    total_reward += reward
    
    # Check if the pole fell (terminated) or hit the time limit (truncated)
    episode_over = terminated or truncated

print(f"Episode finished! Total reward: {total_reward}")
env.close()

💡 Beginner Tip: If you run the code above, the cart will flail randomly and the pole will quickly fall over. This is expected! env.action_space.sample() picks actions at random. The actual learning happens when you replace that line with a policy (for example, a trained neural network) that maps observations to actions.

3. Action and Observation Spaces

How does the agent know what it is allowed to do? Every environment comes with predefined boundaries called Spaces.

  • Observation Space: What the agent can “see”. For CartPole, this is a Box containing 4 continuous numbers (position, velocity, angle, angular velocity). For a racing game, it might be a grid of RGB pixels.
  • Action Space: What the agent can “do”. For CartPole, this is Discrete(2), representing exactly two available buttons: push left or push right.

Notebook Example:

Gymnasium_ReinforcementLearning

GitHub link: https://github.com/computingnotes/Gymnasium_RefinforcementLearning

