Reinforcement Learning – Gymnasium

If you want to teach an Artificial Intelligence to play a video game, control a robot, or optimize a trading strategy, you need an environment for it to practice in. Enter Gymnasium (formerly OpenAI Gym), the standard API for single-agent Reinforcement Learning (RL).

1. The Agent-Environment Loop

Reinforcement learning is fundamentally about trial and error. We don’t program the agent with specific instructions; instead, we drop it into an environment and let it learn from the consequences of its actions.

🤖 Agent ──── Action (A_t) ────▶ 🌍 Environment
🤖 Agent ◀─── State (S_t), Reward (R_t) ──── 🌍 Environment

2. Your First RL Program

Gymnasium provides a handful of core functions that let you build out this continuous loop: make(), reset(), step(), render(), and close(). Here is what a basic implementation looks like using the classic CartPole balancing environment:

import gymnasium as gym

# 1. Initialize the environment
env = gym.make("CartPole-v1", render_mode="human")

# 2. Reset the environment to start the episode
observation, info = env.reset()

episode_over = False
total_reward = 0

while not episode_over:
    # 3. Choose a random action (0 = Left, 1 = Right)
    action = env.action_space.sample()  

    # 4. Take the step and observe the results
    observation, reward, terminated, truncated, info = env.step(action)

    total_reward += reward
    
    # Check if the pole fell (terminated) or hit the time limit (truncated)
    episode_over = terminated or truncated

print(f"Episode finished! Total reward: {total_reward}")
env.close()

💡 Beginner Tip: If you run the code above, the cart will flail randomly and the pole will quickly fall over. This is expected! env.action_space.sample() picks actions at random. The actual learning happens when you replace that line with a policy (for example, a trained neural network) that maps observations to actions.

3. Action and Observation Spaces

How does the agent know what it is allowed to do? Every environment comes with predefined boundaries called Spaces.

  • Observation Space: What the agent can “see”. For CartPole, this is a Box containing 4 continuous numbers (position, velocity, angle, angular velocity). For a racing game, it might be a grid of RGB pixels.
  • Action Space: What the agent can “do”. For CartPole, this is Discrete(2), representing exactly two available buttons: push left or push right.

Notebook Example:

Gymnasium_ReinforcementLearning

GitHub link: https://github.com/computingnotes/Gymnasium_RefinforcementLearning

