Multi-Agent Reinforcement Learning

What is an Agent?

The three core characteristics of an autonomous system.

Autonomous Entity

It operates independently as a self-directed entity, without requiring constant human intervention or manual control.

Observes Environment

It continuously monitors its surroundings, gathering real-time state data and information about the world around it.

Chooses How to Act

Based on those observations, it decides independently which action to take, ideally the one that best serves its goal.

Multi-Agent Systems (MAS)

A Multi-Agent System is more than just a collection of robots; it is a network of several autonomous entities that share a common environment and interact, whether cooperating or competing, to achieve complex goals.

Shared Space: Agents interact within the same physical or digital boundaries.

Collective Intelligence: The group solves problems that are too large for a single agent.

Agent Interaction Dynamics

How multiple autonomous entities behave toward one another in a shared environment.

Collaborative 🤝

Cooperative

Agents work together as a unified team to reach a common goal. Success is shared, and individual actions are optimized to maximize the total group reward.

Competitive ⚔️

Adversarial

A zero-sum environment where agents compete. Each entity focuses on maximizing its own personal benefit while simultaneously working to minimize the performance of opponents.

Hybrid 🔄

Mixed

The most complex scenario where both cooperation and competition exist. Agents may form temporary alliances or compete for limited resources while working toward a broader objective.
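
One way to make the three regimes concrete is to look at how an outcome is turned into per-agent rewards. The sketch below is illustrative only: the function names, the two-player setup, and the `team_weight` blending parameter are assumptions made for this example, not part of any library.

```python
from typing import List


def cooperative_rewards(team_score: float, n_agents: int) -> List[float]:
    """Cooperative: every agent receives the same shared team reward."""
    return [team_score] * n_agents


def zero_sum_rewards(score_a: float, score_b: float) -> List[float]:
    """Competitive (two players): one agent's gain is exactly the other's loss."""
    margin = score_a - score_b
    return [margin, -margin]


def mixed_rewards(individual: List[float], team_score: float,
                  team_weight: float = 0.5) -> List[float]:
    """Mixed: blend each agent's own payoff with a shared team term."""
    return [(1 - team_weight) * r + team_weight * team_score for r in individual]
```

In the zero-sum case the rewards always add up to zero, which is what makes the setting strictly adversarial; in the mixed case the weight controls how much each agent cares about the group.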

Designing the Agent’s Logic

How do we decide what an agent does? There are two primary schools of thought.

Rule-Based Design

Direct Encoding

Humans determine the behavior ahead of time. We impart our knowledge directly into the agent using explicit code (e.g., “If red light, then stop”). This is predictable and reliable but doesn’t scale well to complex edge cases.
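
A rule-based agent can be as small as a chain of conditions. The following is a minimal sketch; the observation keys (`light`, `obstacle_ahead`) and the action names are hypothetical and chosen only to mirror the "If red light, then stop" example.

```python
def rule_based_policy(observation: dict) -> str:
    """Hand-written rules: the behavior is fixed at design time."""
    if observation.get("light") == "red":
        return "stop"
    if observation.get("obstacle_ahead", False):
        return "brake"
    # Any situation not covered by an explicit rule falls through to here.
    return "go"
```

The fall-through default is where the scaling problem shows up: every edge case that should not map to "go" needs its own rule, written by hand.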

Reinforcement Learning

Self-Learning

Agents learn behaviors on their own through trial and error. They interact with the environment to maximize a reward signal. This allows for the discovery of highly efficient, complex strategies that humans might never think of.

Feature	Encoded Behaviors	Learned Behaviors
Logic Source	Human Programmer	Reward & Interaction
Predictability	High (Deterministic)	Lower (Evolving)
Adaptability	Low (Manual Updates)	High (Self-adjusting)
Best Use Case	Simple, Safety-Critical	Complex, Fluid Tasks

The RL Loop

01. Placement

Agent exists within an environment.

02. Perception

Agent observes environment States.

03. Decision

Policy determines the next Action.

04. Interaction

Action updates the environment state.

05. Feedback

Rewards granted (may be sparse/delayed).

06. Optimization

Algorithm updates policy to maximize total reward.
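
The six steps can be sketched with tabular Q-learning, which is one possible RL algorithm among many. Everything in this example is an assumption made for illustration: the five-cell corridor environment, the sparse reward at the goal, and the hyperparameters `alpha`, `gamma`, and `epsilon`.

```python
import random

N_STATES = 5          # states 0..4 on a line; state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state: int, action: int):
    """Environment transition: returns (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0     # sparse: only reaching the goal pays out
    return next_state, reward, done


def train(episodes: int = 500, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.1, seed: int = 0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q-table: q[state][action]
    for _ in range(episodes):
        state, done = 0, False                   # 01. placement
        while not done:
            # 02-03. observe the state, then let the epsilon-greedy policy decide
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            # 04-05. the action changes the state and may yield a reward
            next_state, reward, done = step(state, action)
            # 06. move the estimate toward reward + discounted best next value
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-goal state; after training this is [1, 1, 1, 1].
greedy = [0 if row[0] > row[1] else 1 for row in q_table[:-1]]
```

The learned greedy policy moves right in every state, even though only the final step of each episode is ever rewarded directly.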

Reinforcement Learning

Agent exists within an environment.
Agent observes the states of the environment.
Agent's policy decides which action to take.
Action affects the environment state.
A reward may be granted based on the new state and the action taken (rewards can be sparse, received only after a number of sequential actions).
The RL algorithm updates the agent's policy over time to maximize the reward.
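
To see how a sparse reward can still inform earlier actions, it helps to compute the discounted return for each step of an episode. This is a small sketch; the discount factor `gamma` is a standard RL ingredient but is not mentioned in the notes above, so treat it and the example episode as assumptions.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Only the last of four actions is rewarded.
sparse_episode = [0.0, 0.0, 0.0, 1.0]
print(discounted_returns(sparse_episode, gamma=0.5))
# [0.125, 0.25, 0.5, 1.0]
```

Every earlier step receives a share of the final reward, shrinking with distance, which is what lets an algorithm credit actions that were taken long before the reward arrived.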

With Multi-Agent RL, we have multiple agents interacting with the same environment.
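
In code, the main change from the single-agent case is that the environment takes one action per agent and hands back one observation and one reward per agent. The toy environment below is hypothetical (class name, agent names, and the "meet in the middle" task are all invented for this sketch) and uses dictionaries keyed by agent name as one possible convention.

```python
from typing import Dict, Tuple


class TwoAgentMeetEnv:
    """Hypothetical toy task: two agents on a line of 5 cells try to meet."""

    def __init__(self) -> None:
        self.positions: Dict[str, int] = {}
        self.reset()

    def reset(self) -> Dict[str, int]:
        self.positions = {"agent_0": 0, "agent_1": 4}
        return self._observations()

    def _observations(self) -> Dict[str, int]:
        # Each agent observes only its own position (a partial view).
        return dict(self.positions)

    def step(self, actions: Dict[str, int]) -> Tuple[Dict[str, int], Dict[str, float], bool]:
        """Apply one move per agent (-1, 0 or +1); return observations, rewards, done."""
        for name, move in actions.items():
            self.positions[name] = min(4, max(0, self.positions[name] + move))
        done = self.positions["agent_0"] == self.positions["agent_1"]
        reward = 1.0 if done else 0.0
        rewards = {name: reward for name in self.positions}  # shared, cooperative reward
        return self._observations(), rewards, done


env = TwoAgentMeetEnv()
obs, rewards, done = env.step({"agent_0": 1, "agent_1": -1})
```

Because the next state depends on every agent's action, each agent's outcome is partly decided by choices it did not make.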

MARL Approaches

Decentralized: Each agent is trained independently of the other agents; no information is shared between agents; the design is simpler because no communication is needed; agents are not aware of other agents' actions or history.
Centralized: A higher-level process collects the experiences of all agents; a policy is learned from the pooled experiences and shared with the agents.

Decentralized

Independent Training: Each agent learns entirely on its own.
No Communication: Zero information sharing between entities.
Simplified Design: No communication channel between agents is required.
Unaware: Agents are blind to the actions and history of others.
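
A minimal sketch of decentralized training, assuming a hypothetical one-shot cooperative game in which the team scores only when both agents pick action 1. Each learner keeps its own value estimates and is updated only from its own action and reward; the class and function names are invented for this example.

```python
import random


class IndependentLearner:
    """Stateless Q-learner that sees only its own action and reward."""

    def __init__(self, n_actions: int, alpha: float = 0.1,
                 epsilon: float = 0.1, seed: int = 0):
        self.q = [0.0] * n_actions
        self.alpha, self.epsilon = alpha, epsilon
        self.rng = random.Random(seed)

    def act(self) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q))
        best = max(self.q)
        # Break ties at random so that untried actions still get selected.
        return self.rng.choice([a for a, v in enumerate(self.q) if v == best])

    def learn(self, action: int, reward: float) -> None:
        self.q[action] += self.alpha * (reward - self.q[action])


def joint_reward(a0: int, a1: int) -> float:
    """Hypothetical cooperative game: the team scores only if both pick action 1."""
    return 1.0 if (a0 == 1 and a1 == 1) else 0.0


agents = [IndependentLearner(2, seed=1), IndependentLearner(2, seed=2)]
for _ in range(2000):
    a0, a1 = agents[0].act(), agents[1].act()
    r = joint_reward(a0, a1)
    agents[0].learn(a0, r)   # agent 0 never sees a1
    agents[1].learn(a1, r)   # agent 1 never sees a0
```

From each learner's point of view the reward for action 1 keeps changing as the other agent changes its behavior, so the environment looks non-stationary. That is the usual price of the simpler design.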

Centralized

Data Collection: Higher-level gathering of all agent experiences.
Shared Wisdom: A single policy learns from the total collective data.
Coordinated Learning: All agents benefit from each other’s successes.
Global View: Decisions are made using the full context of the environment.
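
One possible reading of the centralized approach is a single learner that observes the outcome for the whole team and chooses the joint action for all agents. The sketch below assumes the same hypothetical cooperative game as before (the team scores only when both agents pick action 1); the class name and hyperparameters are invented for illustration.

```python
import itertools
import random


def joint_reward(a0: int, a1: int) -> float:
    """Hypothetical cooperative game: the team scores only if both pick action 1."""
    return 1.0 if (a0 == 1 and a1 == 1) else 0.0


class CentralLearner:
    """One learner that evaluates and selects joint actions for all agents."""

    def __init__(self, n_agents: int, n_actions: int, alpha: float = 0.1,
                 epsilon: float = 0.1, seed: int = 0):
        # The joint action space has n_actions ** n_agents entries.
        self.joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
        self.q = {ja: 0.0 for ja in self.joint_actions}
        self.alpha, self.epsilon = alpha, epsilon
        self.rng = random.Random(seed)

    def act(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.joint_actions)
        best = max(self.q.values())
        return self.rng.choice([ja for ja, v in self.q.items() if v == best])

    def learn(self, joint_action, reward: float) -> None:
        self.q[joint_action] += self.alpha * (reward - self.q[joint_action])


central = CentralLearner(n_agents=2, n_actions=2)
for _ in range(2000):
    ja = central.act()
    central.learn(ja, joint_reward(*ja))
best_joint_action = max(central.q, key=central.q.get)
```

With the global view the learner can credit the pair (1, 1) directly, but the table grows exponentially with the number of agents, which is the main cost of centralizing.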

References

[1] MATLAB, “Introduction to Multi-Agent Reinforcement Learning,” YouTube. 2022.

[2] L. Buşoniu, R. Babuška, and B. De Schutter, “Multi-Agent Reinforcement Learning: An Overview,” Springer. 2010.
Available at bartdeschutter.org/publications/…
