Deep Reinforcement Learning Parameters

Epsilon (ε): This is a hyperparameter (a value between 0 and 1) that defines the probability of an agent taking a random action (exploration) instead of the action believed to have the highest reward (exploitation).

  • A high epsilon value (e.g., 1.0) means the agent primarily explores randomly, useful in early training when it has little knowledge.
  • A low epsilon value (e.g., 0.01) means the agent primarily exploits its learned knowledge, useful as training progresses and the agent’s policy becomes more refined.
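The trade-off above is the classic epsilon-greedy rule. A minimal sketch (the `q_values` list of per-action value estimates is an assumed input, not from the original text):

```python
import random

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        # Explore: uniformly random action index
        return random.randrange(len(q_values))
    # Exploit: action with the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])

q = [0.1, 0.9, 0.3]
print(epsilon_greedy_action(q, 0.0))  # epsilon = 0 is purely greedy: prints 1
```

With epsilon near 1.0 the function behaves like a uniform random policy; with epsilon near 0 it almost always follows the current value estimates.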

Epsilon Decay: This is the process of gradually reducing the value of ε over time or a number of episodes/steps during training. The goal is to allow the agent to explore extensively in the initial stages and then shift its focus to exploiting the optimal policy it has learned as training progresses, leading to stable and efficient convergence. This dynamic adjustment helps the algorithm avoid getting stuck in local optima.
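A common way to implement this is multiplicative (exponential) decay, where ε shrinks by a fixed factor each step. A sketch with assumed example values (start of 1.0, decay factor 0.995):

```python
def decayed_epsilon(eps_start, decay_rate, step):
    """Exponential decay: epsilon shrinks by a fixed factor each step."""
    return eps_start * (decay_rate ** step)

print(decayed_epsilon(1.0, 0.995, 0))     # 1.0 at the first step
print(decayed_epsilon(1.0, 0.995, 1000))  # well under 0.01 after 1000 steps
```

Linear schedules (subtracting a fixed amount per step) are an equally common alternative; either way, the decay rate is itself a hyperparameter.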

Epsilon Min: In deep reinforcement learning, epsilon min (ε_min) refers to the minimum value that the exploration rate (ε) is allowed to reach during the training process.

Importance of epsilon min:

  • Prevents Over-Exploitation: During training, the ε value is typically decayed over time (epsilon decay) from an initial high value (e.g., 1.0) to encourage initial exploration. Setting a minimum value for ε ensures that the agent never stops exploring completely.
  • Continuous Learning: By maintaining a small, non-zero probability of taking a random action (even after many episodes), the agent can continue to discover new paths or better solutions that it might have missed otherwise. This helps prevent the agent from getting permanently stuck in a sub-optimal local maximum.
  • Adapting to Dynamic Environments: In environments where the optimal policy might change over time, ongoing exploration (guaranteed by ε_min) allows the agent to adapt to these changes.

Typical values for ε_min are very small, commonly ranging between 0.01 and 0.001. The specific value is a hyperparameter that often needs to be tuned for a specific problem to find the right balance for continuous learning without making the agent’s behavior too random during later stages of training.

https://spinningup.openai.com/en/latest/spinningup/rl_intro.html

https://huggingface.co/learn/deep-rl-course/en/unit3/deep-q-algorithm

https://www.sciencedirect.com/science/article/pii/S0925231225024427