Double Q-learning is a model-free reinforcement learning algorithm introduced by Hado van Hasselt in 2010 to address the overestimation bias of standard Q-learning. It produces more accurate value estimates by decoupling action selection from action evaluation in the update target, which in turn improves policy evaluation.
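
For concreteness, the tabular update can be sketched as follows, writing the two estimators as Q^A and Q^B with learning rate \alpha and discount \gamma (notation loosely following the 2010 paper); on each step one of two symmetric updates is chosen at random, for example the update of Q^A:

    Q^A(s,a) \leftarrow Q^A(s,a) + \alpha \Big[ r + \gamma\, Q^B\big(s',\, \arg\max_{a'} Q^A(s',a')\big) - Q^A(s,a) \Big]

and symmetrically for Q^B with the roles of the two estimators swapped. The action is chosen by the estimator being updated, but its value comes from the other estimator.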

Key Concepts

  • Original Problem: Q-learning tends to overestimate action values because the max operator in its target uses the same Q-function both to select and to evaluate the next action, so estimation noise turns into a systematic positive bias.
  • Solution: Double Q-learning maintains two separate Q-estimators; on each update, one selects the greedy action while the other evaluates its value (see the sketch after this list).
  • Advantages:
    • Reduces overestimation error
    • Enhances convergence in complex environments
    • Forms the basis of Double DQN, its extension to Deep Q-Networks (DQN)
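
As a concrete illustration, here is a minimal Python sketch of the tabular update described above; the function name double_q_update, the defaultdict tables, and the toy transition are illustrative assumptions, not part of the original algorithm's specification:

    import random
    from collections import defaultdict

    def double_q_update(q_a, q_b, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
        # With probability 0.5 update q_a, using q_b to evaluate the action
        # that q_a considers best in s_next; otherwise update q_b symmetrically.
        if random.random() < 0.5:
            updated, other = q_a, q_b
        else:
            updated, other = q_b, q_a
        # Action selected by the estimator being updated ...
        best_next = max(actions, key=lambda a2: updated[(s_next, a2)])
        # ... but evaluated by the other estimator.
        target = r + gamma * other[(s_next, best_next)]
        updated[(s, a)] += alpha * (target - updated[(s, a)])

    # Toy usage on a single made-up transition.
    q_a = defaultdict(float)
    q_b = defaultdict(float)
    double_q_update(q_a, q_b, s=0, a=1, r=1.0, s_next=2, actions=[0, 1])
    print(dict(q_a), dict(q_b))

In practice the behaviour policy typically acts greedily with respect to the sum (or average) of the two tables, and only one table is updated per step, so each sees roughly half of the experience.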

Applications

The technique is widely used in deep reinforcement learning, particularly in settings such as:

  • Game-playing agents (e.g., Atari games)
  • Robotics control
  • Autonomous systems

Further Reading

For deeper insights, explore:

Double_Q_Learning
Q_Function