Q-Learning is a popular reinforcement learning algorithm for learning optimal policies. It is a model-free, value-based method that uses a Q-table to estimate the value of taking each action in each state.
Key Concepts
- State: A state represents the current situation or configuration of the environment.
- Action: An action is a decision made by the agent to transition from one state to another.
- Reward: A reward is a scalar feedback signal from the environment that indicates how good or bad the immediate outcome of an action was.
- Q-Table: A Q-table is a table that stores the estimated expected cumulative reward (the Q-value) of taking each action in each state; a minimal representation is sketched below.
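As a minimal sketch, assuming states and actions are indexed by small integers (not a requirement in general), a Q-table can be stored as a 2-D NumPy array with one row per state and one column per action:

```python
import numpy as np

n_states = 4   # hypothetical environment with 4 states
n_actions = 4  # e.g. up, down, left, right

# One Q-value per (state, action) pair, initialized to zero.
q_table = np.zeros((n_states, n_actions))

# Look up the estimated value of taking action 2 in state 1.
value = q_table[1, 2]
```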
How Q-Learning Works
- Initialize Q-Table: The Q-table is initialized, typically with zeros (as in the example below) or small random values.
- Choose an Action: The agent chooses an action based on the current state and the Q-table, usually with an exploration strategy such as epsilon-greedy so it does not always exploit its current estimates.
- Take Action: The agent takes the chosen action and transitions to a new state.
- Observe Reward: The agent observes the reward received from taking the action.
- Update Q-Table: The Q-value for the state-action pair is updated using the reward and the maximum estimated value of the next state: Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)], where α is the learning rate and γ is the discount factor. The agent then repeats from step 2; a code sketch of one such step follows this list.
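Here is a minimal sketch of a single Q-Learning step in Python. The hyperparameter values and the epsilon-greedy strategy are illustrative choices, not part of the algorithm's definition:

```python
import numpy as np

alpha = 0.1    # learning rate: how strongly new information overwrites old estimates
gamma = 0.9    # discount factor: how much future rewards count
epsilon = 0.1  # exploration rate for epsilon-greedy action selection

rng = np.random.default_rng()

def choose_action(q_table, state):
    # Epsilon-greedy: try a random action with probability epsilon,
    # otherwise exploit the current best estimate.
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))
    return int(np.argmax(q_table[state]))

def q_update(q_table, state, action, reward, next_state):
    # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    q_table[state, action] += alpha * (td_target - q_table[state, action])
```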
Example
Let's say we have a simple environment with four numbered states, where the agent can move up, down, left, or right. The agent receives a reward of +1 for moving to a higher-numbered state and -1 for moving to a lower-numbered state.
At the start, before any updates, the Q-table is all zeros:
| State | Up | Down | Left | Right |
|-------|----|------|------|-------|
| 1 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 |
After a few iterations, the Q-table might look something like this (the exact values depend on the learning rate, discount factor, and the trajectories the agent happened to sample):
| State | Up | Down | Left | Right |
|-------|----|------|------|-------|
| 1 | 1 | -1 | -1 | -1 |
| 2 | 1 | -1 | -1 | -1 |
| 3 | 1 | -1 | -1 | -1 |
| 4 | 1 | -1 | -1 | -1 |
This means the agent has learned that the best action in each state is to move up, toward higher-numbered states. The short script below puts the pieces together for this example.
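Here is an end-to-end sketch of the example, building on the update step above. The dynamics are one assumed encoding of the environment described (states indexed 0-3, "up"/"down" move between neighboring states, "left"/"right" leave the state unchanged, and moves off either end do nothing), chosen purely for illustration:

```python
import numpy as np

n_states, n_actions = 4, 4  # states 0..3; actions: 0=up, 1=down, 2=left, 3=right
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Assumed dynamics: "up" moves to the next higher state, "down" to the next
    # lower one, "left"/"right" stay put; moves off either end also stay put.
    if action == 0:
        next_state = min(state + 1, n_states - 1)
    elif action == 1:
        next_state = max(state - 1, 0)
    else:
        next_state = state
    # +1 for reaching a higher-numbered state, -1 for a lower one, 0 otherwise.
    reward = (next_state > state) - (next_state < state)
    return next_state, reward

q_table = np.zeros((n_states, n_actions))
state = 0
for _ in range(5000):
    # Epsilon-greedy action selection.
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(q_table[state]))
    next_state, reward = step(state, action)
    # Q-Learning update toward the bootstrapped target.
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
    state = next_state

# Best action per state; should print 0 ("up") for every state.
print(np.argmax(q_table, axis=1))
```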
Learn More
For more information on Q-Learning, check out our Introduction to Reinforcement Learning guide.