Reinforcement Learning Basics: Code Examples 🤖

What is reinforcement learning?

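Reinforcement learning is a branch of machine learning in which an agent learns an optimal policy by trial and error, through repeated interaction with its environment. Its core elements are: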

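Agent: the learner that observes the environment and chooses actions
Environment: everything the agent interacts with; it responds to each action
State: the current situation of the environment as seen by the agent
Action: a choice the agent can make in a given state
Reward: the scalar feedback the environment returns after each action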
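To make these five elements concrete, here is a minimal sketch of the agent-environment interaction loop. CorridorEnv is a hypothetical toy environment invented for illustration (it is not part of gym): the state is the agent's position in a 1-D corridor, the actions are "move left" and "move right", and the reward is 1 only when the agent reaches the right end.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a 1-D corridor with positions 0..length-1.

    State:   the agent's current position (an int).
    Actions: 0 = move left, 1 = move right.
    Reward:  1.0 when the agent reaches the right end, otherwise 0.0.
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # initial state

    def step(self, action):
        move = 1 if action == 1 else -1
        # Clamp so the agent cannot walk off either end of the corridor
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """One agent-environment interaction loop; returns the total reward collected."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # agent: state -> action
        state, reward, done = env.step(action)   # environment: action -> next state, reward
        total_reward += reward
        if done:
            break
    return total_reward


env = CorridorEnv()
random_policy = lambda state: random.choice([0, 1])
always_right = lambda state: 1
print(run_episode(env, random_policy))  # 0.0 or 1.0, depending on the random walk
print(run_episode(env, always_right))   # 1.0
```

The policy is just a function from state to action; learning algorithms such as Q-learning differ only in how they improve that function from the rewards they observe.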

Q-learning: a basic implementation

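import numpy as np
import gymnasium as gym  # maintained successor of gym; gym>=0.26 uses the same reset()/step() API

env = gym.make('CartPole-v1')

# CartPole's 4 observations are continuous, so tabular Q-learning must discretize them.
# Ranges for (cart position, cart velocity, pole angle, pole angular velocity) are heuristic;
# values outside a range fall into the first or last bin.
n_bins = 6
bounds = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5)]
bin_edges = [np.linspace(low, high, n_bins - 1) for low, high in bounds]

def discretize(obs):
    # Map each observation component to a bin index in 0..n_bins-1
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(obs, bin_edges))

# One Q-value per (discretized state, action) pair
Q_table = np.zeros((n_bins,) * len(bounds) + (env.action_space.n,))

# Learning parameters
alpha = 0.1    # learning rate
gamma = 0.99   # discount factor
epsilon = 0.1  # exploration rate

# Q-learning (off-policy temporal-difference control, not value iteration)
for episode in range(1000):
    obs, _ = env.reset()
    state = discretize(obs)
    done = False
    while not done:
        # epsilon-greedy policy
        if np.random.uniform(0, 1) < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q_table[state]))

        next_obs, reward, terminated, truncated, _ = env.step(action)
        next_state = discretize(next_obs)
        done = terminated or truncated

        # TD target; a terminal state has no future value (a time-limit truncation still bootstraps)
        target = reward if terminated else reward + gamma * np.max(Q_table[next_state])
        Q_table[state + (action,)] += alpha * (target - Q_table[state + (action,)])

        state = next_state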
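Because CartPole needs discretization, the update rule itself is easy to lose sight of. The sketch below applies the same tabular update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], to a hypothetical 5-cell corridor (corridor_step is invented for illustration, not a gym environment) whose states are already integers and can index the table directly. The hyperparameters and the seed are arbitrary assumptions.

```python
import numpy as np


def corridor_step(state, action, length=5):
    """Hypothetical 1-D corridor: action 1 moves right, 0 moves left; reward 1 at the right end."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), length - 1)
    terminated = next_state == length - 1
    return next_state, (1.0 if terminated else 0.0), terminated


def q_learning(n_states=5, n_actions=2, episodes=500, alpha=0.1, gamma=0.9,
               epsilon=0.1, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # epsilon-greedy; ties between equal Q-values are broken at random
            if rng.random() < epsilon:
                action = int(rng.integers(n_actions))
            else:
                best = np.flatnonzero(Q[state] == Q[state].max())
                action = int(rng.choice(best))
            next_state, reward, terminated = corridor_step(state, action, n_states)
            # TD target; a terminal state contributes no future value
            target = reward if terminated else reward + gamma * Q[next_state].max()
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
            if terminated:
                break
    return Q


Q = q_learning()
greedy = Q.argmax(axis=1)
print(greedy[:-1])  # greedy action in each non-terminal state
print(Q.round(3))
```

After training, the greedy action in every non-terminal state should be 1 (move right), and Q[3, 1] should approach 1.0, the reward of the final step; each cell further from the goal is discounted by one more factor of γ.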

Deep reinforcement learning example

Building the Q-network of a DQN (Deep Q-Network) with TensorFlow:

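import gymnasium as gym
import tensorflow as tf
from tensorflow.keras import layers

env = gym.make('CartPole-v1')

# Build the Q-network: input is the state vector, output is one Q-value per action
model = tf.keras.Sequential([
    tf.keras.Input(shape=(env.observation_space.shape[0],)),
    layers.Dense(24, activation='relu'),
    layers.Dense(24, activation='relu'),
    layers.Dense(env.action_space.n, activation='linear')  # linear output, since Q-values are unbounded
])
model.compile(optimizer='adam', loss='mse')  # DQN regresses Q-values toward TD targets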
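The network on its own only approximates Q(s, ·). A DQN also trains it from an experience replay buffer, regressing toward targets y = r + γ (1 − done) max_a' Q_target(s', a') computed with a separate, periodically synced target network. Below is a minimal NumPy sketch of those two pieces, assuming the Q-values have already been predicted as arrays; ReplayBuffer and dqn_targets are illustrative names, not a library API.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size experience replay: stores (s, a, r, s', done) and samples uniform minibatches."""

    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards.astype(np.float32), next_states, dones.astype(np.float32)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(q_current, q_next_target, actions, rewards, dones, gamma=0.99):
    """Build regression targets for the online network.

    q_current:     online-network Q-values for the sampled states, shape (batch, n_actions)
    q_next_target: target-network Q-values for the next states, shape (batch, n_actions)
    dones:         1.0 for transitions that ended in a terminal state, else 0.0
    Only the entry of the action actually taken is replaced; the others keep their
    current prediction, so they contribute zero error under an MSE loss.
    """
    targets = q_current.copy()
    td_target = rewards + gamma * (1.0 - dones) * q_next_target.max(axis=1)
    targets[np.arange(len(actions)), actions] = td_target
    return targets


# Worked example with gamma = 0.5:
# row 0 took action 1: 1.0 + 0.5 * max(1.0, 2.0) = 2.0
# row 1 took action 0 and was terminal: 0.0
q_current = np.array([[0.5, 0.2], [0.1, 0.3]])
q_next = np.array([[1.0, 2.0], [0.4, 0.0]])
targets = dqn_targets(q_current, q_next, actions=np.array([1, 0]),
                      rewards=np.array([1.0, 0.0]), dones=np.array([0.0, 1.0]), gamma=0.5)
print(targets)  # values [[0.5, 2.0], [0.0, 0.3]]
```

In a Keras setup one would typically create the target network with tf.keras.models.clone_model(model), copy weights into it every N steps (N is a tuning choice) with target_model.set_weights(model.get_weights()), and then fit the online network on (states, targets).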

Further reading

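Want to dig into more advanced reinforcement learning topics? The Deep Reinforcement Learning in Practice tutorial offers more code examples and algorithm walkthroughs.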