Reinforcement learning (RL) is an important branch of machine learning. This article walks through several RL code examples.
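1. Setting up a simple environment
Before starting, you need a simple reinforcement learning environment. The following Python example creates one with Gym and drives it with random actions: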
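import gym

# Create the environment (assumes the classic Gym API, gym < 0.26:
# reset() returns the observation and step() returns four values)
env = gym.make('CartPole-v1')

# Run 100 episodes of at most 200 steps each, choosing random actions
for _ in range(100):
    env.reset()
    for _ in range(200):
        env.render()
        action = env.action_space.sample()
        obs, reward, done, _ = env.step(action)
        if done:
            break

# Release the render window
env.close()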
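To make the reset/step loop above concrete without installing Gym, here is a minimal sketch of an environment with the same shape of interface. `CoinFlipEnv`, its reward rule, and `run_random_episode` are hypothetical names invented for this illustration; they are not part of Gym.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym).

    At each step the agent guesses 0 or 1; a correct guess earns reward 1.0.
    An episode ends after `max_steps` steps.
    """

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the initial observation.
        self.steps = 0
        return 0

    def step(self, action):
        # Return (observation, reward, done, info), the tuple the loop unpacks.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps
        return coin, reward, done, {}


def run_random_episode(env, rng):
    """Play one episode with uniformly random actions; return the total reward."""
    env.reset()
    total, done = 0.0, False
    while not done:
        action = rng.randint(0, 1)
        _, reward, done, _ = env.step(action)
        total += reward
    return total


if __name__ == "__main__":
    env = CoinFlipEnv(max_steps=10, seed=0)
    print(run_random_episode(env, random.Random(1)))
```

The agent only ever calls `reset()` and `step(action)`, which is why the same loop can drive very different environments.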
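2. The Q-Learning algorithm
Q-Learning is a widely used reinforcement learning algorithm that learns a table of action values Q(s, a), one entry per state-action pair. The following is a simple Q-Learning example: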
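import gym
import numpy as np

# Create the environment. Tabular Q-learning needs discrete states, so this
# uses Taxi-v3; CartPole-v1 has continuous observations and therefore no
# observation_space.n. Assumes the classic Gym API (gym < 0.26).
env = gym.make('Taxi-v3')

# Initialize the Q-table: one row per state, one column per action
q_table = np.zeros([env.observation_space.n, env.action_space.n])

# Learning parameters
alpha = 0.1    # learning rate
gamma = 0.6    # discount factor
epsilon = 0.1  # exploration rate

# Train
for _ in range(1000):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[state])
        next_state, reward, done, _ = env.step(action)
        # Move Q(s, a) toward reward + gamma * max over a' of Q(s', a')
        td_target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (td_target - q_table[state, action])
        state = next_state

# Save the Q-table
np.save('q_table.npy', q_table)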
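The update line in the training loop above can be easier to follow when it is isolated and applied to a single transition. The sketch below does that with plain floats; `q_update` is a hypothetical helper name, and the default `alpha` and `gamma` are the values from the example above.

```python
def q_update(q_sa, reward, max_next_q, alpha=0.1, gamma=0.6):
    """Return the new Q(s, a) after one Q-learning step."""
    td_target = reward + gamma * max_next_q  # bootstrapped estimate of the return
    td_error = td_target - q_sa              # gap between target and current value
    return q_sa + alpha * td_error


if __name__ == "__main__":
    # Current Q(s, a) = 0.0, reward = 1.0, best next-state value = 2.0:
    # target = 1.0 + 0.6 * 2.0 = 2.2, so Q moves 10% of the way from 0.0 to 2.2.
    new_q = q_update(q_sa=0.0, reward=1.0, max_next_q=2.0)
    print(round(new_q, 4))  # prints 0.22
```

With `alpha = 0.1` each update closes only a tenth of the gap to the target, so an entry settles after many visits rather than jumping on a single noisy reward.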
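3. Deep Q-Network (DQN)
The Deep Q-Network (DQN) is an important advance in reinforcement learning: it replaces the Q-table with a neural network that estimates the Q-values. The following is a simple DQN example: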
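import gym
import numpy as np
import tensorflow as tf

# Create the environment (assumes the classic Gym API, gym < 0.26)
env = gym.make('CartPole-v1')
n_actions = env.action_space.n

# Define the Q-network: maps a state to one Q-value per action
class DQN(tf.keras.Model):
    def __init__(self, n_actions):
        super().__init__()
        self.fc1 = tf.keras.layers.Dense(24, activation='relu')
        self.fc2 = tf.keras.layers.Dense(24, activation='relu')
        self.fc3 = tf.keras.layers.Dense(n_actions)

    def call(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return self.fc3(x)

# Hyperparameters
gamma = 0.99   # discount factor (assumed value, a common choice for CartPole)
epsilon = 0.1  # exploration rate (same value as the Q-Learning example)

# Train (simplified: no replay buffer or target network)
model = DQN(n_actions)
optimizer = tf.keras.optimizers.Adam()
loss_fn = tf.keras.losses.MeanSquaredError()

for _ in range(1000):
    # Add a batch dimension so the network receives shape (1, 4)
    state = np.asarray(env.reset(), dtype=np.float32)[None, :]
    done = False
    while not done:
        # Epsilon-greedy action selection
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(model(state).numpy()[0]))
        next_state, reward, done, _ = env.step(action)
        next_state = np.asarray(next_state, dtype=np.float32)[None, :]
        # TD target, computed outside the tape so no gradient flows through it;
        # there is nothing to bootstrap from once the episode has ended
        next_q = 0.0 if done else float(np.max(model(next_state).numpy()))
        target_q = tf.constant([reward + gamma * next_q], dtype=tf.float32)
        with tf.GradientTape() as tape:
            q_values = model(state)
            loss = loss_fn(target_q, q_values[:, action])
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        state = next_state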
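The example above updates the network on each transition as it arrives. DQN is usually described together with an experience replay buffer, which stores past transitions and trains on random mini-batches so that consecutive, highly correlated steps are not learned from back to back. The buffer is not part of the example above; what follows is a minimal sketch, and `ReplayBuffer` with its method names is a hypothetical helper, not a Gym or TensorFlow class.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest transition once full.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement breaks up consecutive steps.
        batch = rng.sample(list(self.buffer), batch_size)
        # Transpose: list of transitions -> one tuple per field.
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for t in range(5):
        buf.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
    print(len(buf))  # prints 3: only the newest transitions are kept
    states, actions, rewards, next_states, dones = buf.sample(2)
    print(sorted(states))
```

In a training loop, one would typically `push` every transition after `env.step(...)` and, once the buffer holds at least a batch, compute the loss on `buf.sample(batch_size)` instead of on the latest transition alone.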
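4. Further reading
If you want to learn more about reinforcement learning, you can visit the following links:
I hope these code examples help you understand reinforcement learning better. 😊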