1. Basic Concepts
Markov Decision Process (MDP)
Sets
State: the set of states, $\mathcal{S}$.
Action: the set of actions, $\mathcal{A}(s)$, available at each state $s \in \mathcal{S}$.
Reward: the set of rewards, $\mathcal{R}(s, a)$, associated with each state-action pair.
Probability distribution
State transition: at state $s$, taking action $a$, the probability of moving to state $s'$ is $p(s' \mid s, a)$.
Reward: at state $s$, taking action $a$, the probability of obtaining reward $r$ is $p(r \mid s, a)$.
Policy: at state $s$, the probability of choosing action $a$ is $\pi(a \mid s)$.
Markov property
The next state and reward depend only on the current state and action, not on the earlier history:
$p(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)$
$p(r_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0) = p(r_{t+1} \mid s_t, a_t)$
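To make the pieces above concrete, here is a minimal Python sketch of a finite MDP with two states. All specifics (the state names, transition probabilities, and reward values) are invented for illustration; only the structure, the sets $\mathcal{S}$, $\mathcal{A}(s)$, the distributions $p(s' \mid s, a)$, $p(r \mid s, a)$, and the policy $\pi(a \mid s)$, follows the definitions above.

```python
import random

# Sets: states S and actions A(s) (toy example, names are illustrative)
states = ["s1", "s2"]
actions = {"s1": ["stay", "move"], "s2": ["stay", "move"]}

# State transition probability p(s' | s, a)
transition = {
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "move"): {"s2": 0.8, "s1": 0.2},
    ("s2", "stay"): {"s2": 1.0},
    ("s2", "move"): {"s1": 0.8, "s2": 0.2},
}

# Reward probability p(r | s, a)
reward = {
    ("s1", "stay"): {0.0: 1.0},
    ("s1", "move"): {1.0: 0.9, 0.0: 0.1},
    ("s2", "stay"): {0.0: 1.0},
    ("s2", "move"): {-1.0: 1.0},
}

# Policy pi(a | s): probability of choosing each action at each state
policy = {
    "s1": {"stay": 0.5, "move": 0.5},
    "s2": {"stay": 0.1, "move": 0.9},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} distribution."""
    outcomes, probs = zip(*dist.items())
    return random.choices(outcomes, weights=probs, k=1)[0]

def step(s):
    """One MDP step. Note the next state and reward depend only on
    the current (s, a), never on earlier history: the Markov property."""
    a = sample(policy[s])
    s_next = sample(transition[(s, a)])
    r = sample(reward[(s, a)])
    return a, r, s_next

# Roll out a short trajectory s0, a0, r1, s1, a1, r2, ...
s = "s1"
for t in range(5):
    a, r, s_next = step(s)
    print(f"t={t}: state={s}, action={a}, reward={r}, next={s_next}")
    s = s_next
```

Note that `step` never looks at past states or actions; conditioning only on the current $(s, a)$ is exactly what the Markov property permits.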