1. Basic Concepts

Markov Decision Process (MDP). A minimal code sketch of the components below follows the list.

  • Sets

    • State: $\mathcal{S}$

    • Action: $\mathcal{A}(s)$

    • Reward: $\mathcal{R}(s,a)$

  • Probability distributions

    • State transition: $p(s'|s,a)$

    • Reward: $p(r|s,a)$

  • Policy: $\pi(a|s)$

  • Markov property: the process is memoryless; the next state and reward depend only on the current state and action, not on the history: $p(s_{t+1} \mid s_t, a_t, \ldots, s_0, a_0) = p(s_{t+1} \mid s_t, a_t)$ and $p(r_{t+1} \mid s_t, a_t, \ldots, s_0, a_0) = p(r_{t+1} \mid s_t, a_t)$.
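
To make these components concrete, here is a minimal Python sketch of a toy two-state MDP. All names (`STATES`, `ACTIONS`, `TRANSITIONS`, `policy`, `step`) and the numbers in it are illustrative assumptions, not from the notes; the transition table folds $p(s'|s,a)$ and $p(r|s,a)$ into one joint distribution for brevity.

```python
import random

STATES = ["s0", "s1"]                      # state set S
ACTIONS = {"s0": ["stay", "go"],           # A(s): actions available in state s
           "s1": ["stay", "go"]}

# Joint distribution over (s', r) given (s, a):
# for each (s, a), a list of (next_state, reward, probability) outcomes.
TRANSITIONS = {
    ("s0", "stay"): [("s0", 0.0, 1.0)],
    ("s0", "go"):   [("s1", 1.0, 0.9), ("s0", 0.0, 0.1)],
    ("s1", "stay"): [("s1", 0.0, 1.0)],
    ("s1", "go"):   [("s0", -1.0, 1.0)],
}

def policy(state):
    """pi(a|s): here, a uniform random policy over A(s)."""
    return random.choice(ACTIONS[state])

def step(state, action):
    """Sample (s', r) from the joint distribution for (s, a)."""
    outcomes = TRANSITIONS[(state, action)]
    weights = [p for (_, _, p) in outcomes]
    s_next, reward, _ = random.choices(outcomes, weights=weights, k=1)[0]
    return s_next, reward

# Roll out a short trajectory. Note the Markov property in action:
# step() needs only the current state and action, never the history.
s = "s0"
for t in range(5):
    a = policy(s)
    s_next, r = step(s, a)
    print(f"t={t}: s={s}, a={a}, r={r}, s'={s_next}")
    s = s_next
```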
