Sets
State: $\mathcal{S}$
Action: $\mathcal{A}(s)$
Reward: $\mathcal{R}(s,a)$
Probability distributions
State transition: $p(s'|s,a)$
Reward: $p(r|s,a)$
Policy: $\pi(a|s)$
Markov property
The next state and reward depend only on the current state and action, not on the earlier history: $p(s_{t+1}|s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = p(s_{t+1}|s_t, a_t)$.
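The components above can be sketched as a tiny tabular MDP. This is a minimal illustration, not from the notes: the two-state example, the action names, and the reward values are all hypothetical, chosen only to show how $\mathcal{S}$, $\mathcal{A}(s)$, $p(s'|s,a)$, $p(r|s,a)$, and $\pi(a|s)$ fit together.

```python
import random

# Hypothetical 2-state MDP: S = {0, 1}, A(s) = {"stay", "move"} for every s.
S = [0, 1]

def A(s):
    """Action set A(s); identical for both states in this toy example."""
    return ["stay", "move"]

# State-transition distribution p(s'|s,a), as a table of categorical distributions.
P = {
    (0, "stay"): {0: 0.9, 1: 0.1},
    (0, "move"): {0: 0.2, 1: 0.8},
    (1, "stay"): {1: 0.9, 0: 0.1},
    (1, "move"): {1: 0.2, 0: 0.8},
}

def reward(s, a):
    """Reward p(r|s,a); deterministic here: +1 for choosing 'move' in state 0."""
    return 1.0 if (s, a) == (0, "move") else 0.0

def policy(s):
    """Sample an action from a stochastic policy pi(a|s)."""
    probs = {"move": 0.7, "stay": 0.3} if s == 0 else {"stay": 0.7, "move": 0.3}
    return random.choices(list(probs), weights=list(probs.values()))[0]

def step(s, a):
    """Sample s' ~ p(.|s,a). The next state depends only on (s, a),
    never on earlier states or actions -- the Markov property."""
    dist = P[(s, a)]
    s_next = random.choices(list(dist), weights=list(dist.values()))[0]
    return s_next, reward(s, a)

# Roll out a short episode under the policy.
random.seed(0)
s = 0
for t in range(5):
    a = policy(s)
    s_next, r = step(s, a)
    print(f"t={t}: s={s}, a={a}, r={r}, s'={s_next}")
    s = s_next
```

Note that each entry of `P` sums to 1, so every `P[(s, a)]` is a valid probability distribution over next states.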