Markov decision process

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by 207.111.236.2 at 02:35, 2 November 2004, which may differ significantly from the current revision.

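A Markov decision process (MDP) is a discrete-time stochastic control process. It is defined by a set of states, a set of actions, a transition probability matrix for each action that gives the probability of moving from the current state to each next state when that action is chosen, and a reward received for each transition. The transition probabilities depend only on the current state and the chosen action, not on the earlier history of the process (the Markov property). MDPs provide a common framework for a wide range of optimization problems solved via dynamic programming and reinforcement learning.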
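As a minimal sketch of how such a problem can be solved by dynamic programming, the Python code below runs value iteration, one standard MDP algorithm, on a small hypothetical two-state MDP. It repeatedly replaces each state's value with the best, over all actions, of the expected reward plus the discounted value of the next state, then reads off the action that attains that best value. The state names ("low", "high"), actions ("wait", "work"), probabilities, rewards and discount factor gamma = 0.9 are all invented for illustration, and the sketch assumes a finite MDP with discounted rewards.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP (illustrative values only):
# P[state][action] -> list of (probability, next_state, reward).
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]

P: Transitions = {
    "low": {
        "wait": [(1.0, "low", 0.0)],
        "work": [(0.6, "high", 1.0), (0.4, "low", 0.0)],
    },
    "high": {
        "wait": [(0.8, "high", 2.0), (0.2, "low", 2.0)],
        "work": [(1.0, "high", 1.0)],
    },
}


def q_value(P: Transitions, V: Dict[str, float], s: str, a: str, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: Transitions, gamma: float = 0.9, tol: float = 1e-8):
    """Return (values, greedy policy) for a finite, discounted MDP."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup: best action value in state s.
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update; still converges for gamma < 1
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}
    return V, policy


V, policy = value_iteration(P)
print(policy)  # {'low': 'work', 'high': 'wait'}
print(round(V["low"], 2), round(V["high"], 2))  # 15.22 16.93
```

With these made-up numbers, working in the low state and waiting in the high state turns out to be optimal. Policy iteration or linear programming could be used on the same transition table instead; value iteration is shown only because it is short.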
