Markov decision process

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 207.111.236.2 (talk) at 02:35, 2 November 2004. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A Markov decision process (MDP) is a discrete-time stochastic control process characterized by a set of states, a set of actions, and transition probability matrices, one per action, that give the probability of moving to each next state when that action is chosen in a given state. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning.
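As a rough illustration of the definition, the following Python sketch encodes a small MDP as one transition matrix per action and solves it with value iteration, one common dynamic programming method. Everything specific here is an assumption made for the example and is not part of the definition above: the two states and two actions, the probabilities, the reward table R, the discount factor GAMMA, and the helper names check_stochastic, step, q_value and value_iteration.

```python
import random

# Hypothetical two-state, two-action MDP; all names and numbers are illustrative.
STATES = ["low", "high"]
ACTIONS = ["wait", "work"]

# P[a][s][t] = probability of moving from state s to state t under action a.
P = {
    "wait": [[0.9, 0.1],
             [0.4, 0.6]],
    "work": [[0.3, 0.7],
             [0.1, 0.9]],
}

# Assumed reward R[a][s] for taking action a in state s.
R = {
    "wait": [0.0, 1.0],
    "work": [-1.0, 2.0],
}

GAMMA = 0.9  # assumed discount factor, must be below 1 for convergence


def check_stochastic(P, tol=1e-9):
    """Return True if every row is a valid probability distribution."""
    return all(
        abs(sum(row) - 1.0) <= tol and all(p >= 0.0 for p in row)
        for matrix in P.values()
        for row in matrix
    )


def step(s, a, rng=random):
    """Sample the index of the next state from row s of the matrix for action a."""
    return rng.choices(range(len(STATES)), weights=P[a][s])[0]


def q_value(s, a, V, gamma=GAMMA):
    """Expected discounted return of taking action a in state s, then following V."""
    return R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(len(STATES)))


def value_iteration(gamma=GAMMA, tol=1e-8, max_iter=10_000):
    """Apply the Bellman optimality update until the values stop changing."""
    n = len(STATES)
    V = [0.0] * n
    for _ in range(max_iter):
        new_V = [max(q_value(s, a, V, gamma) for a in ACTIONS) for s in range(n)]
        delta = max(abs(new_V[s] - V[s]) for s in range(n))
        V = new_V
        if delta < tol:
            break
    # Greedy policy with respect to the final value estimate.
    policy = [max(ACTIONS, key=lambda a: q_value(s, a, V, gamma)) for s in range(n)]
    return V, policy


if __name__ == "__main__":
    assert check_stochastic(P)
    V, policy = value_iteration()
    for name, v, a in zip(STATES, V, policy):
        print(f"{name}: value={v:.3f}, action={a}")
```

The nested lists stand in for the transition probability matrices; with a numerical library the same data would typically be stored as a three-dimensional array indexed by action, state, and next state.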

External links

MDP Toolbox for Matlab - A tutorial and Matlab toolbox for working with MDPs.

References

Bellman, R. E. Dynamic Programming. Princeton University Press, Princeton, NJ.
Puterman, M. L. Markov Decision Processes. Wiley, 1994.
