Proximal Policy Optimization (PPO) is a family of model-free reinforcement learning algorithms for learning a mapping from states to actions, also referred to as a policy. Like Trust Region Policy Optimization, PPO is based on the policy gradient method and extends it by allowing multiple gradient update steps per data sample. It was proposed in 2017 by researchers at OpenAI.
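In the clipped variant described in the 2017 paper, PPO maximizes the surrogate objective L_CLIP(θ) = E_t[ min( r_t(θ) Â_t, clip(r_t(θ), 1 − ε, 1 + ε) Â_t ) ], where r_t(θ) = π_θ(a_t | s_t) / π_θold(a_t | s_t) is the probability ratio between the updated and the data-collecting policy and Â_t is an advantage estimate. The following is a minimal sketch of this loss in PyTorch; the function name ppo_clip_loss, its argument layout, and the dummy batch are illustrative assumptions, not part of any published implementation.

```python
import torch

def ppo_clip_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Sketch of the clipped surrogate objective (hypothetical helper).

    log_probs_new -- log pi_theta(a_t | s_t) under the current policy
    log_probs_old -- log pi_theta_old(a_t | s_t) from data collection (no grad)
    advantages    -- advantage estimates A_t
    clip_eps      -- clipping parameter epsilon (0.2 in the original paper)
    """
    # Probability ratio r_t(theta) = pi_theta / pi_theta_old.
    ratio = torch.exp(log_probs_new - log_probs_old)
    # Clipping removes the incentive to push r_t outside [1 - eps, 1 + eps],
    # which is what makes several gradient steps on one batch safe.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The objective is maximized; return its negation as a loss to minimize.
    return -torch.min(unclipped, clipped).mean()

# Usage on a dummy batch of four transitions (values are arbitrary).
new_lp = torch.tensor([-1.0, -0.5, -2.0, -1.5], requires_grad=True)
old_lp = torch.tensor([-1.1, -0.6, -1.8, -1.4])
adv = torch.tensor([0.5, -0.2, 1.0, 0.3])
loss = ppo_clip_loss(new_lp, old_lp, adv)
loss.backward()
```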
Policy Gradient Methods
Algorithm