Resource Dump

Dec 20, 2018

All Resources

DeepMimic

[Paper] DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills
- https://xbpeng.github.io/projects/DeepMimic/index.html
Proximal Policy Optimization Algorithms
- https://arxiv.org/abs/1707.06347
A Comprehensive Guide to Machine Learning
- http://snasiriany.me/files/ml-book.pdf
Xue Bin (Jason) Peng: Towards a Virtual Stuntman
- https://bair.berkeley.edu/blog/2018/04/10/virtual-stuntman/
SIGGRAPH 2018: DeepMimic paper (main video)
- https://youtu.be/vppFvq2quQ0

Policy Gradients

(DDPG) Continuous Control with Deep Reinforcement Learning
- https://arxiv.org/abs/1509.02971
Stack Exchange: When can we interchange integration and differentiation?
- https://math.stackexchange.com/questions/2530213/when-can-we-interchange-integration-and-differentiation
Daniel Takeshi: Going Deeper Into Reinforcement Learning: Fundamentals of Policy Gradients
- https://danieltakeshi.github.io/2017/03/28/going-deeper-into-reinforcement-learning-fundamentals-of-policy-gradients/

On Policy vs. Off Policy

Cross Validated (Stack Exchange) explanation of On-Policy vs. Off-Policy
- https://stats.stackexchange.com/a/184794
Simple Explanation of Off-Policy Q-Learning and On-Policy SARSA
- https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
Action policies: ε-greedy, ε-soft, softmax
- https://www.cse.unsw.edu.au/~cs9417ml/RL1/tdlearning.html#aselection
Q-Learning and SARSA algorithm pseudocode
- https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287

Textbooks

Sutton and Barto’s “Reinforcement Learning: An Introduction” (draft of November 5, 2017)
- http://incompleteideas.net/book/bookdraft2017nov5.pdf
A Comprehensive Guide to Machine Learning
- http://snasiriany.me/files/ml-book.pdf