An 8-chapter journey from the foundations of sequential decision-making to modern deep RL. Master MDPs, dynamic programming, Monte Carlo and TD learning, Q-learning, DQN, policy gradients, and actor-critic methods like PPO.
Weeks 1-3
The vocabulary, the MDP formalism, and planning with a known model
Weeks 4-5
Learning value functions directly from experience
Week 6
Function approximation, experience replay, target networks
Weeks 7-8
Optimizing policies directly, up to modern PPO
Learning from interaction: the agent-environment loop, rewards and returns, policies and value functions, and the exploration-exploitation tradeoff.
Formalizing sequential decisions: states, actions, transition dynamics, the Markov property, return, and the Bellman equations.
Solving known MDPs with planning: policy evaluation, policy improvement, policy iteration, value iteration, and generalized policy iteration.
Learning from experience without a model: Monte Carlo prediction, TD(0), bootstrapping, and the bias-variance tradeoff between them.
Model-free control with action-value methods: SARSA (on-policy), Q-learning (off-policy), epsilon-greedy exploration, and convergence properties.
Scaling RL with function approximation: neural network value functions, experience replay, target networks, and DQN improvements.
Optimizing policies directly: the policy gradient theorem, REINFORCE, baselines and variance reduction, and continuous action spaces.
Combining value and policy learning: actor-critic architectures, advantage estimation (GAE), A2C, and Proximal Policy Optimization.
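To give a feel for how several of these topics fit together, here is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. It is an illustration only, not material taken from the course: the five-state chain environment, the hyperparameter values, and the names `step`, `epsilon_greedy`, and `train` are all assumptions made up for this example.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a chain of states 0..4 where state 4 is terminal.
N_STATES = 5
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Deterministic chain dynamics; reward 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(q_row, epsilon, rng):
    """Explore with probability epsilon, otherwise act greedily (random tie-break)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; hyperparameter defaults are illustrative assumptions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # Off-policy TD target: bootstrap from the best next action,
            # and do not bootstrap past a terminal state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = train()
greedy_policy = [RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT for s in range(GOAL)]
print(greedy_policy)
```

After training, the greedy policy should choose `RIGHT` in every non-terminal state, since moving right is the only way to reach the reward. Replacing `max(q[next_state])` with the value of the action actually taken next would turn the same loop into SARSA, the on-policy counterpart.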
Curriculum inspired by Sutton & Barto's Reinforcement Learning: An Introduction, taking you from RL foundations to modern deep reinforcement learning.