Reinforcement Learning for LLMs
In this course, you'll learn all about Reinforcement Learning and Deep RL, and how they are used to train and fine-tune Large Language Models, including:
Deep RL, the Bellman Equation, and Policy Gradients
Reinforcement Learning with Human Feedback (RLHF)
Proximal Policy Optimization (PPO)
Direct Preference Optimization (DPO)
Group Relative Policy Optimization (GRPO)
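To make the policy-gradient idea from the first topic concrete, here is a minimal, illustrative REINFORCE sketch on a toy 3-armed bandit. The reward values, learning rate, and baseline below are assumptions chosen for illustration, not the course's reference implementation.

```python
# Minimal REINFORCE (policy-gradient) sketch on a hypothetical 3-armed bandit.
# All numbers here (arm rewards, learning rate, step count) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])  # assumed expected reward per arm
theta = np.zeros(3)                       # policy logits (learnable parameters)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

learning_rate = 0.1
baseline = 0.0  # running average of rewards, used to reduce gradient variance

for step in range(2000):
    probs = softmax(theta)
    action = rng.choice(3, p=probs)
    reward = rng.normal(true_rewards[action], 0.1)  # sample a noisy reward

    # REINFORCE update: for a softmax policy, grad log pi(a) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += learning_rate * (reward - baseline) * grad_log_pi

    baseline += 0.01 * (reward - baseline)  # update the reward baseline

print("learned action probabilities:", softmax(theta))  # should favor the best arm
```

The advantage-style term `(reward - baseline)` is the same variance-reduction idea that later methods such as PPO and GRPO build on.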
Chapter 1
Lesson 1: Deep Reinforcement Learning and Policy Gradients
Lesson 2: Reinforcement Learning with Human Feedback (RLHF)
Lesson 3: Proximal Policy Optimization (PPO)
Lesson 4: Direct Preference Optimization (DPO)
Lesson 5: Group Relative Policy Optimization (GRPO)