Reinforcement Learning for LLMs
In this course, you'll learn all about Reinforcement Learning and Deep RL, and how they are used to train and fine-tune Large Language Models, including:
Deep RL, the Bellman Equation, and Policy Gradients
Reinforcement Learning with Human Feedback (RLHF)
Proximal Policy Optimization (PPO)
Direct Preference Optimization (DPO)
Group Relative Policy Optimization (GRPO)
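To make the policy-gradient idea from the first topic concrete, here is a minimal, illustrative REINFORCE sketch on a toy 3-armed bandit. The reward values, learning rate, and baseline below are assumptions chosen for illustration, not the course's reference implementation.

```python
# Minimal REINFORCE (policy-gradient) sketch on a hypothetical 3-armed bandit.
# All numbers here (arm rewards, learning rate, step count) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.2, 0.5, 0.9])  # assumed expected reward per arm
theta = np.zeros(3)                       # policy logits (learnable parameters)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

learning_rate = 0.1
baseline = 0.0  # running average of rewards, used to reduce gradient variance

for step in range(2000):
    probs = softmax(theta)
    action = rng.choice(3, p=probs)
    reward = rng.normal(true_rewards[action], 0.1)  # sample a noisy reward

    # REINFORCE update: for a softmax policy, grad log pi(a) = one_hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[action] += 1.0
    theta += learning_rate * (reward - baseline) * grad_log_pi

    baseline += 0.01 * (reward - baseline)  # update the reward baseline

print("learned action probabilities:", softmax(theta))  # should favor the best arm
```

The advantage-style term `(reward - baseline)` is the same variance-reduction idea that later methods such as PPO and GRPO build on.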
Chapter 1
Lesson 1: Deep Reinforcement Learning and Policy Gradients
Lesson 2: Reinforcement Learning with Human Feedback (RLHF)
Lesson 3: Proximal Policy Optimization (PPO)
Lesson 4: Direct Preference Optimization (DPO)
Lesson 5: Group Relative Policy Optimization (GRPO)