Training Diffusion Models using Reinforcement Learning

May 2024 - present

Under the guidance of Prof. Biplab Banerjee, I am working on training diffusion models with reinforcement learning (RL). The project builds on the paper “Training Diffusion Models with Reinforcement Learning” by Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. The core objective is to optimize diffusion models directly for downstream tasks, rather than only matching a data distribution. This is done by framing the denoising process as a multi-step decision-making problem and applying policy gradient algorithms, an approach the authors call denoising diffusion policy optimization (DDPO).
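To make the "denoising as decision-making" idea concrete, here is a minimal sketch of a score-function (REINFORCE-style) policy gradient over a toy denoising chain. It is an illustration under stated assumptions, not the authors' implementation: the "model" is a hypothetical 1-D Gaussian policy with two parameters (a, b), each denoising step is treated as an action, and the reward depends only on the final sample x_0. All function names are made up for this example.

```python
import random


def sample_trajectory(theta, num_steps, sigma, rng):
    """Run a toy denoising chain x_T -> ... -> x_0.

    Assumption: each step samples x_{t-1} ~ N(a * x_t + b, sigma^2),
    a stand-in for a real diffusion model's learned reverse step.
    """
    a, b = theta
    x = rng.gauss(0.0, 1.0)  # x_T drawn from a standard normal prior
    steps = []
    for _ in range(num_steps):
        x_next = rng.gauss(a * x + b, sigma)
        steps.append((x, x_next))  # (state, action) pair for this step
        x = x_next
    return steps, x  # x is the final sample x_0


def log_prob_grad(theta, x, x_next, sigma):
    """Gradient of log N(x_next; a*x + b, sigma^2) with respect to (a, b)."""
    a, b = theta
    resid = (x_next - (a * x + b)) / sigma ** 2
    return resid * x, resid


def reinforce_gradient(theta, reward_fn, num_steps=5, sigma=0.5,
                       batch=512, seed=0):
    """Monte Carlo estimate of the gradient of E[reward(x_0)] w.r.t. theta.

    Uses the score-function estimator: for each trajectory, sum the
    log-probability gradients of all denoising steps and weight them by
    the final reward minus a batch-mean baseline.
    """
    rng = random.Random(seed)
    trajs = [sample_trajectory(theta, num_steps, sigma, rng)
             for _ in range(batch)]
    rewards = [reward_fn(x0) for _, x0 in trajs]
    baseline = sum(rewards) / batch  # mean reward, used to reduce variance
    grad = [0.0, 0.0]
    for (steps, _), r in zip(trajs, rewards):
        advantage = r - baseline
        for x, x_next in steps:
            ga, gb = log_prob_grad(theta, x, x_next, sigma)
            grad[0] += advantage * ga / batch
            grad[1] += advantage * gb / batch
    return (grad[0], grad[1]), baseline
```

As a usage example, calling `reinforce_gradient((0.5, 0.0), lambda x0: -(x0 - 2.0) ** 2)` rewards final samples near 2. The returned gradient can then drive a gradient-ascent update on (a, b). In a real setting the two-parameter policy would be replaced by a neural denoiser and the reward by a task-specific scoring function.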