Proximal Policy Optimization for Cart-Pole Balancing

Jun 2024

Designed and implemented a Proximal Policy Optimization (PPO) reinforcement learning agent for the CartPole balancing task in Python and PyTorch, combining Generalized Advantage Estimation (GAE) with PPO's clipped surrogate objective. Set up a multi-environment training pipeline with Gym, achieving a cumulative reward of X within Y episodes. Improved training stability and efficiency, using TensorBoard for detailed logging and Weights & Biases (wandb) for experiment tracking. GitHub Repository
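
The two components named above, GAE and the clipped PPO loss, can be illustrated with a minimal sketch. This is not the repository's code: it uses plain Python lists instead of PyTorch tensors so that it runs without dependencies, and the function names (`compute_gae`, `clipped_surrogate_loss`) and default hyperparameters (`gamma=0.99`, `lam=0.95`, `clip_eps=0.2`) are assumptions chosen for illustration.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    rewards, values, dones are per-step lists of equal length;
    last_value is the critic's estimate for the state after the final step.
    Returns (advantages, returns), where returns = advantages + values.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the rollout backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0  # stop bootstrapping at episode end
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped policy loss, averaged over samples (hypothetical helper).

    The probability ratio is clipped to [1 - clip_eps, 1 + clip_eps], and the
    pessimistic (minimum) of the clipped and unclipped terms is used.
    The sign is flipped so that the result can be minimized.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)
```

In a PyTorch version of this sketch, the same arithmetic would typically be applied to tensors (for example with `torch.exp`, `torch.clamp`, and `torch.min`) so that gradients flow through the new log-probabilities.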