Reinforcement Learning with Human Feedback 101 | Vanishing Gradients | Podwise