LessWrong (30+ Karma) - “Model Spec Midtraining: Improving How Alignment Training Generalizes” by Chloe Li, saraprice, Sam Marks, Jonathan Kutasov