Direct Nash Optimization: Teaching language models to self-improve with.. | Microsoft Research Forum | Microsoft Research | Podwise