Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | Best AI papers explained | Podwise