[short] AutoEval Done Right: Using Synthetic Data for Model Evaluation | Arxiv Papers | Podwise