AI Testing and Evaluation: Reflections | Microsoft Research | Podwise