ZeRO & Fastest BERT: Increasing the scale and speed of deep learning training in DeepSpeed | Microsoft Research | Podwise