Transformer Models (1/2): Stripping Away the RNN, Keeping Attention | Shusen Wang | Podwise