Improving Transformers with Dynamically Composable Multi-Head Attention