Optimizing Training Workloads on GPU Clusters | Together AI | Podwise