BERT explained: Training, Inference, BERT vs GPT/LLaMA, Fine tuning, [CLS] token | Umar Jamil | Podwise