Building a small Transformer language model in PyTorch from the ground up, then running a controlled ablation study to measure the contribution of each "modern" component (RMSNorm, RoPE, SwiGLU) ...
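One way to keep such an ablation controlled is to describe every run with a single set of on/off switches, so that each ablated run differs from the full model in exactly one component. The sketch below is a minimal illustration, not the project's actual code: the names `AblationConfig`, `leave_one_out`, and the `use_*` flags are hypothetical, and what each component is swapped for when its flag is off (for example LayerNorm, learned positions, or a plain MLP) is an assumption left to the model code.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class AblationConfig:
    """Hypothetical switches for the three components named above."""
    use_rmsnorm: bool = True   # off: assumed fallback, e.g. LayerNorm
    use_rope: bool = True      # off: assumed fallback, e.g. learned positions
    use_swiglu: bool = True    # off: assumed fallback, e.g. a plain GELU MLP


def leave_one_out(full: AblationConfig) -> dict:
    """Return the full config plus one run per component with only that flag off."""
    runs = {"full": full}
    for f in fields(full):
        run_name = f.name.replace("use_", "no_", 1)
        runs[run_name] = replace(full, **{f.name: False})
    return runs


runs = leave_one_out(AblationConfig())
for name, cfg in runs.items():
    print(name, cfg)
```

Holding everything else fixed across these runs (seed, data order, token budget, parameter count where possible) is what lets a difference in loss be attributed to the one component that changed.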