Apr 18, 2026 Training Compute-Optimal Large Language Models
Apr 18, 2026 Learning and Leveraging World Models in Visual Representation Learning
Apr 02, 2026 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness