
Researchers developed a compiler called Mirage Persistent Kernel (MPK) that automatically transforms large language model (LLM) inference into a single high-performance megakernel, reducing latency by 1.2–6.7x. By fusing the entire model end to end into one GPU kernel, MPK eliminates per-kernel launch overhead and maximally overlaps computation, data loading, and inter-GPU communication across layers.
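
Why launch overhead matters can be sketched with a back-of-the-envelope model. The per-launch cost, layer count, and kernels-per-layer below are illustrative assumptions for the sake of arithmetic, not MPK's measured numbers:

```python
# Illustrative model of kernel-launch overhead; all constants are assumptions,
# not measurements from MPK or any specific GPU.
LAUNCH_OVERHEAD_US = 5.0   # assumed cost of one kernel launch (microseconds)
LAYERS = 32                # assumed transformer layer count
KERNELS_PER_LAYER = 8      # assumed ops per layer (attention, MLP, norms, ...)

def total_launch_overhead_us(num_kernel_launches: int) -> float:
    """Total time spent only on launching kernels, ignoring the work itself."""
    return num_kernel_launches * LAUNCH_OVERHEAD_US

# Conventional execution: one launch per operator per layer for each decoded token.
per_token_conventional = total_launch_overhead_us(LAYERS * KERNELS_PER_LAYER)

# Megakernel execution: the whole forward pass runs inside one persistent
# kernel, so the launch cost is paid once per token instead of hundreds of times.
per_token_megakernel = total_launch_overhead_us(1)

print(f"conventional: {per_token_conventional:.0f} us launch overhead per token")
print(f"megakernel:   {per_token_megakernel:.0f} us launch overhead per token")
```

Under these toy numbers, hundreds of per-operator launches cost over a millisecond per decoded token before any computation happens, which is why collapsing them into one persistent kernel pays off most at small batch sizes, where launch overhead dominates.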