Blog
Our progress on the science of scaling, architecture and systems for scientific world models
NVFP4 Pretraining: Systems Optimizations (Part 2)
Jerome Ku and others at Radical Numerics · January 12, 2026
A deep dive into TransformerEngine's implementation of the NVFP4 recipe: NVFP4 data flow, custom kernels, and systems optimizations for performant FP4 training.
NVFP4 Pretraining: From Theory to Implementation (Part 1)
Jerome Ku, Michael Poli, and others at Radical Numerics · January 12, 2026
NVIDIA's NVFP4 pretraining recipe: floating point fundamentals, the evolution of mixed-precision training, and the techniques that enable stable NVFP4 training.
Phalanx: Hardware-Aligned Sliding-Window Recurrences
Garyk Brixi, Dragos Secrieru, Stefano Massaroli and others at Radical Numerics · October 14, 2025
Phalanx: a drop-in replacement for sliding-window attention that is faster and higher quality, built through hardware- and numerics-aware design of new layers that push the efficiency-quality frontier.
RND1: Simple, Scalable AR-to-Diffusion Conversion
Keshigeyan Chandrasegaran, Armin W. Thomas and others at Radical Numerics · October 9, 2025
Introducing RND1-30B, the largest open-source diffusion language model, trained via AR-to-Diffusion conversion with a simple, scalable recipe.