shradhasehgal 20 hours ago

Super interesting work. Wild that AF3 launched 100x more kernels. The training results at a 768-token length seem cool.