Inflection AI, a startup founded by a co-founder of DeepMind and backed by Microsoft and Nvidia, raised $1.3 billion in cash and cloud credits from industry giants last week. The company plans to use the funding to build a supercomputing cluster powered by up to 22,000 Nvidia H100 compute GPUs, giving it a theoretical peak performance comparable to frontier-class supercomputers.
"We plan to build a cluster of about 22,000 H100s," DeepMind co-founder and Inflection AI co-founder Mustafa Suleyman told Reuters. "This is about three times the amount of compute that was used to train all of GPT-4. It is the speed and scale that allows us to build differentiated products."
A cluster of 22,000 Nvidia H100 compute GPUs can theoretically deliver 1.474 exaflops of FP64 performance using the GPUs' Tensor Cores; typical FP64 code running on the CUDA cores gets half that, or 0.737 exaflops. By comparison, Frontier, the world's fastest supercomputer, has a peak of 1.813 FP64 exaflops (doubling to 3.626 exaflops for matrix operations). That would make Inflection AI's cluster the second-fastest machine on paper for now, though it could slip to fourth once El Capitan and Aurora are fully online.
FP64 performance matters for many scientific workloads, but AI-oriented tasks run much faster on this system: the cluster's peak FP16/BF16 throughput is 43.5 exaflops, and FP8 doubles that to 87.1 exaflops. For comparison, Frontier, powered by 37,888 AMD Instinct MI250X accelerators, peaks at 14.5 exaflops at BF16/FP16.
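The cluster-level figures above follow directly from Nvidia's published per-GPU throughput multiplied by the GPU count. A quick sketch of the arithmetic, assuming the H100 SXM datasheet numbers (Tensor Core figures include 2:1 structured sparsity, as in Nvidia's headline specs):

```python
# Back-of-the-envelope peak throughput for a 22,000-GPU H100 cluster.
# Per-GPU TFLOPS figures are Nvidia's published H100 SXM specs.
GPUS = 22_000

per_gpu_tflops = {
    "FP64 (CUDA cores)": 33.5,
    "FP64 (Tensor Cores)": 67,
    "FP16/BF16 (Tensor Cores, sparse)": 1_979,
    "FP8 (Tensor Cores, sparse)": 3_958,
}

for precision, tflops in per_gpu_tflops.items():
    # 1 exaflop = 1,000,000 teraflops
    exaflops = GPUS * tflops / 1_000_000
    print(f"{precision}: {exaflops:.3f} exaflops")
```

Running this reproduces the 0.737 / 1.474 / 43.5 / 87.1 exaflop figures cited above.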
The cost of the cluster is unknown, but given that Nvidia's H100 compute GPUs retail for over $30,000 per unit, we expect the GPUs alone to run into the hundreds of millions of dollars. Add in the rack servers and other hardware, and the cluster accounts for the bulk of the $1.3 billion in funding.
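The "hundreds of millions" estimate is simple multiplication, assuming the ~$30,000 street price holds at volume (actual bulk pricing is not public):

```python
# Rough GPU-only cost estimate for the planned cluster.
# $30,000/unit is the retail figure cited in the article; bulk
# pricing for a 22,000-GPU order is an unknown.
gpus = 22_000
unit_price_usd = 30_000
total = gpus * unit_price_usd
print(f"GPU cost: ${total / 1e6:,.0f} million")  # roughly half the $1.3B round
```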
Inflection AI is currently valued at approximately $4 billion, about one year after its founding. Its only product so far is a generative AI chatbot called Pi, short for "personal intelligence." Pi is designed to act as an AI-powered personal assistant, using generative AI technology similar to ChatGPT to help users plan, schedule, and gather information. Pi communicates with the user through dialogue, letting the user ask questions and provide feedback. Notably, Inflection AI has outlined specific user-experience goals for Pi, such as providing emotional support.
Inflection AI currently operates a cluster of 3,584 Nvidia H100 compute GPUs in the Microsoft Azure cloud. The proposed supercomputing cluster would offer roughly six times the performance of this current cloud-based setup.
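The roughly-six-times figure follows from the GPU counts alone, assuming identical H100 parts and near-linear scaling:

```python
# Ratio of planned cluster size to the current Azure-hosted cluster.
planned_gpus = 22_000
current_gpus = 3_584
print(f"Scale-up: {planned_gpus / current_gpus:.2f}x")
```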