This week, the Linux Foundation announced that it will oversee the creation of a new Ethernet consortium focused on adapting and improving the technology for high-performance computing workloads. Backed by founding members AMD, Arista, Broadcom, Cisco, Eviden, HPE, Intel, Meta, and Microsoft, the new Ultra Ethernet Consortium (UEC) will work on improving Ethernet to meet the low-latency and scalability requirements of HPC and AI systems, requirements that the group says current Ethernet technology does not fully meet.
The new group’s top priority is to define and develop what it calls the Ultra Ethernet Transport (UET) protocol, a new transport layer protocol for Ethernet that better addresses the needs of AI and HPC workloads.
Ethernet is certainly one of the most ubiquitous technologies, but the demands of AI and HPC clusters are growing so fast that the technology risks reaching its limits. The size of large AI models is increasing rapidly: GPT-3 was trained with 175 billion parameters in 2020, and GPT-4 is already said to use on the order of a trillion parameters. Models with more parameters require larger clusters, and these clusters send larger messages over the network. As a result, the higher the bandwidth and the lower the latency of the network, the more efficiently the cluster can operate.
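To put the parameter counts above in perspective, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter (16-bit weights) and a hypothetical accelerator with 80 GB of memory. Neither figure comes from the announcement, and the estimate counts only the weights, ignoring optimizer state, gradients, and activations, so real training clusters need far more devices.

```python
def weight_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory needed just to hold the model weights.

    bytes_per_param=2 is an assumption (16-bit weights).
    """
    return num_params * bytes_per_param


def min_devices(num_params: int, device_mem_gb: int = 80,
                bytes_per_param: int = 2) -> int:
    """Smallest number of devices whose combined memory fits the weights.

    device_mem_gb=80 is a hypothetical accelerator size (decimal GB).
    """
    device_bytes = device_mem_gb * 10**9
    total = weight_bytes(num_params, bytes_per_param)
    return -(-total // device_bytes)  # integer ceiling division


if __name__ == "__main__":
    for name, params in [("175 billion", 175 * 10**9), ("1 trillion", 10**12)]:
        gb = weight_bytes(params) / 10**9
        print(f"{name} parameters: {gb:.0f} GB of weights, "
              f"at least {min_devices(params)} devices")
```

Even under these generous assumptions, the weights alone cannot fit on a single device, so the model has to be split across machines that exchange data over the network on every training step.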
“Many HPC and AI users find it difficult to get the most performance out of their systems due to weak interconnections in their systems,” said Dr. Earl Joseph, CEO of Hyperion Research.
At a high level, the new Ultra Ethernet Consortium aims to improve Ethernet in a surgical manner, improving and changing only those parts necessary to achieve its goals. Initially, the consortium is looking to improve both the software and physical layers of Ethernet technology without changing its basic structure, to preserve cost efficiency and interoperability.
The consortium’s technical goals include developing specifications, APIs, and source code to define protocols, interfaces, and data structures for Ultra Ethernet communications. Additionally, the consortium aims to update existing link and transport protocols and create new telemetry, signaling, security, and congestion mechanisms to better address the needs of large-scale AI and HPC clusters. Meanwhile, because AI and HPC workloads differ in many ways, UET will have separate profiles for appropriate deployments.
“Generative AI workloads require networks designed for supercomputing scale and performance,” said Justin Hotard, executive vice president and general manager, HPC & AI, Hewlett Packard Enterprise. “The importance of the Ultra Ethernet Consortium is to develop an open, scalable, and cost-effective Ethernet-based communications stack that can support and efficiently run these high-performance workloads. The operability gives customers choice and the performance to handle a variety of data-intensive workloads such as simulation and AI model training and tuning.”
The Ultra Ethernet Consortium is sponsored by the Linux Foundation, but the actual work is done by its members. The founders, AMD, Cisco, Intel, and others, all either design high-performance CPUs, compute GPUs, and network infrastructure for AI and HPC workloads or build supercomputers and clusters for them, so they have plenty of experience with the relevant technologies. The UEC work will be carried out by four working groups covering the physical layer, link layer, transport layer, and software layer.
And while the group has not explicitly spoken of Ultra Ethernet in the context of competing technologies, the list of founding members, or rather who is not a founding member, is telling. Ultra Ethernet’s performance goals and HPC focus put it in direct competition with InfiniBand, which has been the networking technology of choice for low-latency, HPC-style networks for over a decade. InfiniBand is developed by its own industry body, but NVIDIA is said to have a great deal of influence over that group through its Mellanox acquisition a few years ago, which makes NVIDIA the most conspicuous absentee from the new group. The company makes extensive use of both Ethernet and InfiniBand internally, using both in its scalable DGX SuperPod systems.
As for the proposed Ultra Ethernet standard, UEC members are already planning how to integrate the upcoming UET technology into their products.
AMD Chief Technology Officer (CTO) Mark Papermaster wrote in a blog post: “UEC enables packet-spray delivery across multiple paths without causing congestion or head-of-line blocking. Data can now be successfully shared across clusters while minimizing the risk of data loss. Finally, UEC has built-in security for AI and HPC workloads, allowing AMD to leverage robust security and encryption capabilities.”
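The packet-spray idea in the quote above can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not UET's actual mechanism, which the consortium has not yet published. The flow names, packet counts, CRC32-based hash, and round-robin spraying policy are all hypothetical choices made for illustration. The sketch compares per-flow hashing, where every packet of a flow takes the same path, with per-packet spraying across all paths.

```python
import zlib
from itertools import cycle


def flow_hash_loads(flows: dict[str, int], num_paths: int) -> list[int]:
    """Per-flow hashing: every packet of a flow follows the same path."""
    loads = [0] * num_paths
    for flow_id, packets in flows.items():
        # CRC32 stands in for a switch's hash function (an assumption).
        path = zlib.crc32(flow_id.encode()) % num_paths
        loads[path] += packets
    return loads


def packet_spray_loads(flows: dict[str, int], num_paths: int) -> list[int]:
    """Packet spraying: packets are spread round-robin over all paths."""
    loads = [0] * num_paths
    next_path = cycle(range(num_paths))
    for packets in flows.values():
        for _ in range(packets):
            loads[next(next_path)] += 1
    return loads


if __name__ == "__main__":
    # Hypothetical traffic: two large flows and one small one, four paths.
    flows = {"gpu0->gpu7": 1000, "gpu1->gpu6": 1000, "gpu2->gpu5": 10}
    print("per-flow hashing:", flow_hash_loads(flows, 4))
    print("packet spraying: ", packet_spray_loads(flows, 4))
```

With per-flow hashing, a few large flows can land on the same path and leave others idle, whereas spraying keeps the per-path load within one packet of even. The trade-off, which this sketch ignores, is that sprayed packets can arrive out of order, so the transport layer has to tolerate reordering.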
Meanwhile, the UEC has not yet announced when it plans to finalize the UET specifications. The group is also expected to seek certification from the IEEE, which governs the various Ethernet standards, adding a further hurdle.
Finally, the UEC said it is looking for additional members to further enrich the group and will begin accepting applications from new members in the fourth quarter of 2023. Besides NVIDIA, several other big technology companies involved in AI and HPC efforts are not yet part of the group, so that will be their next opportunity to join the consortium.