Chinese Firms Foil US AI Sanctions With Older GPUs, Software Tweaks
After losing access to Nvidia’s state-of-the-art A100 and H100 computing GPUs, which are used to train various AI models, Chinese firms must find ways to train those models without state-of-the-art hardware. To compensate for the lack of powerful GPUs, Chinese AI model developers are instead simplifying their programs to reduce hardware requirements and combining all the computing hardware they have available, according to a Wall Street Journal report.
Nvidia cannot sell its A100 and H100 computing GPUs to Chinese companies like Alibaba and Baidu unless it obtains an export license from the US Department of Commerce (an application that would almost certainly be denied). That is why Nvidia developed the A800 and H800 processors, which have reduced performance and deliberately crippled NVLink interconnect bandwidth. This limits the ability to build the high-performance multi-GPU systems traditionally required for training large AI models.
For example, UBS analysts estimate that 5,000 to 10,000 of Nvidia’s A100 GPUs are required to train the large language model behind OpenAI’s ChatGPT, the WSJ reports. Because Chinese developers do not have access to the A100, they use a combination of A800 and H800 parts to approach the performance of Nvidia’s high-end GPUs, said Yang You, a professor at the National University of Singapore and founder of HPC-AI Tech. In April, Tencent introduced a new computing cluster using Nvidia’s H800 for large-scale AI model training. This approach can be costly: a Chinese company might need three times as many H800s as a US company would need H100s to achieve similar results.
Because of the high cost and the physical impossibility of obtaining all the necessary GPUs, Chinese companies have devised ways to train large-scale AI models across different types of chips, something US-based companies rarely do because of technical challenges and reliability concerns. For example, according to a research paper reviewed by the WSJ, companies like Alibaba, Baidu, and Huawei are considering using Nvidia’s A100, V100, and P100 in combination with Huawei’s Ascend chips.
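One practical difficulty with mixing chip generations is that the slowest device becomes the bottleneck unless work is divided unevenly. The sketch below illustrates the general idea with a simple proportional batch-split heuristic; the device names, throughput figures, and the `split_batch` helper are illustrative assumptions, not a description of any company’s actual system.

```python
# Hypothetical sketch: dividing a global training batch across mixed GPU
# types in proportion to each device's measured throughput, so faster
# chips do more work per step. Throughput numbers are illustrative only.

def split_batch(global_batch, throughput):
    """Allocate per-device batch sizes proportional to throughput.

    throughput maps a device name to a relative samples/sec figure.
    """
    total = sum(throughput.values())
    shares = {dev: int(global_batch * tp / total)
              for dev, tp in throughput.items()}
    # Hand any rounding remainder to the fastest device.
    remainder = global_batch - sum(shares.values())
    fastest = max(throughput, key=throughput.get)
    shares[fastest] += remainder
    return shares

# Illustrative relative throughputs, not real benchmark results.
devices = {"A100": 100.0, "V100": 50.0, "P100": 25.0, "Ascend910": 60.0}
print(split_batch(1024, devices))
```

In a real heterogeneous cluster the split would also have to account for memory capacity and interconnect speed, which is part of why US firms reportedly avoid the approach.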
There are many companies in China developing processors for AI workloads, but their hardware is not supported by a robust software platform like Nvidia’s CUDA.
In addition, Chinese companies are actively combining various software techniques to reduce the computation needed to train large-scale AI models, an approach that has not yet gained widespread adoption globally. Despite the challenges and the refinement still required, Chinese researchers have seen some success with these methods.
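One of the most common compute-reduction techniques is lowering numerical precision so that each weight takes fewer bytes and each operation less silicon time. The toy calculation below shows how weight-storage requirements shrink as precision drops; the 175-billion-parameter figure is an illustrative assumption in the range analysts cite for ChatGPT-class models, not a confirmed number.

```python
# Hypothetical sketch: memory footprint of a model's weights at different
# numerical precisions. Halving precision roughly halves weight storage,
# one reason reduced-precision training cuts hardware requirements.

def weight_memory_gb(n_params, bytes_per_param):
    """Memory (in GiB) needed to store n_params weights at a given precision."""
    return n_params * bytes_per_param / 1024**3

n = 175_000_000_000  # illustrative parameter count for a large language model
for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_memory_gb(n, nbytes):.0f} GiB")
```

Storage is only part of the story: lower precision also raises arithmetic throughput on most accelerators, though it can hurt training stability, which is why such techniques still need the careful tuning the WSJ report alludes to.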
In a recent paper, Huawei researchers demonstrated training their latest-generation large language model, PanGu-Σ, using only Ascend processors and no Nvidia computing GPUs. Despite some shortcomings, the model achieved state-of-the-art performance on several Chinese-language tasks, such as reading comprehension and grammar tests.
Analysts warn that Chinese researchers will face increasing difficulties without access to Nvidia’s new H100 chip, which carries additional performance enhancements that are especially useful for training models like ChatGPT. Meanwhile, a paper published last year by Baidu and the Peng Cheng Laboratory showed researchers training large-scale language models using methods that could make those additional features less critical.
Dylan Patel, chief analyst at SemiAnalysis, was quoted as saying, “If it works well, it can effectively evade sanctions.”