Meta Partners with NVIDIA to Build the Most Powerful AI Supercomputer

Meta's AI Research SuperCluster, the largest NVIDIA DGX A100-based customer system to date, will deliver 5 exaflops of AI performance. It combines cutting-edge NVIDIA systems, InfiniBand networking, and software that enables optimization across thousands of GPUs.

Meta Platforms has chosen NVIDIA technologies for what it expects to be its most powerful AI research system to date.

Announced today, the AI Research SuperCluster (RSC) is already training new models to advance AI.

Once fully deployed, Meta's RSC is expected to be the largest customer installation of NVIDIA DGX A100 systems.

“We hope that the RSC cluster will help us create completely new artificial intelligence systems that can, for example, provide real-time voice translation for large groups of people who speak different languages so that they can work together on research projects or play games with augmented reality,” the company said in a blog post.





Training the biggest AI models

When RSC is fully built out later this year, Meta plans to use it to train AI models with more than a trillion parameters. That could advance fields such as natural language processing, for example in detecting harmful content in real time.

In addition to scalable performance, Meta cited exceptional reliability, security, privacy, and flexibility to work with "a wide range of AI models" as key RSC criteria.

Meta's RSC is built from hundreds of NVIDIA DGX systems linked by NVIDIA Quantum InfiniBand switches to accelerate the work of its AI research teams.

Under the hood

The new AI supercomputer currently uses 760 NVIDIA DGX A100 systems as compute nodes. Together they contain 6,080 NVIDIA A100 GPUs linked by an NVIDIA Quantum 200Gb/s InfiniBand network, delivering 1,895 petaflops of TF32 performance.
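The GPU count follows directly from the node count. As a quick sanity check, the sketch below assumes eight A100 GPUs per DGX A100 system (the standard configuration, which the text above does not state) and derives the per-GPU TF32 figure implied by the quoted total. The function and variable names are illustrative only.

```python
# Sanity check of the cluster figures quoted above.
# Assumption: each DGX A100 system holds 8 A100 GPUs (standard configuration).
GPUS_PER_DGX_A100 = 8


def total_gpus(nodes: int, gpus_per_node: int = GPUS_PER_DGX_A100) -> int:
    """Return the total GPU count for a cluster of identical nodes."""
    return nodes * gpus_per_node


def per_gpu_teraflops(total_petaflops: float, gpus: int) -> float:
    """Return the per-GPU throughput in teraflops implied by a cluster total."""
    return total_petaflops * 1000 / gpus


gpus = total_gpus(760)                        # 760 DGX A100 nodes
tf32_per_gpu = per_gpu_teraflops(1895, gpus)  # 1,895 petaflops TF32 in total

print(f"{gpus} GPUs, about {tf32_per_gpu:.1f} TF32 teraflops per GPU")
```

Under that assumption the result comes to roughly 312 teraflops per GPU, which is in line with NVIDIA's published peak TF32 rating for the A100 with structured sparsity.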

Despite the challenges caused by COVID-19, RSC took only 18 months to go from an idea on paper to a working AI supercomputer, thanks in part to the NVIDIA DGX A100 technology at its foundation.

20x performance boost

This is the second time Meta has chosen NVIDIA technologies as the foundation for the company's research infrastructure. In 2017, Meta built the first generation of this AI research infrastructure, powered by 22,000 NVIDIA V100 Tensor Core GPUs that run about 35,000 AI training jobs daily.

Meta's preliminary tests have shown that RSC can train large NLP models 3x faster and run computer vision workloads 20x faster than the 2017 system.

In the second phase, RSC will expand to 16,000 GPUs, which Meta believes will deliver 5 exaflops of mixed-precision AI performance. Meta also plans to expand RSC's storage system to exabyte scale, delivering data at up to 16 terabytes per second.
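To put the second-phase targets in perspective, the back-of-the-envelope sketch below works out the growth in GPU count, the per-GPU throughput implied by the 5-exaflops target, and how long it would take to stream one exabyte at the stated bandwidth. It assumes decimal (SI) prefixes and the current count of 6,080 GPUs; the one-exabyte streaming case is a hypothetical illustration, not a workload Meta describes.

```python
# Back-of-the-envelope figures for the planned second phase.
# Assumption: decimal (SI) prefixes, so 1 exaflops = 1e18 FLOP/s and 1 TB = 1e12 bytes.
PHASE1_GPUS = 6_080          # current GPU count
PHASE2_GPUS = 16_000         # planned GPU count
PHASE2_FLOPS = 5e18          # 5 exaflops, mixed precision
STORAGE_BANDWIDTH = 16e12    # 16 terabytes per second

# Growth in GPU count between the two phases.
growth = PHASE2_GPUS / PHASE1_GPUS

# Mixed-precision throughput per GPU implied by the 5-exaflops target.
per_gpu_teraflops = PHASE2_FLOPS / PHASE2_GPUS / 1e12

# Hypothetical illustration: time to stream one exabyte at the target bandwidth.
hours_per_exabyte = 1e18 / STORAGE_BANDWIDTH / 3600

print(f"{growth:.2f}x more GPUs")
print(f"{per_gpu_teraflops:.1f} teraflops per GPU")
print(f"{hours_per_exabyte:.1f} hours to stream one exabyte")
```

The implied figure of about 312.5 teraflops per GPU is close to the A100's published peak for dense FP16/BF16 Tensor Core math, which suggests the 5-exaflops number is a peak rating rather than a measured result.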

Scalable architecture

NVIDIA AI technologies are available to companies of all sizes.

NVIDIA DGX, which includes NVIDIA's full AI software stack, scales easily from a single system to a DGX SuperPOD cluster running on premises or at a colocation provider. Customers can also rent DGX systems through NVIDIA DGX Foundry.
