Microsoft Built One of the World's Top-5 Supercomputers to Train Enormous AI Models

Recently, Microsoft has announced that it has built one of the top five publicly disclosed supercomputers in the world, making new infrastructure available in Azure to train extremely large artificial intelligence (AI) models, the company announced.

The supercomputer developed is a single system with more than 285,000 CPU cores, 10,000 GPUs and 400 gigabits per second of network connectivity for each GPU server.

The supercomputer is built by Microsoft in partnership with and exclusively for OpenAI, a San Francisco-based non-profit research organization that aims to make human-friendly AI. OpenAI counts Elon Musk as one of its founders. In July last year, Microsoft invested $1 billion in OpenAI to jointly develop new super-computing technologies on Microsoft's Azure cloud computing platform.

Compared with other supercomputers listed on the TOP500 supercomputers in the world, it ranks in the top five, said Microsoft. Hosted in Azure, the supercomputer also benefits from all the capabilities of a robust modern cloud infrastructure, including rapid deployment, sustainable datacenters and access to Azure services.

It is Microsoft's first step toward making the next generation of very large AI models and the infrastructure needed to train them available as a platform for other organizations and developers to build upon.

As part of its 'AI at Scale' initiative, Microsoft has built its own family of large AI models, which it calls the Microsoft Turing models, which it has used to improve many different language understanding tasks across Bing, Office, Dynamics and other productivity products. Earlier this year, the company also released to researchers the largest publicly available AI language model in the world, the Microsoft Turing model for natural language generation.

Microsoft is also exploring large-scale AI models that can learn in a generalized way across text, images and video. That could help with automatic captioning of images for accessibility in Office, for instance, or improve the ways people search Bing by understanding what’s inside images and videos.

The goal, Microsoft says, is to make its large AI models, training optimization tools and supercomputing resources available through Azure AI services and GitHub so developers, data scientists and business customers can easily leverage the power of AI at Scale.

"As we've learned more and more about what we need and the different limits of all the components that make up a supercomputer, we were really able to say, 'If we could design our dream system, what would it look like?'" said OpenAI CEO Sam Altman. And then Microsoft was able to build it.

"We are seeing that larger-scale systems are an important component in training more powerful models," Altman added.

Microsoft has also unveiled a new version of DeepSpeed, an open-source deep learning training optimization library, and ZeRO (Zero Redundancy Optimizer), a novel memory optimization technology in the library, which vastly advances large model training by improving scale, speed, cost, and usability.

The update is significantly more efficient than the version released just three months ago and now allows people to train models more than 15 times larger and 10 times faster than they could without DeepSpeed on the same infrastructure.

Mega Menu

TRENDING

Slider

Microsoft Built One of the World's Top-5 Supercomputers to Train Enormous AI Models

MUST READ

LATEST

POPULAR

DON'T MISS

USEFUL

RESOURCES