NVIDIA's Blackwell Architecture: 25x Performance Boost Reshaping AI Computation

Introduction

At GTC 2025 (GPU Technology Conference), NVIDIA CEO Jensen Huang took the stage without a script or teleprompter to deliver what many are calling "the Super Bowl of AI." In a packed stadium-like venue in San Jose, Huang unveiled NVIDIA's vision for the future of computing, highlighting how AI has evolved from perception to generation and now to reasoning and physical understanding. The keynote marks a pivotal moment in computing history, as NVIDIA revealed not only its next-generation Blackwell architecture but also a comprehensive roadmap for AI infrastructure that will power everything from cloud data centers to enterprise computing and robotics.

Key Points

  • NVIDIA Blackwell offers 25x more performance per watt than Hopper, revolutionizing AI inference
  • Agentic AI requires 100x more computational power than previously anticipated
  • NVIDIA Dynamo is a new operating system for managing AI factories
  • New partnerships with GM, Cisco, and T-Mobile will expand AI to automotive and edge computing
  • DGX Spark and DGX Station democratize AI computing for developers and enterprises
  • GR00T N1 and the Newton physics engine advance robotics and physical AI
  • NVIDIA's roadmap includes Blackwell Ultra and Vera Rubin architecture with silicon photonics

The AI Revolution: From Perception to Reasoning

Jensen Huang began by reflecting on AI's evolution over the past decade. "AI has made extraordinary progress," he noted. "It has only been 10 years. Now, we've been talking about AI for a little longer than that, but AI really came into the world's consciousness about a decade ago."

He outlined how AI has progressed through distinct phases: first perception AI (computer vision, speech recognition), then generative AI (text-to-image, text-to-video), and now agentic AI. This latest phase represents a fundamental shift in how AI operates.

"Agentic AI basically means that you have an AI that has agency," Huang explained. "It can perceive and understand the context of the circumstance, it can reason very importantly about how to answer or how to solve a problem, and it can plan and take action."

This evolution has led to the emergence of physical AI – AI that understands the physical world, including concepts like friction, inertia, cause and effect, and object permanence. This understanding is what will enable the next wave of robotics.

The Computational Challenge: 100x More Power Needed

One of the most striking revelations in Huang's keynote was how the industry had underestimated the computational requirements for advanced AI.

"This last year, this is where almost the entire world got it wrong," Huang stated. "The computation requirement, the scaling law of AI is more resilient and in fact hyper-accelerated. The amount of computation we need at this point as a result of agentic AI, as a result of reasoning, is easily a hundred times more than we thought we needed this time last year."

Huang explained this computational explosion by demonstrating how reasoning AI works. Instead of generating a single answer in one shot, reasoning AI breaks problems down step by step, generating thousands more tokens in the process. He showed a comparison between a traditional language model and a reasoning model tackling a wedding seating arrangement problem with complex constraints. While the traditional model generated 439 tokens and provided an incorrect answer, the reasoning model produced nearly 9,000 tokens and solved the problem correctly.

"Instead of just generating one token or one word after next, it generates a sequence of words that represents a step of reasoning," Huang explained. "The amount of tokens that's generated as a result is substantially higher... easily 100 times more."
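The compute implication of that token explosion can be checked with back-of-envelope arithmetic. The token counts below follow the keynote's wedding-seating example (the "nearly 9,000" figure is rounded); everything else is a sketch, not a measurement:

```python
# Back-of-envelope: why reasoning inflates inference compute.
# Token counts follow the keynote's wedding-seating example;
# the "nearly 9,000" figure is rounded to 9,000.
ONE_SHOT_TOKENS = 439      # traditional LLM: one pass, wrong answer
REASONING_TOKENS = 9_000   # reasoning model: step-by-step, correct answer

inflation = REASONING_TOKENS / ONE_SHOT_TOKENS
print(f"Token inflation: ~{inflation:.0f}x")  # roughly 20x more tokens

# Reasoning tokens are also generated sequentially, so to keep response
# latency flat, per-model throughput must rise by a similar factor --
# one reason the "easily 100 times more" estimate emerges once larger
# models and more concurrent users are folded in.
```

On this single example the inflation is about 20x; Huang's "easily 100 times more" claim folds in larger models, training of reasoning systems, and growth in usage on top of it.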

Blackwell: The Architecture for AI Factories

To address these computational demands, NVIDIA unveiled its Blackwell architecture, which Huang described as the foundation for "AI factories" – data centers whose primary purpose is generating tokens that constitute AI outputs.

"The world is going through a transition in not just the amount of data centers that will be built, but also how it's built," Huang explained. "Everything in the data center will be accelerated."

The Blackwell architecture represents a fundamental shift in how computing systems are designed. Unlike previous generations, where GPUs were integrated on a motherboard with NVLink connections, Blackwell disaggregates the system. The NVLink switches are now centralized in dedicated switch trays, while the compute nodes are fully liquid-cooled, allowing NVIDIA to pack an exaflop of computing power into a single rack.

"This is the most extreme scale-up the world has ever done," Huang declared, showing how a single Blackwell rack contains 600,000 components, 5,000 cables spanning two miles, and delivers 570 terabytes per second of memory bandwidth.

Comparing performance metrics, Huang revealed that Blackwell offers 25 times better performance per watt than Hopper for reasoning AI workloads, and up to 40 times better performance when using the new NVIDIA Dynamo operating system.

NVIDIA Dynamo: The Operating System for AI Factories

One of the most significant announcements was NVIDIA Dynamo, which Huang described as "the operating system of an AI Factory."

"Whereas in the past, the way that we ran data centers, our operating system would be something like VMware... in the future the application is not enterprise IT, it's agents, and the operating system is not something like VMware, it's something like Dynamo," Huang explained.

Dynamo manages the complex task of distributing AI workloads across GPUs, handling different types of parallelism (tensor, pipeline, and expert), managing in-flight batching, and routing the KV cache to the right GPUs. It optimizes for both the "prefill" phase (where AI ingests and processes information) and the "decode" phase (where it generates tokens).
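Dynamo itself is far more sophisticated, but the core disaggregation idea — separate GPU pools for prefill and decode, with the KV cache handed off between them — can be pictured in a few lines. All names below are illustrative, not the Dynamo API:

```python
# Toy sketch of disaggregated LLM serving: compute-bound prefill runs on
# one GPU pool, bandwidth-bound token-by-token decode on another, and the
# KV cache is the handoff between them. Illustrative only -- not Dynamo.
from dataclasses import dataclass, field

@dataclass
class Request:
    prompt: str
    kv_cache: list = field(default_factory=list)  # stand-in for attention KV state

class PrefillPool:
    """Compute-bound phase: ingest and process the whole prompt at once."""
    def run(self, req: Request) -> Request:
        req.kv_cache = [hash(tok) for tok in req.prompt.split()]  # fake KV entries
        return req

class DecodePool:
    """Bandwidth-bound phase: generate output tokens one at a time."""
    def run(self, req: Request, max_tokens: int = 4) -> list[str]:
        return [f"tok{i}" for i in range(max_tokens)]  # placeholder generation

def serve(prompt: str) -> list[str]:
    req = PrefillPool().run(Request(prompt))  # 1. prefill on pool A
    assert req.kv_cache                       # 2. KV cache routed with the request
    return DecodePool().run(req)              # 3. decode on pool B

print(serve("seat the wedding guests"))
```

The real scheduler additionally juggles tensor, pipeline, and expert parallelism and in-flight batching across both pools, which is what makes the production software, in Huang's words, "insanely complicated."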

"This dynamic operation is really complicated," Huang noted. "That piece of software is insanely complicated, and so today we're announcing NVIDIA Dynamo."

Importantly, Dynamo is open source, with partners like Perplexity already working with NVIDIA on its implementation.

The NVIDIA Roadmap: Blackwell Ultra and Vera Rubin

Looking to the future, Huang laid out NVIDIA's product roadmap for the next several years, emphasizing the importance of long-term planning for AI infrastructure.

"This isn't like buying a laptop," Huang noted. "This isn't discretionary spend. This is spend that we have to plan on... a couple two, three years in advance, which is the reason why I show you our roadmap a couple two, three years in advance."

The roadmap includes:

  • Blackwell Ultra: Coming in the second half of 2025, with 1.5x more FLOPS, a new instruction for attention, 1.5x more memory, and 2x more networking bandwidth.
  • Vera Rubin (named after the astronomer who discovered evidence of dark matter): Coming in the second half of 2026, featuring a new CPU with twice the performance of Grace, a new GPU, a new ConnectX-9 (CX9) SmartNIC, NVLink 6, and HBM4 memory.
  • Rubin Ultra: Coming in the second half of 2027, with NVLink 576 for extreme scale-up, 600 kW per rack, 25 million parts, and 15 exaflops of computing power (compared to 1 exaflop for Blackwell).

Huang also announced breakthrough technology in silicon photonics, introducing the world's first 1.6 terabit per second co-packaged optics (CPO) based on micro-ring resonator modulator (MRM) technology. This innovation will dramatically reduce the power consumption and cost of connecting hundreds of thousands or even millions of GPUs, potentially saving tens of megawatts in large data centers.
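The scale of those savings follows from simple arithmetic. The ~30 W per pluggable transceiver figure was cited in the keynote; the transceivers-per-GPU count and fleet sizes below are illustrative assumptions:

```python
# Back-of-envelope power cost of pluggable optics at AI-factory scale.
# ~30 W per transceiver follows the keynote; 6 transceivers per GPU and
# the fleet sizes are illustrative assumptions.
WATTS_PER_TRANSCEIVER = 30
TRANSCEIVERS_PER_GPU = 6       # assumed count for a multi-tier fabric

for gpus in (100_000, 1_000_000):
    megawatts = gpus * TRANSCEIVERS_PER_GPU * WATTS_PER_TRANSCEIVER / 1e6
    print(f"{gpus:>9,} GPUs -> {megawatts:,.0f} MW spent on optics alone")
```

Under these assumptions a 100,000-GPU facility burns on the order of 18 MW just on transceivers — power that co-packaged optics can largely reclaim for computation.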

AI for Enterprise: DGX Spark and DGX Station

Recognizing that AI needs to extend beyond cloud data centers, Huang introduced new products designed to bring AI capabilities to enterprises and individual developers.

The first was DGX Spark, a compact development platform designed for software engineers and data scientists. "There are 30 million software engineers in the world... this is clearly the gear of choice," Huang said, announcing that GTC attendees would have first access to reserve the new system.

For more demanding workloads, NVIDIA introduced the DGX Station, a personal workstation featuring Grace Blackwell liquid cooling, 72 CPU cores, and 20 petaflops of performance. "This is what a PC should look like," Huang declared. "This is the computer of the age of AI."

These systems will be available through OEM partners including HP, Dell, Lenovo, and ASUS, making enterprise-grade AI computing accessible to a broader audience.

NVIDIA is also revolutionizing the other pillars of computing – networking with Spectrum-X for enterprises, and storage with a new semantics-based retrieval system that continuously embeds information into knowledge in the background.
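The idea behind semantics-based storage is that content is indexed by meaning (as embeddings) rather than by filename or keyword, so queries retrieve by vector similarity. A minimal sketch, with a toy bag-of-words "embedding" standing in for a real model:

```python
# Minimal semantics-based retrieval: documents are stored as embeddings,
# built in the background, and queried by vector similarity rather than
# by filename or keyword. The bag-of-words "embedding" is a toy stand-in.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class SemanticStore:
    def __init__(self):
        self.index = {}  # doc id -> embedding, maintained continuously

    def ingest(self, doc_id: str, text: str) -> None:
        self.index[doc_id] = embed(text)

    def query(self, question: str, top_k: int = 1) -> list[str]:
        q = embed(question)
        ranked = sorted(self.index,
                        key=lambda d: cosine(q, self.index[d]), reverse=True)
        return ranked[:top_k]

store = SemanticStore()
store.ingest("gpu", "blackwell gpu rack liquid cooling exaflop")
store.ingest("robot", "humanoid robot foundation model motor skills")
print(store.query("how are gpu racks cooled"))  # -> ['gpu']
```

A production system would use a learned embedding model and an approximate-nearest-neighbor index, but the storage contract is the same: ingest continuously, answer by similarity.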

AI Partnerships: GM, Cisco, T-Mobile, and More

Huang announced several major partnerships that will expand NVIDIA's reach across industries:

  • GM: NVIDIA will partner with General Motors to build its future self-driving car fleet. "The time for autonomous vehicles has arrived," Huang stated, noting that the partnership will include AI for manufacturing, enterprise, and in-vehicle applications.
  • Edge AI: NVIDIA announced that Cisco, T-Mobile ("the largest telecommunications company in the world"), and Cerberus ODC will build a full stack for radio networks in the United States, bringing AI to edge computing.
  • Enterprise AI: NVIDIA is working with numerous enterprise partners including Accenture, AT&T, BlackRock, Cadence, Capital One, Dell, EY, NASDAQ, SAP, and ServiceNow to integrate NVIDIA technology into their AI frameworks.

Huang also announced that NVIDIA's R1 reasoning model is now open source and enterprise-ready through NVIDIA NIM microservices, allowing companies to run it anywhere – on DGX Spark, DGX Station, OEM servers, or in the cloud.

Physical AI and Robotics: The Next Frontier

The final section of Huang's keynote focused on robotics and physical AI, which he suggested "could very well likely be the largest industry of all."

"The time has come for robots," Huang declared, noting that by the end of this decade, the world will face a shortage of at least 50 million workers. "We'd be more than delighted to pay them each $50,000 to come to work. We're probably going to have to pay robots $50,000 a year to come to work."

NVIDIA is addressing this opportunity with three key technologies:

  1. Omniverse: NVIDIA's operating system for physical AI, which allows for the creation of digital twins and simulation environments.
  2. Cosmos: A technology that generates infinite virtual environments for training robots, conditioned by Omniverse.
  3. Newton: A new physics engine developed in partnership with Google DeepMind and Disney Research, designed specifically for fine-grained rigid- and soft-body simulation, tactile feedback, and fine motor skills.

The keynote culminated with the announcement of GR00T N1, a generalist foundation model for humanoid robots that is now open-sourced. GR00T N1 features a dual-system architecture for "thinking fast and slow," allowing robots to perceive, reason, plan, and execute precise actions.
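The dual-system idea can be pictured as two loops running at different rates: a slow deliberative planner that reasons about the scene, and a fast reactive controller that turns the current subgoal into motor commands. This is a schematic sketch of the architecture only, not the model's actual interfaces:

```python
# Schematic "thinking fast and slow" control loop: a slow System-2
# planner emits high-level subgoals at low frequency, while a fast
# System-1 controller converts the active subgoal into a motor command
# on every tick. Illustrative of the dual-system pattern only.

def slow_planner(observation: str) -> list[str]:
    """System 2 (deliberate): reason about the scene, produce a plan. Runs rarely."""
    return ["reach", "grasp", "lift"]  # canned plan for illustration

def fast_controller(subgoal: str, tick: int) -> str:
    """System 1 (react): turn the active subgoal into a motor command. Runs every tick."""
    return f"{subgoal}@t{tick}"

plan = slow_planner("mug on table")        # slow loop: replan occasionally
commands = [fast_controller(goal, t)       # fast loop: act every tick
            for t, goal in enumerate(plan)]
print(commands)
```

The key design point is the frequency split: expensive reasoning runs only when the plan needs revising, while the cheap reactive loop keeps the robot's actions smooth and timely.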

To demonstrate these capabilities, a humanoid robot named Blue joined Huang on stage, showcasing the progress NVIDIA has made in robotics.

Conclusion: The Dawn of a New Computing Era

Jensen Huang's GTC 2025 keynote painted a comprehensive picture of how AI is transforming computing at every level. From the cloud to the enterprise to robots, NVIDIA is building the infrastructure that will power the next generation of intelligent systems.

The message was clear: we are at an inflection point in AI, where reasoning capabilities are driving unprecedented demand for computational power. NVIDIA's response – with Blackwell, Dynamo, and a clear roadmap to even more powerful architectures – positions the company at the forefront of this revolution.

As Huang summarized at the end of his presentation: "Blackwell is in full production... customer demand is incredible, and for good reason because there's an inflection point in AI. The amount of computation we have to do in AI is so much greater as a result of reasoning AI and the training of reasoning AI systems and agentic systems."

With annual roadmap updates, three distinct AI infrastructure tracks, and groundbreaking innovations in silicon photonics and physics simulation, NVIDIA is not just responding to the AI revolution – it's accelerating it. As the keynote's closing video suggested, we are entering an era where AI will help us explore new frontiers, from the depths of space to the intricacies of human biology, and NVIDIA is building the engines that will power this journey.
