NVIDIA GTC 2025: Blackwell GPU Architecture Delivers 25x Performance Leap for AI

Introduction

At GTC 2025, NVIDIA CEO Jensen Huang took the stage without a teleprompter to deliver a comprehensive vision of how AI is transforming computing and industries worldwide. Speaking to a packed audience at what has been dubbed "the Super Bowl of AI," Huang outlined NVIDIA's latest innovations in a rapidly evolving AI landscape that has seen multiple paradigm shifts in just the past decade.

The annual GPU Technology Conference (GTC) has grown from its roots in graphics processing to become the premier event for AI technology, attracting partners from healthcare, transportation, retail, and virtually every segment of the computing industry. This year's event highlighted NVIDIA's response to what Huang described as a fundamental inflection point in artificial intelligence—where computational demands have grown exponentially beyond previous projections.

Key Points

  • NVIDIA's Blackwell GPU architecture represents a 25x performance leap over Hopper for AI workloads at the same power consumption
  • AI is transitioning from generative to agentic AI (with reasoning capabilities) and physical AI (enabling robotics)
  • The computational demands of AI have increased 100x beyond previous projections due to reasoning models generating thousands more tokens
  • NVIDIA introduced Dynamo, an operating system for AI factories that optimizes workload distribution across GPU clusters
  • New enterprise offerings include DGX Spark and DGX Station, bringing AI capabilities to businesses of all sizes
  • NVIDIA announced its roadmap through 2027, including Vera Rubin architecture with 15 exaflops of computing power
  • Silicon photonics technology will enable scaling to millions of GPUs while reducing power consumption by tens of megawatts

The Evolution of AI: From Perception to Agentic Intelligence

Huang began by tracing AI's journey over the past decade, from perception AI (computer vision, speech recognition) to generative AI, which fundamentally changed computing from a retrieval model to a generative model. Now, we're witnessing the rise of agentic AI—systems that can perceive, reason, plan, and take action.

"Agentic AI basically means that you have an AI that has agency," Huang explained. "It can perceive and understand the context of the circumstance, it can reason very importantly about how to answer or how to solve a problem, and it can plan and take action."

This evolution has enabled AI to understand multimodal information, navigate websites, and use tools to accomplish tasks. Building on this foundation, Huang introduced the next wave: physical AI, which understands the three-dimensional world, including concepts like friction, inertia, cause and effect, and object permanence—capabilities that will enable the next generation of robotics.

Huang emphasized three fundamental challenges that each wave of AI must address:

  1. How to solve the data problem
  2. How to train models without human-in-the-loop limitations
  3. How to scale effectively—finding algorithms where more resources translate to smarter AI

The Computational Challenge: 100x More Than Expected

One of the keynote's most significant revelations was that the computational requirements for AI have grown far beyond previous projections. Huang explained that the industry had underestimated the computational demands of agentic AI by approximately 100 times.

"The computation requirement, the scaling law of AI, is more resilient and in fact hyper-accelerated," Huang said. "The amount of computation we need at this point, as a result of agentic AI, as a result of reasoning, is easily a hundred times more than we thought we needed this time last year."

Huang illustrated this with a compelling example comparing traditional language models to reasoning models. While a traditional model might generate 439 tokens to answer a question (often incorrectly), a reasoning model generates nearly 9,000 tokens as it works through multiple approaches, checks its work, and arrives at the correct solution.

This explosion in token generation, coupled with the need to maintain responsiveness, has dramatically increased computational demands. As Huang put it: "10 times more tokens, 10 times faster—the amount of computation we have to do is 100 times more, easily."
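Huang's multipliers reduce to simple arithmetic. The sketch below reproduces that back-of-envelope calculation, using the token counts quoted in the keynote (the "~10x" factors are the round numbers Huang used, not measured values):

```python
# Back-of-envelope sketch of the compute multiplier Huang described:
# reasoning models emit roughly 10x more tokens, and must emit them
# roughly 10x faster to stay responsive, so compute scales with the product.
TOKEN_MULTIPLIER = 10   # ~10x more tokens per answer
SPEED_MULTIPLIER = 10   # ~10x faster generation to keep latency flat

compute_multiplier = TOKEN_MULTIPLIER * SPEED_MULTIPLIER
print(compute_multiplier)  # 100

# The concrete token counts from the keynote example:
traditional_tokens = 439   # one-shot answer (often wrong)
reasoning_tokens = 9_000   # approximate; the model checks multiple approaches
print(round(reasoning_tokens / traditional_tokens, 1))  # ~20.5x more tokens
```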

The Rise of AI Factories

A central theme of Huang's presentation was the concept of "AI factories"—data centers purpose-built for generating tokens, the fundamental building blocks of AI. These factories represent a shift from retrieval-based computing to generative computing.

"The world is going through a transition in not just the amount of data centers that will be built, but also how it's built," Huang explained. "Everything in the data center will be accelerated."

Huang shared data showing that capital expenditure for the world's data centers is expected to reach a trillion dollars sooner than previously anticipated, driven by two dynamics: the shift from general-purpose computing to accelerated computing, and the recognition that future software development requires capital investment in computational infrastructure.

"In the past, we wrote the software and we ran it on computers. In the future, the computer's going to generate the tokens for the software," Huang said. "The computer has become a generator of tokens, not a retrieval of files."

NVIDIA Blackwell: The Engine for AI Factories

The star of the show was NVIDIA's Blackwell architecture, which Huang revealed is now in full production. Blackwell represents a fundamental transition in computer architecture, focusing on "scaling up" before "scaling out."

Huang demonstrated how NVIDIA has reimagined the physical architecture, moving from the previous HGX system with eight GPUs connected via NVLink to a disaggregated system. The new approach separates the NVLink switches from the compute nodes and places them in the center of a liquid-cooled chassis.

"This is the big change of the entire industry," Huang said, holding up a Blackwell compute node. "From integrated NVLink to disaggregated NVLink, from air-cooled to liquid-cooled, from 60,000 components per computer to 600,000 components per rack."

The result is a one-exaflop computer in a single rack—an achievement Huang described as "the most extreme scale-up the world has ever done."

What makes this advancement particularly significant is its application to inference—the process of generating tokens in response to prompts. Huang explained that inference is "the ultimate extreme computing problem" because it directly affects quality of service, revenues, and profitability.

Huang illustrated the challenge with a graph showing the tension between tokens per second for individual users (response time) and tokens per second for the entire factory (throughput). The goal is to maximize both dimensions—creating smart AIs that respond quickly while serving as many users as possible.
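The shape of that tradeoff can be sketched with a toy model (the numbers and saturation curve below are illustrative assumptions, not NVIDIA figures): batching more users drives the factory toward its peak aggregate rate, but each user's share of that rate shrinks.

```python
# Toy model of the latency/throughput tension Huang plotted.
# Assumptions (hypothetical): a fixed hardware budget with a peak
# aggregate token rate, and GPU utilization that saturates as batch grows.
PEAK_TOKENS_PER_SEC = 100_000   # assumed peak aggregate rate
HALF_SATURATION_BATCH = 32      # assumed batch size at 50% utilization

def factory_rate(batch: int) -> float:
    """Aggregate tokens/sec: rises with batch size, saturating at the peak."""
    return PEAK_TOKENS_PER_SEC * batch / (batch + HALF_SATURATION_BATCH)

def user_rate(batch: int) -> float:
    """Tokens/sec a single user sees: the aggregate rate split across the batch."""
    return factory_rate(batch) / batch

for b in (1, 8, 64, 512):
    print(f"batch={b:4d}  factory={factory_rate(b):9.0f} tok/s  user={user_rate(b):7.0f} tok/s")
```

Under these assumptions, factory throughput and per-user responsiveness pull in opposite directions, which is why Huang frames inference as an optimization across both axes at once.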

NVIDIA Dynamo: The Operating System for AI Factories

To manage the complexity of optimizing AI workloads across these advanced systems, Huang announced NVIDIA Dynamo, which he described as "the operating system of an AI factory."

"Whereas in the past, in the way that we ran data centers, our operating system would be something like VMware... in the future, the application is not enterprise IT, it's agents, and the operating system is not something like VMware, it's something like Dynamo," Huang explained.

Dynamo, named after the device that sparked the industrial revolution of energy, manages the distribution of workloads across GPUs, handling complex tasks like pipeline parallelism, tensor parallelism, expert parallelism, in-flight batching, and disaggregated inferencing. Huang announced that Dynamo will be open source, with partners like Perplexity already working with NVIDIA on its development.
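Two of the partitioning strategies Dynamo coordinates can be illustrated with a toy sketch (this is not Dynamo's API, just the underlying idea): pipeline parallelism assigns contiguous blocks of a model's layers to different GPUs, while tensor parallelism shards each weight matrix across GPUs.

```python
# Toy illustration (not Dynamo's API) of two parallelism strategies:
# pipeline parallelism splits a model's layers across GPUs; tensor
# parallelism splits each weight matrix's columns across GPUs.

def even_partition(count: int, num_gpus: int) -> list[range]:
    """Split `count` items into near-equal contiguous blocks, one per GPU."""
    per_gpu, extra = divmod(count, num_gpus)
    blocks, start = [], 0
    for gpu in range(num_gpus):
        size = per_gpu + (1 if gpu < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

# Pipeline parallelism: an 80-layer model across 4 pipeline stages.
print(even_partition(80, 4))

# Tensor parallelism: a weight matrix's 12,288 columns across 8 GPUs.
print(even_partition(12_288, 8))
```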

With Blackwell architecture and Dynamo working together, Huang revealed that NVIDIA has achieved a 40x performance improvement over Hopper for reasoning models. This translates to dramatic efficiency gains for AI factories—what would require 45,000 Hopper dies across 1,400 racks can now be accomplished with just 8,000 Blackwell dies in 250 racks.
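The die and rack counts above reduce to simple ratios, shown here using the keynote's own figures:

```python
# Efficiency comparison from the keynote, expressed as ratios.
hopper_dies, hopper_racks = 45_000, 1_400
blackwell_dies, blackwell_racks = 8_000, 250

print(hopper_dies / blackwell_dies)    # ~5.6x fewer dies needed
print(hopper_racks / blackwell_racks)  # ~5.6x fewer racks needed
```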

NVIDIA's Roadmap: Vera Rubin and Beyond

In a rare move for the tech industry, Huang laid out NVIDIA's product roadmap for the next several years, explaining that AI infrastructure requires years of planning.

"This isn't like buying a laptop," Huang said. "This isn't discretionary spend. This is spend that we have to go plan on."

The roadmap includes:

  • Blackwell Ultra (second half of 2025): 1.5x more flops, 1.5x more memory, 2x more bandwidth
  • Vera Rubin (second half of 2026): Named after the astronomer whose observations provided key evidence for dark matter, featuring a new CPU with twice the performance of Grace, new GPU, new networking SmartNIC, and NVLink 144
  • Rubin Ultra (second half of 2027): NVLink 576, 15 exaflops of computing power, 4,600 terabytes per second of scale-up bandwidth

Huang emphasized that each generation represents dramatic improvements in scale-up capabilities, with Rubin offering 900x the scale-up flops of Hopper.

Silicon Photonics: Enabling the Next Scale-Out Revolution

To complement NVIDIA's scale-up strategy, Huang introduced the company's breakthrough in silicon photonics, which will enable scaling to hundreds of thousands or even millions of GPUs.

Huang explained that while copper connections work well for short distances in scale-up scenarios, data centers the size of stadiums require long-distance connections. Traditional optical transceivers consume 30 watts each and cost around $1,000 in volume—meaning a system with 100,000 GPUs would require 600,000 transceivers consuming 18 megawatts of power just for signal conversion.
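The per-unit figures multiply out as follows (the six-transceivers-per-GPU ratio is an assumption inferred from the stated totals):

```python
# Transceiver overhead at scale, using the keynote's per-unit figures.
WATTS_PER_TRANSCEIVER = 30
COST_PER_TRANSCEIVER = 1_000   # USD, in volume
TRANSCEIVERS_PER_GPU = 6       # assumed: consistent with 600,000 for 100,000 GPUs

gpus = 100_000
transceivers = gpus * TRANSCEIVERS_PER_GPU
power_mw = transceivers * WATTS_PER_TRANSCEIVER / 1e6
cost_usd = transceivers * COST_PER_TRANSCEIVER

print(transceivers)        # 600000
print(power_mw)            # 18.0 MW spent purely on electrical<->optical conversion
print(f"${cost_usd:,}")    # $600,000,000
```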

NVIDIA's solution is a co-packaged silicon photonic system using micro-ring resonator modulator (MRM) technology. Huang demonstrated the technology, which eliminates the need for transceivers by directly integrating fiber connections into switches.

"Because of the MRM choice, because of the incredible technology risks that over the last five years that we did and filed hundreds of patents... we're in a position to put silicon photonics with co-package options—no transceivers, direct fiber—into our switches with a radix of 512," Huang said.

This technology will enable NVIDIA to scale to multi-million GPU systems while saving tens of megawatts of power in data centers—power that can instead be used for computation.

Enterprise AI: DGX Spark and DGX Station

Recognizing that AI will extend beyond cloud data centers, Huang introduced new products designed to bring AI capabilities to enterprises of all sizes.

Huang unveiled the DGX Spark, a compact system with 20 CPU cores, 128GB of unified memory, and one petaflop of AI compute for $3,000. He also introduced the DGX Station, a liquid-cooled personal workstation featuring Grace Blackwell architecture with 20 petaflops of performance and 72 CPU cores.

"This is what a PC should look like," Huang said, displaying the sleek, liquid-cooled system. "This is the computer of the age of AI. This is what computers should look like, and this is what computers will run in the future."

Huang announced that these systems will be available through OEM partners including HP, Dell, Lenovo, and ASUS. He also highlighted NVIDIA's work to revolutionize networking and storage for enterprise AI, with Cisco, NVIDIA, T-Mobile, Cerberus, and ODC building a full stack for radio networks in the United States.

Physical AI and Robotics: The Next Frontier

In the final segment of his presentation, Huang turned to what may become "the largest industry of all"—physical AI and robotics.

"The time has come for robots," Huang declared, noting that the world faces a severe shortage of human workers—projected to reach at least 50 million by the end of the decade.

Huang introduced three key technologies enabling NVIDIA's robotics efforts:

  1. Omniverse: NVIDIA's operating system for physical AI, which enables the creation of digital twins and simulation environments
  2. Cosmos: A system that uses Omniverse to generate an infinite number of controlled yet diverse environments for training robots
  3. Newton: A physics engine developed in partnership with DeepMind and Disney Research, designed for fine-grain rigid and soft body simulation, tactile feedback, and GPU acceleration

Huang also announced GR00T N1, a generalist foundation model for humanoid robots that features a dual-system architecture for "thinking fast and slow." In a surprise announcement, Huang revealed that GR00T N1 will be open-sourced.

Conclusion

As Huang wrapped up his presentation, he summarized the key takeaways: Blackwell is in full production with customer demand exceeding expectations; Blackwell NVLink 72 with Dynamo delivers 40x the AI factory performance of Hopper; NVIDIA has established an annual roadmap rhythm to help customers plan their AI infrastructure; and the company is building AI infrastructure across three domains—cloud, enterprise, and robots.

The presentation concluded with a demonstration featuring a humanoid robot named Blue, highlighting how far robotics technology has advanced and pointing toward a future where AI and robots will transform every industry.

GTC 2025 made it clear that NVIDIA continues to lead the AI revolution, addressing the computational challenges of increasingly sophisticated AI systems while bringing these capabilities to enterprises and physical systems worldwide. As Huang put it, "Everything that moves will be autonomous."

With its comprehensive approach spanning hardware, software, and ecosystem development, NVIDIA is positioning itself not just as a chip company but as the architect of the AI era—building the factories that will generate the tokens that power our increasingly intelligent digital world.

For the full keynote, watch the video.
