NVIDIA's Blackwell Architecture: Revolutionizing AI Factories and Computing's Next Era
Introduction
In a keynote address that has come to be known as the "Super Bowl of AI," NVIDIA CEO Jensen Huang took the stage at GTC (GPU Technology Conference) 2025 to outline a vision of computing that extends far beyond traditional paradigms. Without scripts or teleprompters, Huang delivered a comprehensive overview of NVIDIA's latest innovations and roadmap, emphasizing how AI is fundamentally transforming every aspect of computing.
The presentation centered on a profound shift in computing architecture - from retrieval-based to generative-based models - where computers no longer simply fetch stored data but instead generate contextual responses through understanding. This evolution has created what Huang calls "AI factories," facilities dedicated to generating tokens that form the building blocks of artificial intelligence across industries from healthcare to transportation, retail to manufacturing.
Key Points
- NVIDIA's Blackwell architecture delivers 40x performance improvement over Hopper for reasoning AI workloads
- Computing is shifting from retrieval-based to generative-based models, with AI factories generating tokens that are reconstituted into various forms of content
- NVIDIA introduced Dynamo, an open-source operating system for AI factories that optimizes GPU workloads
- New AI-focused computers were announced, including DGX Spark for developers and DGX Station with Grace Blackwell for enterprises
- NVIDIA revealed partnerships with GM for autonomous vehicles and introduced Newton, a physics engine created with DeepMind and Disney Research
- The company outlined a multi-year roadmap including Blackwell Ultra (2025), Vera Rubin (2026), and Rubin Ultra (2027)
- NVIDIA's GR00T N1, a foundation model for humanoid robots, was announced as open-source to accelerate physical AI development
The Evolution of AI and Its Computational Demands
Huang began by tracing AI's evolution over the past decade, from perception AI (computer vision, speech recognition) to generative AI (text-to-image, image-to-text), and now to agentic AI - systems capable of perceiving context, reasoning through problems, and taking action. He highlighted physical AI as the next frontier, enabling robotics through understanding of the physical world.
"Agentic AI basically means that you have an AI that has agency," Huang explained. "It can perceive and understand the context of the circumstance, it can reason very importantly about how to answer or how to solve a problem, and it can plan and take action."
This progression has dramatically increased computational requirements, with reasoning AI needing approximately 100 times more computation than anticipated just a year ago. Huang illustrated this with a comparison between traditional language models and reasoning models:
"Traditional LLMs capture foundational knowledge, while reasoning models help solve complex problems with thinking tokens," Huang explained. In one example, a traditional model generated 439 tokens and answered a wedding seating arrangement problem incorrectly, while a reasoning model produced nearly 9,000 tokens and solved it correctly by reasoning through multiple approaches.
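The arithmetic behind that demo is easy to sketch. A minimal back-of-envelope in Python, using the token counts quoted above; the per-token cost and the 70B-parameter model size are illustrative placeholders, not NVIDIA figures:

```python
# Back-of-envelope: generation compute scales linearly with tokens emitted,
# so a reasoning model that emits ~20x the tokens needs ~20x the compute
# even before any per-token cost increases are considered.

def inference_flops(tokens_generated, flops_per_token):
    """Total generation compute, assuming a fixed cost per output token."""
    return tokens_generated * flops_per_token

FLOPS_PER_TOKEN = 2 * 70e9  # ~2 x params for a hypothetical 70B model

traditional = inference_flops(439, FLOPS_PER_TOKEN)    # wrong answer, cheap
reasoning = inference_flops(9000, FLOPS_PER_TOKEN)     # right answer, ~20x cost

print(f"token ratio:   {9000 / 439:.1f}x")
print(f"compute ratio: {reasoning / traditional:.1f}x")
```

Longer context windows and larger models compound this further, which is one way to read Huang's claim that reasoning AI needs roughly 100 times more computation than anticipated.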
The Architecture of AI Factories
To meet these extraordinary computational demands, NVIDIA has reimagined data center architecture. Huang introduced the Grace Blackwell NVLink-72 system, which represents a fundamental transition in computing architecture.
"This is the most extreme scale-up the world has ever done," Huang said, describing a system with 570 terabytes per second of memory bandwidth and one exaflop of computing power in a single rack. "Everything in this machine is now in teras [trillions]."
The system achieves this through disaggregation of the NVLink switches, liquid cooling, and a new approach of scaling up before scaling out. Each rack contains approximately 600,000 components and two miles of cabling, yet delivers performance that would have required multiple racks with previous architectures.
Huang emphasized that inference - the process of generating tokens in response to prompts - is not the simple task many assumed but "the ultimate extreme computing problem" because it directly affects quality of service, revenues, and profitability.
Dynamo: The Operating System for AI Factories
To manage these complex AI workloads, NVIDIA announced Dynamo, an open-source operating system for AI factories. Dynamo optimizes the distribution of work across GPU clusters, balancing between "prefill" (thinking, digesting information) and "decode" (generating responses) phases of AI operation.
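Dynamo's internals weren't shown on stage, but the prefill/decode disaggregation it manages can be sketched as a toy scheduler. This is purely illustrative, assuming a simple request model; none of these names reflect Dynamo's actual API:

```python
from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # work for the prefill phase (digesting the context)
    output_tokens: int   # work for the decode phase (generating the response)

def split_gpus(requests, total_gpus):
    """Toy disaggregation: size the prefill and decode GPU pools in
    proportion to each phase's share of the batch's total token work."""
    prefill_work = sum(r.prompt_tokens for r in requests)
    decode_work = sum(r.output_tokens for r in requests)
    total = prefill_work + decode_work
    prefill_gpus = max(1, round(total_gpus * prefill_work / total))
    return prefill_gpus, total_gpus - prefill_gpus

# A batch dominated by long prompts pushes GPUs toward prefill;
# long generations would push them toward decode instead.
batch = [Request(prompt_tokens=8000, output_tokens=500),
         Request(prompt_tokens=6000, output_tokens=1500)]
print(split_gpus(batch, 72))  # 72 GPUs, as in an NVLink-72 rack: (63, 9)
```

A real system also has to migrate KV-cache state from the prefill pool to the decode pool and rebalance continuously as traffic shifts, which is exactly the kind of orchestration Huang described Dynamo handling.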
"Dynamo does all that. It is essentially the operating system of an AI Factory," Huang explained. "Whereas in the past... our operating system would be something like VMware... in the future the application is not Enterprise IT, it's agents, and the operating system is not something like VMware, it's something like Dynamo."
The name Dynamo was chosen deliberately, Huang noted, as a reference to the instrument that started the last industrial revolution - turning the energy of flowing water into electricity, just as AI factories convert computation into valuable tokens.
The NVIDIA Roadmap: Scaling Up Before Scaling Out
Huang laid out NVIDIA's ambitious roadmap for the next several years:
- Blackwell (current) - In full production, delivering 68x the scale-up FLOPS of Hopper
- Blackwell Ultra (second half of 2025) - 1.5x more FLOPS, a new attention instruction, and 1.5x more memory
- Vera Rubin (second half of 2026) - Named after the astronomer whose observations provided key evidence for dark matter, featuring NVLink-144, twice the CPU performance of Grace, a new GPU, new networking, and HBM4 memory
- Rubin Ultra (second half of 2027) - NVLink-576 with extreme scale-up, 15 exaflops (vs. 1 exaflop for Blackwell), and 4,600 terabytes per second of scale-up bandwidth
"Before you scale out, you have to scale up," Huang emphasized repeatedly, showing how each generation dramatically increases performance within a single system before connecting multiple systems together.
To enable scaling out to hundreds of thousands or eventually millions of GPUs, NVIDIA announced the world's first co-packaged silicon photonic system - a 1.6 terabit per second CPO using micro-ring resonator modulator (MRM) technology. This approach eliminates the need for traditional transceivers, which would consume 180 watts per GPU and cost $6,000 per GPU at scale.
"In a data center we could save tens of megawatts," Huang explained. "60 megawatts is 100 Rubin Ultra racks of power that we can now deploy into Rubin."
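The savings Huang cited follow directly from the per-GPU figures. A quick check, using the 180 W and $6,000 per-GPU transceiver numbers from the talk; the 250,000-GPU fleet size is a hypothetical chosen to match the "hundreds of thousands of GPUs" scale he described:

```python
TRANSCEIVER_WATTS_PER_GPU = 180      # per-GPU transceiver power, from the keynote
TRANSCEIVER_COST_PER_GPU = 6_000     # per-GPU transceiver cost in dollars, from the keynote
NUM_GPUS = 250_000                   # hypothetical fleet size (assumption)

power_saved_mw = TRANSCEIVER_WATTS_PER_GPU * NUM_GPUS / 1e6
cost_saved = TRANSCEIVER_COST_PER_GPU * NUM_GPUS

print(f"power saved: {power_saved_mw:.0f} MW")   # prints "power saved: 45 MW"
print(f"capex saved: ${cost_saved / 1e9:.1f}B")  # prints "capex saved: $1.5B"
```

At this assumed scale, eliminating transceivers frees tens of megawatts - consistent with Huang's point that the reclaimed power budget can instead be spent on more Rubin racks.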
Bringing AI to Enterprise and Edge
Recognizing that AI will extend beyond cloud data centers, NVIDIA introduced new products for enterprise computing:
- DGX Spark - A development platform for software engineers and data scientists, offering 1 petaflop of computation in a small form factor
- DGX Station - A personal workstation featuring Grace Blackwell with liquid cooling, delivering 20 petaflops of performance
"This is the computer of the age of AI," Huang declared. "This is what computers should look like and this is what computers will run in the future."
NVIDIA also announced partnerships to bring AI to the edge, including a collaboration with Cisco, T-Mobile, and Cerberus ODC to build a full stack for radio networks in the United States. Huang noted that $100 billion of the world's capital investments each year goes into radio networks, which will increasingly be infused with AI.
"In the future there is no question in my mind that's going to be accelerated computing infused with AI," he said. "AI will do a far better job adapting the radio signals... to the changing environments and the traffic conditions."
Autonomous Vehicles and Robotics
Huang announced that General Motors has selected NVIDIA to partner on building their future self-driving car fleet. "The time for autonomous vehicles has arrived," he stated, noting that the partnership will involve AI for manufacturing, enterprise operations, and in-vehicle systems.
He emphasized NVIDIA's commitment to safety, highlighting the company's Halos safety assessment process that has evaluated every line of their 7 million lines of code to ensure diversity, transparency, and explainability.
The keynote culminated with robotics announcements, including Newton - a physics engine developed in partnership with DeepMind and Disney Research. Newton enables real-time simulation of rigid and soft bodies, tactile feedback, and fine motor skills, all GPU-accelerated to train robots at super-real-time speeds.
NVIDIA also introduced GR00T N1, a generalist foundation model for humanoid robots that features a dual-system architecture for "thinking fast and slow." Huang announced that GR00T N1 would be open-sourced to accelerate development in physical AI.
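The "thinking fast and slow" split can be illustrated with a toy control loop: a slow deliberative module replans occasionally while a fast reactive module issues motor commands every tick. This is a generic sketch of the dual-system idea, not GR00T N1's actual interfaces, and every name below is hypothetical:

```python
class SlowPlanner:
    """System 2 (slow): deliberate occasionally, producing a high-level plan."""
    def plan(self, observation):
        return {"goal": "reach_target", "target": observation["target_pos"]}

class FastController:
    """System 1 (fast): react every tick, emitting low-level commands."""
    def act(self, observation, plan):
        # Move a bounded step toward the target (toy 1-D kinematics).
        delta = plan["target"] - observation["hand_pos"]
        return max(-0.1, min(0.1, delta))

def control_loop(obs, planner, controller, ticks=10, replan_every=5):
    plan = planner.plan(obs)                    # slow loop: plan up front...
    for t in range(ticks):
        if t and t % replan_every == 0:
            plan = planner.plan(obs)            # ...and replan only rarely
        obs["hand_pos"] += controller.act(obs, plan)  # fast loop: every tick
    return obs["hand_pos"]

final = control_loop({"hand_pos": 0.0, "target_pos": 0.3},
                     SlowPlanner(), FastController())
print(f"{final:.2f}")  # hand converges to the target at 0.30
```

The design point is the mismatch in rates: deliberation is expensive and infrequent, while reactive control must run at the robot's actuation frequency - which is why the two are separated into distinct systems.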
Conclusion: The Three Pillars of Computing's Future
Huang concluded by summarizing the three pillars of computing that NVIDIA is revolutionizing:
- Computing - From DGX Spark to Grace Blackwell systems
- Networking - With Spectrum X and silicon photonics
- Storage - Transitioning from retrieval-based to semantics-based systems
"The world is going to be at least 50 million workers short by the end of this decade," Huang noted, highlighting the economic imperative driving robotics development. "We'd be more than delighted to pay them each $50,000 to come to work. We're probably going to have to pay robots $50,000 a year to come to work."
Throughout the presentation, Huang returned to the theme of using AI to create better AI - a virtuous cycle that NVIDIA is uniquely positioned to enable through its full-stack approach to accelerated computing. From cloud data centers to personal workstations, from autonomous vehicles to humanoid robots, NVIDIA's vision encompasses a computing landscape fundamentally transformed by artificial intelligence.
As GTC 2025 - described by Huang as "the Super Bowl of AI where everybody wins" - demonstrated, NVIDIA continues to push the boundaries of what's possible in computing, setting the stage for the next era of AI innovation across industries.
For the full keynote, watch the video here.