2024-06-05 NVIDIA Corporation (NVDA) BofA Securities 2024 Global Technology Conference (Transcript)

NVIDIA Corporation (NASDAQ:NVDA) BofA Securities 2024 Global Technology Conference June 5, 2024 3:30 PM ET

Company Participants

Ian Buck - VP

Conference Call Participants

Vivek Arya - Bank of America Securities

Vivek Arya

Hope everyone enjoyed their lunch. Welcome back to this session. I'm Vivek Arya. I lead the semiconductor research coverage at Bank of America Securities. I'm really delighted and privileged to have Ian Buck, Vice President of NVIDIA's HPC and Hyperscale business. Ian has a PhD from Stanford. And when many of us were enjoying our spring break, Ian and his team were working on Brook, which is the precursor to CUDA, which I think is kind of the beating heart of every GPU that NVIDIA sells. So really delighted to have Ian with us.

What I thought I would do is lead off with some of my questions, but if there's anything that you feel is important to the discussion, please feel free to raise your hand. But a very warm welcome to you, Ian. Really delighted that you could be with us.

Ian Buck

Thank you. Look forward to your questions.

Question-and-Answer Session

Q - Vivek Arya

Okay. So Ian, maybe let's -- to start it off, let's talk about Computex and some of the top announcements that NVIDIA made. What do you find the most interesting and exciting as you look at growth prospects over the next few years?

Ian Buck

Yes. Computex is an important conference for NVIDIA and for AI now. The world's systems and data centers get their machines, their hardware, from this small island of Taiwan, and of course the chips as well. So it's a very important ecosystem for us. A year ago, we introduced MGX, the system standard for deploying and building GPU systems in a variety of shapes and sizes for different workloads.

And that has now become the standard for building servers: you start with a CPU motherboard, but then how many GPUs do you need? What's the configuration? What's the thermal profile, and where do they need to fit? What workloads do they want to run? That has diversified the whole system and server ecosystem.

So it's been really fun to watch that explode, and the number of companies that are able to take advantage of it. We talked, of course, about Blackwell, our next-generation GPU, and what it will do. We talked as well about our roadmap: what we're doing today with the Hopper platform, which is our current architecture, and what we'll be deploying with Blackwell and our Blackwell platform, including upgrades to Blackwell in 2025.

And then we also talked publicly, for the first time, about what comes after Blackwell: the Rubin platform, which will come with a new CPU and GPU. So a lot of interesting, exciting things from an infrastructure and hardware standpoint. On the software side, we're seeing the adoption of all sorts of different models, and we can talk more about that.

One way we're helping is by packaging up a lot of those models -- Llama, Mistral, Gemma -- into containers to help enterprises adopt them. They know they're getting the best performance, the best inference capabilities, in a nicely packaged container that they can then tailor and deploy anywhere.

These are what we call NIMs, which are the NVIDIA inference containers that have those models. And we're educating enterprises and making them available to all of them. So it's been a very exciting Computex, and if you ever get to go, it's quite available.
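As an illustration of how an enterprise might consume one of these containers: a deployed NIM exposes an OpenAI-compatible HTTP API. The sketch below is illustrative only; the host, port, and model name are placeholders for a hypothetical local deployment, not details from the talk.

```python
# Hypothetical sketch of querying a locally running NIM container.
# Assumes a container is already serving an OpenAI-compatible endpoint on
# localhost:8000; the model id is a placeholder, not confirmed by the talk.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta/llama3-8b-instruct",  # placeholder model id
        "messages": [{"role": "user", "content": "Summarize this support ticket."}],
        "max_tokens": 128,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```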

Vivek Arya

So let's start looking at this from the end market, right? You work very closely with all the hyperscalers. From the outside, when we look at this market, we see the accelerator market was over $40 billion last year, right? It could be over $100 billion this year. But help us bridge this to what the hyperscalers are doing. What are they doing with all the acceleration and all this hardware that they're putting in? Is it about making bigger and bigger models? Where are they in that journey of their large language model build-out, and how are they able to monetize them?

Ian Buck

Yes, we're still very much in the beginning of that AI growth cycle, really. It's odd to say, but AI -- at least accelerated AI -- has been around for approaching 10 years now, since the first AlexNet moment. But what we're seeing, as the different hyperscalers evolve and figure out what their contributions and their value are, is three obvious thrusts. One, of course, is infrastructure: providing infrastructure at scale for the world's AI start-ups and community in the cloud to go consume.

And you see all the major startups partnering or getting access to the technology, often not just one but multiple, or they switch and move around, figuring out who can help them scale and grow their capabilities, who's bringing GPUs to market first, and where they can get their GPUs. That's infrastructure.

And of course, infrastructure is hugely profitable. For every dollar a cloud provider spends on buying a GPU, they're going to make back $5 over four years. The second thing we're seeing is growth in token serving: just building and providing AI inference, whether it be Llama or Mistral or Gemma, and providing it to the community's users to serve tokens. Here, the economics are even better. For every $1 spent, there's $7 earned over that same time period, and growing.

The third, of course, is building the next-generation models. Not everyone can do that at scale. Those models are getting very large and the infrastructure is getting huge. So we're seeing them build amazing next-generation capabilities and scale. And of course, that's not just a physical building and putting things in the building, but actually figuring out all the software, the algorithms, and training at that scale over billions and trillions of tokens, and the software that has to go into that.

I can talk all day about the software for training it, going from 10,000 to 20,000 to 50,000 and now 100,000 GPUs. And in the next click out, people are going to be talking about 1 million. So all three of those are happening at the same time: they're developing the next-gen models, serving those models to customers, and renting out infrastructure. I guess the fourth would be deploying AI for themselves, Copilot being an example. You can see multiple services on Amazon are now all being backed by AI agents or AI capabilities, some directly, some indirectly, that you may not know about.

And of course, companies like Meta are deploying AI into their services -- their news feed, their recommenders, and elsewhere -- which raises all the numbers across the board. They've been a great partner for NVIDIA on new models.

Vivek Arya

So you mentioned that AI, the traditional AI or CNNs, right, they have been around for a long time. We used to talk about tens of millions of parameters, and here we are knocking on the door of almost two trillion parameters. Do you see a peak, a point where we say, okay, this is it for model sizes? Might we even go backwards and try to optimize the size of these models, have smaller or midsized models? Or are we not yet at that point?

Ian Buck

Yes. So the evolution of AI models is quite interesting, and they map to the workloads. Obviously, initially, it started with ImageNet or image recognition. What is this a picture of? It doesn't really tell you where it is, just what it's a picture of; then we can put boxes around what it is.

And then we can identify every pixel, and they got more and more intelligent. When we got to language and LLMs, that was another quick upping of intelligence, because language is different than just images.

CNNs were about understanding what's inside of the picture. You and I do that, but dogs and cats and even bugs also have to recognize from vision what things are. Language is uniquely a step up in intelligence. You have to understand what the person is saying, what they mean, the context, which goes right to overall human understanding and knowledge. Take it a click further, and you get to generative AI.

Not only do you need to understand what was said and maybe be able to summarize it, but you actually have to synthesize, to create new things -- whether it be a chatbot, an open chatbot conversation like you have in WhatsApp with Meta AI; or coding, generating code that works correctly and in a certain style; or being able to generate a picture from text and do multimodal.

So it's a little cheesy to say, but it's understanding: what do we need? What are we saying? What is the context? Can the AI reproduce that and generate that? I talk to the AI scientists.

They do the studies, and they don't see their models being overtrained yet. They can continue to take more and more tokens. The tokens, of course, are part of the limiter. You do have to have a massive data set in order to train a foundational model from scratch. Once you do that, though, and you build 100 billion, 400 billion, 1.8 trillion, 2 trillion, that model becomes the foundation for a whole litany of other models.

You can take a Llama 70B and produce an 8B underneath it, depending on what level of accuracy or comprehension you want to provide, or your context length. You can then take that foundation model and fine-tune and optimize it to generate Code Llama, so you can basically have a coding Copilot. That all starts from a foundation model. Each one of these is not an individual effort. They take a foundation and they go deploy it everywhere.
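As a sketch of what "take that foundation model and fine-tune it" can look like in practice, here is a minimal parameter-efficient fine-tuning setup using Hugging Face Transformers and PEFT. The model id, adapter settings, and task are placeholders; this is not the specific pipeline any of the companies mentioned use.

```python
# Illustrative sketch: adapting a foundation model to a narrower task with LoRA.
# The model id, target modules, and data are placeholders, not details from the talk.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "meta-llama/Meta-Llama-3-8B"  # hypothetical choice of foundation model
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach small low-rank adapters instead of updating all base weights.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only a tiny fraction of the base parameters train

# From here, train on domain data (e.g. code) with a standard causal-LM loss;
# the base model stays frozen and only the adapters are updated.
```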

Like Microsoft would do with GPT and Copilot, turning one giant foundation model into 100 different assets that activate a whole bunch of other products. That's the value of foundational models when they get built.

They build a large, capable one, and then they fine-tune and build smaller ones that can do certain tasks, and create that opportunity. In terms of where it's going next, they haven't seen the limit in terms of learning -- we're still learning. That's probably logical, as a human brain has 100 trillion to 150 trillion connections, depending on the estimate, in your head. We're at about two trillion now in AI.

Vivek Arya

So 50x more?

Ian Buck

At least. We haven't gone to reasoning yet. That would be the next step. How do you actually do reasoning, come up with conclusions and a logic chain? That's thinking.

Vivek Arya

But do you think there are diminishing returns at some point? Or can just the cost of training get to a level where it puts an upper limit on how large these models can be?

Ian Buck

The cost of training is definitely a factor. Getting the infrastructure is a factor in how fast we can move the needle here. In addition to the science, the software, the algorithms, the complexity, the resiliency -- doing things at this scale requires an end-to-end optimization. It's not just about the hardware.

Maybe a simple analogy: to turn your company into more revenue, you don't just bring in 10,000 or 50,000 employees. You have to build a company in order to grow and be more intelligent. In the same way, you can't just bring in 50,000 or 100,000 or 1 million more GPUs. You have to do the work and build the capability to keep all those GPUs working together, just as that company has to work together to build things even bigger.

That is the day-to-day life that I tend to lead, working with those biggest customers to figure out not just what scale they can achieve from an infrastructure standpoint but also the software and algorithms. Is there a limit? We haven't hit one yet. Certainly, 100,000 GPUs is happening now and 1 million is being talked about -- we're walking up that curve right now.

Vivek Arya

All right. Do you find it interesting that of some of the most frequently used and largest models, one is developed by a start-up and one is developed by somebody who's not a hyperscaler, right? So where do you think the biggest hyperscalers are in their journey? Are they still in early stages? Are they hoping to just leverage the technology that's been built up? Or do you think they have to get things going as well, and can that provide growth over the next several years?

Ian Buck

Yes. I think with the lighthouse models, everyone recognizes the benefit of having a foundational model as an asset. It's something they can leverage for their business. Some of them make it public, some of them don't. That's a business decision, a strategic decision. But the innovation is still happening. That's the interesting thing.

There's so much change happening in AI design and model design and how to train these things at scale. Students at Berkeley or professors at Stanford turn into startups, discover a new kind of attention mechanism, some modification of the transformer, or they do something totally different from transformers, like state-space algorithms. The AI architecture, the model architecture, is constantly evolving.

Just last year, we started seeing an explosion of the mixture-of-experts style of model, which changed the model architectures to allow them to scale to 1 trillion parameters. Previously, something like GPT-3 would have a transformer-based AI neural network, followed by another one, followed by another one -- many layers of one AI after another -- to build 175 billion parameters.

If you look at models like GPT-4 or others, they're a mixture of experts. They're on the order of 1 trillion parameters. And one of the ways they achieve that is that it's not one model stacked on top of another, one neural network stacked on top of another. They actually have multiple neural networks running across each layer. In fact, if you look at the 1.8 trillion-parameter GPT model, it has 16 different neural networks all trying to answer their part of the layer.

And then they confer and meet up and decide what the right answer is. And then they share with the next 16 -- like this room is going to talk: you confer, the next group confers, and you hand it off down the row. That mixture of experts allows each neural network to have its own specialty, its own little perspective, to make the whole thing smarter. What's interesting is that not only do the models get bigger and smarter, it actually changed the way we do computing, because we used to have one neural network, one big matrix multiply followed by another, followed by another. Now we have lots of them, and they're communicating all the time.
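A toy sketch of that routing idea in PyTorch: several expert networks sit side by side in one layer, a router picks a couple of experts per token, and their weighted outputs are combined. This is purely illustrative; GPT-4's actual architecture is not public.

```python
# Toy mixture-of-experts layer: a router picks top-k experts per token and
# mixes their outputs. Real systems shard experts across GPUs, which is why
# all-to-all communication (and NVLink domain size) matters so much.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=16, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, num_experts)
        weights, picks = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(8, 512)
print(MoELayer()(tokens).shape)  # torch.Size([8, 512])
```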

Each one of you has to talk to everybody else, confer, and then share your knowledge with the next row. So you see that in the systems and designs and how the architecture is evolving. That's one of the reasons, in the Blackwell architecture, we did this multi-node NVLink, or NVL72. We expanded how many GPUs you can connect with NVLink -- up to 72 -- to still allow for that mixture of experts.

So everyone can be communicating with each other and not get blocked on I/O. This evolution in model architectures is constantly happening. You can see start-ups figuring this out. They take advantage of it. They partner with a hyperscaler or with the cloud and, with help from NVIDIA, move the needle to the next phase of what AI looks like as a model -- what it can do and how it can be implemented in the architecture.

So when I say early stages, that is kind of what it feels like. These last two years have been an explosion of mixture of experts. We have new model architectures that are starting to show up. It's influencing how we deploy them, the software we write, the algorithms, all of that. And on top of it, that's going right into NVIDIA's roadmap -- what we're building, how fast we can build. It's one of the reasons we're accelerating our roadmap: the world of AI is constantly evolving and changing and upgrading.

Vivek Arya

Got it. I'm glad you brought up the 1-year product cadence, because one aspect of this is we are seeing these model sizes -- I've seen one statistic that says they are doubling every six months or so. That argues that even a 1-year product cadence is actually not fast enough. But then the other practical side of it is that your customers have to live in this constant flux in their data center. So how do you look at the puts and takes of this 1-year product cadence?

Ian Buck

So the overall performance improvement comes as a compounding of hardware connectivity, algorithms, and model architecture. When we do optimizations, we look at it holistically. Obviously, we are still improving the performance of our Ampere-generation GPUs. And we've improved the performance of our Hopper GPUs by 3x since we first introduced Hopper at the end of '22, running Llama and GPT inference.

From the end of '22 to today, I think we've improved Hopper's inference performance by 3x. So we're continuously making the infrastructure more efficient, faster, and more usable. And that gives the customers who now have to buy at a faster clip confidence that the infrastructure they've invested in is going to continue to return value -- and it does. The workloads might change. They'll take their initial Hopper and build the next GPT. They may take the next Blackwell and build the next GPT.

But that may be the infrastructure they use to continue to refine or create the derivative models, or to host and serve them. I think one of the interesting things is that our products used to be much more segmented into inference and training. You used the 100-class GPU, the big iron, for training, and the smaller PCIe products for inference, due to the cost or size of the model. Today, the models and the infrastructure used for training at scale are also frequently used for inference, which I know is difficult for this community to digest and to figure out what's inference and what's training.

I'm sorry about that. But that is the benefit of that capacity. They can invest and they know they can use those GPUs for both inference and training and get continued value and performance throughout. So the increased pace is sort of natural. This market can certainly support the continued improvement. The feedback cycle of working with NVIDIA allows us to invest, build new technologies, and respond and enable.

And then it becomes a job of managing transitions and execution and supply and data centers to make sure everyone has the GPUs they need. I certainly talk to startups. Some of them are still on A100s and they're enjoying them. They're looking forward to their H100s. Others that have their H100s are looking forward to their Blackwells.

And they're all getting the benefit of the performance and algorithms in the platform that we provide. So one way to meet the demand is to continue to support and drive the whole ecosystem. And that just creates more players, more invention, and moves the ball forward, which is a rising tide for us and, I think, the whole market.

Vivek Arya

There's always this question about what the killer apps driving generative AI are, right? Yes, we understand that a lot of hardware is being deployed. So what are the top use cases? You mentioned that customers deploying NVIDIA hardware are seeing 4x to 5x the return on their investment -- obviously, over a 4-year period. So what are the big use cases that you think are the most promising right now?

Ian Buck

The baseline, obviously, is building that foundation model; it gets shown off and enjoyed as a chatbot that you all can interact with. But then they go and incorporate them into their products. Copilot is a good example: taking a GPT model and tailoring it so it can help you create that PowerPoint, write that e-mail, or type, modify, or create that Excel expression, which is really hard to figure out.

I've certainly used that. Certainly, developers have. Microsoft, I think, has spoken publicly about how much their software developers' productivity has increased because they've made that model available internally, via Copilot, for their own software products. So AI has accelerated their entire product portfolio -- everything. I don't know how you model or measure that benefit, in terms of headcount or software developers, but also the rate at which they can roll out new technology, new products.

In some ways, generative AI is making all the old boring products exciting again, and making companies reassess their value, their ASP, and the revenue they can make on their existing installed base. That's just Microsoft. It's happening across all of these industries, and the reason every company wants to deploy and benefit from AI or generative AI is that they see the opportunity to improve the productivity of their own existing products, installed base, and users -- and, of course, to provide the additional value that generative AI content or a client or agent can add as a feature, not just make the existing features better.

That's what we see in the enterprise. Certainly, another area in generative AI is content creation -- the new companies, the new start-ups that are providing those key technologies, key enablers, which are going to be either consumed or purchased by more of the established software ecosystem.

We are certainly seeing AI now work its way into finance, into health care, into telco markets -- big adopters. Obviously, these are companies that see high benefits, have a lot of data, and are often technology-savvy and ready to adopt. But every industry will. The other area I think we're seeing -- for AI in general -- is recommender systems.

It's not as talked about, or as sexy, but a big part of inference is certainly deploying AI to understand the content, present the right content to the right user -- or make sure the wrong content is not shown to the wrong user -- and also to make their platforms easier, the click-throughs higher, and the revenues, as a result, faster. Those recommender systems are leveraging all the learning -- the generative AI work that's been done specifically on the content -- to increase revenues.

Vivek Arya

Got it. I wanted to talk about AI inference and get your views on what NVIDIA's moat in AI inference is. Because if inference is a workload where I'm really constraining the parameters, where I'm sometimes optimizing more for cost than performance, why isn't a custom ASIC the best product for AI inference? I know exactly what I need to infer, I can customize for it, and I don't need to make the same chip work for training as well. So why isn't a custom ASIC the best product for AI inference?

Ian Buck

Yes, it's a good question and one that gets asked a lot. First off, often your best architecture for inference is the one you trained on. If you know how training works: it starts with a blank neural network, or maybe one that's been pretrained, a general foundation model, and you're going to train it to be a better call center agent or co-developer for a software program.

So you're starting with that, and you're training. Training starts with inference: you actually send the tokens through and ask the AI to predict what it should do, you tell it whether it's right or wrong, and then you send the errors back -- why it got the question wrong or why it got it right -- and reinforce those neurons.
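In framework terms: inference is the forward pass alone, while training runs the same forward pass and then pushes the errors back through the network. A minimal PyTorch sketch with a placeholder model:

```python
# Same forward pass for inference and training; training just adds a backward pass.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
x, target = torch.randn(4, 32), torch.randint(0, 10, (4,))

# Inference: forward pass only, no gradients kept.
with torch.no_grad():
    prediction = model(x).argmax(dim=-1)

# Training: the identical forward pass, then the error is sent back
# through the network to adjust ("reinforce") the weights.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
```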

But it always starts with that forward pass, and it's a big part of training. So that builds a natural transition from training to inference. The second thing is that the models are always evolving and changing over time. Think about it: you're going to invest $1 billion, $5 billion, $10 billion in data center infrastructure for inference.

That asset is going to last you four or five years. I think they're just now retiring some of those older Keplers and Voltas in the data center. The more that asset can run all of those models -- the ones that are important today, but also the ones that are going to be important, that show up tomorrow and after that and after that -- the more you know you can make that investment and have that capability, that platform infrastructure that's going to continue to produce the revenues we talked about. And hardware takes a long time to build, don't forget that. We're accelerating our roadmap, but that's only because we can have multiple hardware architectures in flight in parallel, as well as trying to compress it.

But it's hard to compress. The execution there is very difficult, and tapeout to production is very long -- longer than the innovation cycle of AI. So that's why programmability is important.

That's why, with an architecture or platform that everybody is using -- not just at your company, but at every other company and across the academic ecosystem and the start-up ecosystem -- you know that as the models evolve, as the techniques and technologies evolve, that investment is going to continue to track with forward innovation, not just what we have now. Now of course, if you know you have one model, you know you're going to put it in one device, and you know where it's going to go, then that may be the right answer.

And NVIDIA is not trying to win every single cycle of AI. If your doorbell needs an AI and you know exactly what to build, please. But at data center scale, the opportunity clearly requires that level of investment. They want to make sure they're getting the full years and the full value out of it. And they see the benefit of NVIDIA, where we're the one AI company that's working with every other AI company. They see the benefit of that investment getting the software and the algorithms and the new models over time.

Vivek Arya

Practically, do the large customers have separate clusters for training and separate clusters for inference? Or are they mixing and matching -- reusing some for training, some for inference? Practically, how do they do it?

Ian Buck

It depends. Certainly, there are some geographic benefits and differences between training and inference. Most folks can do training anywhere in the globe. So we see big training clusters being put up. Usually, it's a function of where they can get the data center space and tap into the grid, having good access to power, and the economics there are very important.

But training doesn't need to be localized. If you've ever used a remote desktop that's halfway around the world, you can feel the lag and latency. Training is fine with that. But for inference, you kind of do need to be near the user. Some inference workloads might be fine.

Batch-processing inference, fine. A longer chatbot session might be okay. But if you're doing gen AI search -- you're asking your browser a question and you want an answer back -- you want that answer quickly. If it's too slow, your quality of service immediately plummets. So we often see that training clusters are put wherever, logically, they can get the power and capability.

Inference tends to be either in those same clusters, which they'll then divide up, or -- just like the clouds provide regions -- folks will put GPUs in every one of those regions, and they can then serve both training and inference. I would just say the training part is a little bit more specialized, because super big clusters can be wherever it makes the most sense for them to build and invest in the building and capability. But they are largely using, more and more, the same infrastructure for training and inference. That again goes to the value of that investment.

They know they can be using it for training and flip it over to inference. If you saw when we launched our Blackwell GB200 NVL72, we talked a lot about inference, because those models are getting big. They've got to run that mixture-of-experts work through, and the same infrastructure can be used for training as well. That's very important when we launch our new platforms. At the same time, we also make sure they can take the same building blocks and vary the sizes and capabilities.

GB200, the NVL72, is designed for trillion-parameter inference. For the more modest-sized 70B or 7B models, we have the NVL2, which is just two Grace Blackwells tied together; it fits nicely in a standard server design and can be deployed anywhere, including at the edge, the telco edge. A telco will often have a cage. The cage has 100 kilowatts; you can't exceed that. So the metric is: what kind of GPUs or servers can I put there that make the most sense to serve as many models as possible at the edge? You'll do something different there than you would for a big OpenAI data center or some such. And that's why we have both kinds of products going.
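A back-of-the-envelope version of that sizing question is sketched below; only the 100-kilowatt cage limit comes from the talk, and the per-system power draws are hypothetical placeholders, not NVIDIA specifications.

```python
# Hypothetical sizing exercise for a fixed-power telco cage.
# Only the 100 kW budget comes from the talk; per-system draws are made up.
cage_budget_kw = 100.0

candidate_systems_kw = {
    "full rack-scale NVL72-class system": 120.0,  # hypothetical draw per rack
    "2-GPU NVL2-class edge server": 3.0,          # hypothetical draw per server
}

for name, draw_kw in candidate_systems_kw.items():
    units = int(cage_budget_kw // draw_kw)
    print(f"{name}: fits {units} unit(s) in a {cage_budget_kw:.0f} kW cage")
```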

Vivek Arya

Got it. Since you have been so intimately involved with CUDA since its founding, how do you address the pushback that a lot of software abstraction is being done away from CUDA, and that this will make CUDA obsolete at some point -- that it's not really a sustainable moat for NVIDIA?

Ian Buck

Yes. I think moat is a complicated word -- what does it mean? What makes the platform useful is how many developers it has on it, how many users it has on it, and what the installed base is that people can get access to, so the next AI invention can make sure it's compatible with that architecture and what it can do.

These foundations, these new next-generation models that show up, are not academic exercises. They are designed to the limits of what the platform's capabilities can provide at the time they are trained. And many of the models that we are enjoying today started training like two years ago.

There's a lag, unfortunately, from when NVIDIA announces a new GPU to when the data center gets set up to when it's in use. That's partly why we're explaining what we're building, to try to shorten this process, but it directly influences the scale of what they can build -- not just the number of GPUs, but also the fact that with every generation, we improve the performance on a per-GPU basis by some factor.

Blackwell is like 4x to 5x better at training per GPU than Hopper was, and better at inference as well -- on the order of 30x better on inference for trillion-parameter models. And so that sets the bar for how big a model they can build, and then they look at the architecture of NVLink and what they can build. So it is a symbiosis between what we're building and what they're inventing, and we keep riding that wave. And that really helps them define the features of the next-generation AI model.

Vivek Arya

Got it. What is the outlook for Blackwell as we look at next year? First of all, with power requirements going up significantly, does that constrain the growth of Blackwell in any way? And what's the lead time in engagements between when somebody wants to deploy versus when they have to start a discussion with NVIDIA -- i.e., how far is your visibility of growth into next year?

Ian Buck

Good question, actually. So there's one question about how far forward we work with not just our biggest customers, but all those AI visionaries that are building those foundation models. And then, what does that ramp look like for Blackwell specifically? We stated recently in our earnings that Blackwell has now entered production builds. We started our production.

The samples will go out this quarter, and we're ramping for production output later this year. And then everything -- that always looks like a hockey stick: you start small and you go up pretty quickly to the right. The challenge, of course, with every new technology transition, because the value is so high, is that there's always a mix of supply and demand challenges. We certainly experienced that with Hopper. And there'll be similar kinds of supply/demand constraints in the on-ramp of Blackwell, certainly at the end of this year and going into next year.

In terms of the horizon, though, that conversation on the Blackwell transition and ramp -- what it is and what to build -- starts two years in advance. The slide announced at Computex showing our Hopper platform and Blackwell, where for the first time you saw Rubin -- that has been a conversation for quite some time with those big customers. So they know where we're going and the time scales. It's really important for us to do that.

Data centers don't drop out of the sky. They're big construction projects. They need to understand what a Blackwell data center looks like and how it's going to differ from Hopper. And it will. The opportunity we saw with Blackwell was to transition to a denser form of computing -- to put 72 GPUs in a single rack, which has not been taken to scale before.

We have experience with it. I also do the HPC and supercomputing side. So we've seen those kinds of scale, but those were one-off systems. Now we're taking that supercomputing technology, democratizing and commoditizing it, to take it everywhere. Very challenging.

And of course, we've been talking to them about it for two years now -- not just the hyperscalers but also the supply chain. In Taiwan, for example, the people that are building the liquid cooling infrastructure, the power shelves, the whips, which are the cables that go down into the bus bars. The opportunity here is to help them get the maximum performance through a fixed-megawatt data center, at the best possible cost and optimized for cost. By doing 72 GPUs in a single rack, we needed to move to liquid cooling. We wanted to make sure we had the higher-density, higher-power rack, but the benefit is that we can do all 72 in one NVLink domain.

Connect them all up with copper instead of having to go to optics, which adds cost and adds power. And every time you add cost and power, you're just taking away from the number of GPUs you can put in your 10-, 50-, 100-megawatt data center. So that is driving us toward reducing cost and increasing density.

So when you look at Blackwell, you may say, well, it's really hot -- but it's actually going to significantly improve the total throughput of a fixed-power data center. So there's a strong economic and technology driver to transition to denser, more power-efficient, next-generation cooling technologies rather than just air.

Water is a fantastic mover of heat. Your house is built with insulation that is nothing more than trapped air. Air is actually an insulator. It's not good at transferring heat, but water is excellent at it. If you've ever jumped into a 70-degree pool from 70-degree air, it feels really cold.

That's because water is sucking the heat right out of you. It's really good at moving heat around. And that efficiency goes right to more GPUs, more capabilities and denser, more capable AI systems.

Vivek Arya

Got it. So customers who are deploying Blackwell, are they replacing the Hoppers or Amperes that were already in place? Or are they putting up new infrastructure? Like how should we think about kind of the replacement cycle of these?

Ian Buck

Yes, they can't build the data centers fast enough, so what they're doing is decommissioning or accelerating their CPU infrastructure. They obviously still have quite a lot of it -- we're not in every data center. Obviously, the vast majority of systems at the hyperscalers are CPU systems. So if you want to make space, and you can only build so fast --

Vivek Arya

Taking out traditional servers.

Ian Buck

They can retire their old legacy systems that maybe they've just left, not upgraded. They can accelerate the decommissioning of the older CPU infrastructure, or they can accelerate those workloads. So actually, we've had a lot more conversations with all the hyperscalers about these old workloads that were running on CPUs, that didn't really have anyone working on them and were sort of in sustaining mode.

They're going back in and saying, okay, actually, we should probably go accelerate this old database workload, this machine learning workload that we've left alone for so many years, because we can do what 1,000 servers were doing with just 10 GPU servers.

And that just freed up hundreds of racks and megawatts of power. So it's not just the new data centers that are being built; what they're doing is actually making space for more and more GPUs to come in. Of course, they're not retiring the Hoppers. They can't stop Hopper, they can sell every Ampere, and in some cases they can sell some of the earlier-generation Volta systems or keep them around. What we're seeing is the combination of both building new and retiring, deprecating, or accelerating their CPU infrastructure.

Vivek Arya

Got it. And lastly, InfiniBand versus Ethernet. Most of the clusters that NVIDIA has built so far have primarily used InfiniBand. What is the strategy behind the new Spectrum-X product, given there is a large incumbent out there? Just like NVIDIA is a large incumbent on the compute side, there is an incumbent on the switching side. So what would make customers adopt your product versus staying with the incumbent?

Ian Buck

Yes. So first, we support all different kinds of networking. Certainly, Amazon has their EFA networking, which we support and execute toward. Each of the hyperscalers has different flavors of their own Ethernet or networking, or some have taken the decision to get the best possible, which is the InfiniBand platform. You see that with Microsoft, and they're matching our performance 1:1 on benchmarks like MLPerf, connecting 10,000 GPUs with InfiniBand. And we have a 10,000-GPU cluster.

They have a 10,000-GPU cluster, they get the same score, and they're getting the best results on MLPerf. Ethernet is tricky in the sense that the standard Ethernet infrastructure is really important. It is a data center-scale networking technology. It has a huge ecosystem of software capabilities for managing at scale. Ethernet's important.

Ethernet was originally designed for that north-south use case. You have a server that wants to talk to the rest of the world, a CPU core that wants to talk to the rest of the world. That's what Ethernet did. You can talk across the whole data center, but it was for the traditional use cases. When you get to AI, it's a different kind of problem.

It's kind of a supercomputing problem. You have these billions of dollars' worth of GPU infrastructure all trying to train a model like Llama 3. And now we're going to 100,000 GPUs all trying to train an even bigger model. So that east-west traffic is incredibly important. If one of these packets slows down, or one of these links gets lost or has a blip, the entire infrastructure slows down because it's waiting for the slowest guy.

And InfiniBand was designed to optimize that and make sure the performance was the maximum possible so everyone could talk to everybody else. That's the difference between designing for east-west versus north-south. For north-south, you don't care if you have a slightly slower connection than the person next to you; everybody is happy. But if your connection slowed everybody down, that would be a problem. And if you look at it from a data center standpoint, that's billions of dollars of GPUs wasted -- billions, just idle.

The whole thing goes down. So that's what Spectrum-X is addressing: providing support for the standard Ethernet ecosystem, which many hyperscalers and clouds and everybody else have standardized on, but adding the technologies that support the east-west traffic -- the adaptive routing, the congestion control techniques, all the things you need to make sure you have deterministic east-west performance so that the AI can progress and your GPUs stay utilized. It's a really hard problem. We've been accelerating our Spectrum-X roadmap as a result. We still have InfiniBand, which is obviously very important in supercomputing and for the ultimate performance.
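The "waiting for the slowest guy" effect is easy to see in a toy simulation: a synchronous training step completes only when the slowest exchange finishes, so even a rare slow link dominates at scale. The numbers below are made up purely for illustration.

```python
# Toy illustration of why east-west tail latency matters: a synchronous
# training step finishes only when the slowest GPU's exchange finishes.
import random

random.seed(0)
num_gpus = 1000
nominal_exchange_ms = 10.0

def step_time(straggler_prob):
    # Each GPU's exchange occasionally hits congestion and takes 10x longer.
    times = [nominal_exchange_ms * (10 if random.random() < straggler_prob else 1)
             for _ in range(num_gpus)]
    return max(times)  # the collective waits for the slowest participant

for p in (0.0, 0.001, 0.01):
    avg = sum(step_time(p) for _ in range(100)) / 100
    print(f"straggler probability {p:.3f}: mean step time ~{avg:.0f} ms")
```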

But to provide that kind of Ethernet that can go and train giant models, it requires that technology to be embedded, integrated, and provided in an Ethernet ecosystem. So that's what Spectrum-X is.

Vivek Arya

Do you see the attach rate of your Ethernet switch going up? Because I think NVIDIA has outlined several billion dollars, which includes the NICs as well, right? Even before Blackwell starts?

Ian Buck

There's a 100,000-GPU training project being put together right now, which will be Spectrum-X.

Vivek Arya

And then as Blackwell rolls out next year, do you see your attach rate of Ethernet going up?

Ian Buck

Yes, you'll see a mix of both Ethernet and InfiniBand.

Vivek Arya

Got it, okay. Terrific. With that, thank you so much, Ian. Really appreciate your insights. Thanks, everyone, for joining.
