Jensen Huang:
Ilya, unbelievable. Today is the day after GPT-4. It's great to have you here. I'm delighted to have you. I've known you a long time.
黄仁勋:
伊利亚,难以置信。今天是GPT-4发布的第二天。很高兴你能来。我很高兴见到你。我们认识很久了。
The journey, and just my memory of the time that I've known you, and the seminal work that you have done.
从我们相识到现在,我脑海中浮现的是你所取得的开创性成就。
Starting at the University of Toronto, the co-invention of AlexNet with Alex and Geoff Hinton that led to the Big Bang of modern artificial intelligence, your career that took you out here to the Bay Area, the founding of OpenAI,
从多伦多大学开始,你与Alex和Geoff Hinton共同发明了AlexNet,掀起了现代人工智能的大爆炸,然后你的职业生涯将你带到湾区,创立了OpenAI,
GPT-1, 2, 3, and then, of course, ChatGPT, the AI heard around the world. This is the incredible resume of a young computer scientist, you know, an entire community and industry in awe of your achievements.
推出了GPT-1、GPT-2、GPT-3,当然还有震惊世界的ChatGPT。这是年轻计算机科学家的非凡履历,你的成就影响了整个社区和行业。
I just want to go back to the beginning and ask you about deep learning. What was your intuition around deep learning? Why did you know that it was going to work? Did you have any intuition that it was going to lead to this kind of success?
我想回到最初,问问你对深度学习的直觉是什么。你为什么知道它会成功?你是否有预感它会带来如此大的成功?
Ilya Sutskever:
Okay, well, first of all, thank you so much for all the kind words. A lot has changed, thanks to the incredible power of deep learning.
伊利亚·苏茨克维尔:
好的,首先,非常感谢你的夸奖和好话。得益于深度学习的巨大力量,许多事情都发生了变化。
I think my personal starting point was that I was interested in artificial intelligence for a whole variety of reasons, starting from an intuitive appreciation of its impact.
我个人的起点是,我出于多种原因对人工智能产生了兴趣。从直观理解并欣赏它的影响开始。
And also I had a lot of curiosity about what is consciousness? What is the human experience? And it felt like progress in artificial intelligence will help with that.
我对意识是什么、人类体验是什么有着浓厚的好奇心。我觉得人工智能的进步将有助于解答这些问题。
The next step was, well, back then I was starting out in 2002-2003, and it seemed like learning is the thing that humans can do that computers can't do at all. In 2002-2003, computers could not learn anything.
下一步是,当时我从2002年到2003年开始研究。那时,学习是人类能做到但计算机完全做不到的事情。在2002到2003年,计算机无法学习任何东西。
And it wasn't even clear that it was possible in theory. And so I thought that making progress in learning, in artificial learning, in machine learning, that would lead to the greatest progress in AI.
甚至理论上都不清楚这是否可行。因此,我认为在学习、人工学习、机器学习方面取得进展,将带来人工智能的最大进步。
And then I started to look around for what was out there. And nothing seemed too promising. But to my great luck, Geoff Hinton was a professor at my university. And I was able to find him.
然后我开始寻找现有的研究方向,但似乎没有什么特别有前景。幸运的是,Geoff Hinton是我们大学的一位教授,我找到了他。
And he was working on neural networks, and it immediately made sense. Because neural networks had the property that, when we are learning, we are automatically programming parallel computers. Back then the parallel computers were small.
他当时在研究神经网络,这一下子让我豁然开朗。因为神经网络的特性就是通过学习,我们能够自动编程并驱动并行计算机。当时的并行计算机规模还很小。
But the promise was that if you could somehow figure out how learning in neural networks works, then you can program small parallel computers from data. And it was also similar enough to the brain, and the brain works.
但其潜力在于,如果你能弄清楚神经网络中的学习原理,你就可以通过数据编程小型并行计算机。此外,这种方式与大脑及其工作原理有一定的相似性。
So it's like you had these several factors going for it. Now, it wasn't clear how to get it to work. But of all the things that existed, that seemed like it had by far the greatest long-term promise.
因此,存在多个有利因素。尽管当时尚不清楚如何让它发挥作用,但在所有现有技术中,这种方法似乎具有最伟大的长期潜力。
Jensen Huang:
At the time that you first started working with deep learning and neural networks, what was the scale of the network? What was the scale of computing at that moment in time? What was it like?
黄仁勋:
当你刚开始研究深度学习和神经网络时,网络的规模有多大?当时的计算规模如何?是什么样的情况?
Ilya Sutskever:
An interesting thing to note is that the importance of scale wasn't realized back then. So people would just train neural networks with like 50 neurons, 100 neurons, several hundred neurons. That would be like a big neural network.
伊利亚·苏茨克维尔:
有趣的是,当时人们并没有意识到规模的重要性。因此,大家仅在50个、100个甚至几百个神经元的网络上进行训练。这已经算是很大的神经网络了。
A million parameters would be considered very large. We would run our models on unoptimized CPU code because we were a bunch of researchers. We didn't know about BLAS. We used MATLAB, and the MATLAB was optimized.
拥有一百万个参数就被认为是非常大的网络。我们使用未经优化的CPU代码运行模型,因为我们只是一群研究人员。我们不了解BLAS线性代数库。我们用的是MATLAB,而MATLAB是经过优化的。
And we would just experiment, like, what is even the right question to ask? You know, so you try to just find interesting phenomena, interesting observations. You can do this small thing and you can do that small thing.
我们只是不断实验,比如要问什么才是正确的问题。我们试图探索,寻找有趣的现象和观察。可以做这个小实验,也可以做那个小实验。
You know, Geoff Hinton was really excited about training neural nets on small little digits, both for classification, and also he was very interested in generating them. So the beginnings of generative models were right there.
你知道,Geoff Hinton对在小的手写数字上训练神经网络非常感兴趣,既用于分类,也用于生成这些数字。因此,生成模型的雏形就在那时诞生了。
But the question is like, okay, so you've got all this cool stuff floating around. What really gets traction? And so it wasn't obvious that this was the right question back then.
但问题在于,好吧,你有这么多有趣的东西,但究竟什么才是有实质性进展的方向?当时并不清楚这个问题是否正确。
But in hindsight, that turned out to be the right question.
回头来看,这确实是那个时代的正确问题。
Jensen Huang:
Now, the year of AlexNet was 2012. You and Alex had been working on AlexNet for some time before then.
黄仁勋:
AlexNet发布的时间是2012年。你和Alex早在此之前就已经开始研究AlexNet了。
And at what point was it clear to you that you wanted to build a computer vision-oriented neural network, that ImageNet was the right set of data to go for, and to somehow go for the computer vision contest?
你是在什么时候明确想要构建一个面向计算机视觉的神经网络,并认为ImageNet是合适的数据集,且需要参与计算机视觉竞赛的?
Ilya Sutskever:
Yeah, so I can talk about the context there. I think probably two years before that, it became clear to me that supervised learning is what's going to get us the traction. And I can explain precisely why. It wasn't just an intuition.
伊利亚·苏茨克维尔:
好的,我可以谈谈当时的背景。我认为大约在那之前的两年,我就意识到监督学习将是取得突破的关键。我可以明确解释为什么,这不仅仅是出于直觉。
It was, I would argue, an irrefutable argument, which went like this. If your neural network is deep and large, then it could be configured to solve a hard task. So that's the keyword. Deep and large.
我认为这是一个不可反驳的论点,逻辑如下:如果你的神经网络足够深且足够大,那么它可以被配置来解决一个复杂的任务。所以关键点就是“深”和“大”。
People weren't looking at large neural networks. People were, you know, maybe studying a little bit of depth in neural networks. But most of the machine learning field wasn't even looking at neural networks at all.
当时人们并没有关注大规模的神经网络。大家可能在研究神经网络的深度,但机器学习领域的大部分研究者根本没有关注神经网络。
They were looking at all kinds of Bayesian models and kernel methods, which are theoretically elegant methods that have the property that they actually can't represent a good solution no matter how you configure them.
他们关注的是各种贝叶斯模型和核方法。这些方法在理论上很优雅,但无论如何配置,它们实际上无法很好地表示问题的解决方案。
Whereas a large and deep neural network can represent a good solution to the problem. To find the good solution, you need a big data set and a lot of compute to actually do the work. We had also made some advances.
而大规模、深度的神经网络可以很好地表示问题的解决方案。为了找到好的解决方案,你需要一个大的数据集,并需要大量计算资源来完成工作。我们也在这一领域取得了一些进展。
So we've worked on optimization for a little bit. It was clear that optimization is a bottleneck. And there was a breakthrough by another grad student in Geoff Hinton's lab called James Martens.
因此,我们在优化方面也进行了一些研究。很明显,优化是一个瓶颈。当时,Geoff Hinton实验室的一名研究生James Martens取得了突破。
And he came up with an optimization method which is different from the one we are using now, some second-order method. But the point about it is that it proved that we could train those neural networks.
他提出了一种优化方法,与我们现在使用的方法不同,是某种二阶方法。但关键在于,这证明了我们可以训练这些神经网络。
Because before we didn't even know we could train them. So if you can train them, you make it big, you find the data and you will succeed. So then the next question is, well, what data?
因为在此之前,我们甚至不知道是否能够训练这些网络。所以,如果你能训练它们,只要让它们规模变大,找到合适的数据集,你就能成功。那么下一个问题是,应该使用什么数据?
And the ImageNet data set, back then, seemed like an unbelievably difficult data set. But it was clear that if you were to train a large convolutional neural network on this data set, it must succeed, if you just could have the compute.
当时的ImageNet数据集看起来极其困难。但很明显,如果你能在这个数据集上训练一个大型卷积神经网络,它一定会成功——只要有足够的计算资源。
Jensen Huang:
And right at that time, you and I, our history and our paths intersected. And somehow you made the observation that a GPU could help. At that time we were a couple of generations into our CUDA GPUs, and I think it was the GTX 580 generation.
黄仁勋:
就在那时,你我之间的历史交汇了。不知怎的,你对GPU产生了兴趣,当时我们已经开发了几代CUDA GPU,我记得是GTX 580这一代。
You had the insight that the GPU could actually be useful for training your neural network models. What was that? How did that day start? Tell me, you know, you've never told me about that moment. How did that day start?
你当时意识到GPU可能在训练神经网络模型中会有用。那是怎么回事?那一天是怎么开始的?告诉我,你知道的,你从未对我提起过那个时刻。那一天到底是怎么开始的?
Ilya Sutskever:
Yeah, so you know, the GPUs appeared in our lab, in our Toronto lab, thanks to Jeff. And he said, we should try these GPUs. And we started trying and experimenting with them.
伊利亚·苏茨克维尔:
是的,GPU出现在我们多伦多的实验室中,这要归功于Jeff。他说我们应该试试这些GPU。于是我们开始尝试并做实验。
And it was a lot of fun, but it was unclear what to use them for exactly. Where are you going to get the real traction?
这非常有趣,但当时并不清楚具体可以用它们做什么。到底能在哪方面取得真正的突破?
But then, with the existence of the ImageNet dataset, it was also very clear that the convolutional neural network was such a great fit for the GPU.
但是,有了ImageNet数据集后,很明显卷积神经网络非常适合GPU。
So it should be possible to make it go unbelievably fast and therefore train something which would be completely unprecedented in terms of its size. And that's how it happened.
因此,我们认为应该可以让它以惊人的速度运行,从而训练出规模前所未有的模型。事情就是这样发生的。
And, you know, very fortunately, Alex Krizhevsky, he really loved programming the GPU. And he was able to do it. He was able to code, to program really fast convolutional kernels.
幸运的是,Alex Krizhevsky非常喜欢编写GPU代码。他做到了,他能够编写运行速度非常快的卷积核程序。
And then he trained the neural net on the ImageNet dataset, and that led to the result. But it was like...
然后在ImageNet数据集上训练了神经网络,这就得出了结果。但是那时候……
Jensen Huang:
It shocked the world. It shocked the world. It broke the record of computer vision by such a wide margin that it was a clear discontinuity.
黄仁勋:
它震惊了世界,震惊了世界。它以如此大的优势打破了计算机视觉的记录,这显然是一个不连续的飞跃。
Ilya Sutskever:
Yeah.
伊利亚·苏茨克维尔:
是的。
Jensen Huang:
Yeah.
黄仁勋:
是的。
Ilya Sutskever:
And I would say it's not just that; there is another bit of context there. It's not so much about breaking the record; I think there's a different way to phrase it.
伊利亚·苏茨克维尔:
我想说,这不仅仅是打破记录的问题。这其中还有一个背景很重要。我觉得可以换个说法。
It's that that data set was so obviously hard and so obviously outside the reach of anything. People were making progress with some classical techniques, and they were actually doing something.
当时那个数据集显然非常难,显然超出了任何现有方法的能力范围。虽然人们通过一些传统技术取得了一些进展,也确实在做些事情。
But this thing was so much better on the data set, which was so obviously hard. It's not that it was just some competition. It was a competition which, back in the day...
但这项成果在这个显然很难的数据集上的表现却要好得多。这不仅仅是某个比赛,它是当时……
Jensen Huang:
It wasn't an average benchmark.
黄仁勋:
这不是一个普通的基准测试。
Ilya Sutskever:
It was so obviously difficult, so obviously out of reach, and so obviously with the property that if you did a good job, that would be amazing.
伊利亚·苏茨克维尔:
它显然非常困难,显然超出了能力范围,并且显然具有这样的特点:如果你能做好,那将是惊人的。
Jensen Huang:
Big bang of AI. Fast forward to now. You came out to the valley. You started OpenAI with some friends. You're the chief scientist. Now, what was the initial idea about what to work on at OpenAI?
黄仁勋:
人工智能的大爆炸。时光快进到现在。你来到硅谷,与一些朋友创立了OpenAI。你是首席科学家。那么,OpenAI最初的研究方向是什么?
Because you guys worked on several things. Some of the trails of inventions and work, you could see, led up to the ChatGPT moment. But what was the initial inspiration? How did you approach intelligence from that moment, and what led to this?
因为你们研究了很多领域。从你们的发明和工作中可以看出,有些直接通向了ChatGPT时刻。但最初的灵感是什么?从那一刻起,你们如何接近智能,并最终达成这一成就?
Ilya Sutskever:
Yeah. Obviously, when we started, it wasn't 100% clear how to proceed. And the field was also very different compared to the way it is right now.
伊利亚·苏茨克维尔:
是的。显然,在我们刚开始时,如何推进还不是百分之百清楚。这个领域也与现在完全不同。
So right now we're already used to it. You have these amazing artifacts, these amazing neural nets which are doing incredible things, and everyone is so excited.
现在,我们已经习惯了拥有这些令人惊叹的神经网络,它们能做出令人难以置信的事情,所有人都感到兴奋。
But back in 2015, 2016, early 2016, when we were starting out, the whole thing seemed pretty crazy. There were so many fewer researchers.
但在2015年到2016年初,我们刚开始时,这一切看起来非常疯狂。研究人员少得多。
Maybe there were between a hundred and a thousand times fewer people in the field compared to now. Like back then you had like 100 people. Most of them were working in Google slash DeepMind and that was that.
也许那个领域的人数比现在少了100到1000倍。当时,大概有100人,其中大多数在Google或DeepMind工作,就这样。
And then there were people picking up the skills, but it was very, very scarce, very rare still.
随后有些人开始学习这些技能,但这仍然非常稀缺,非常罕见。
And we had two big initial ideas at the start of OpenAI that had a lot of staying power and they stayed with us to this day. And I'll describe them right now.
我们在OpenAI成立初期有两个重要的初始想法,它们具有很强的持久力,并一直伴随我们至今。我现在来描述它们。
The first big idea that we had, one which I was especially excited about very early on, is the idea of unsupervised learning through compression. Some context.
第一个重要的想法,也是我很早就特别兴奋的,是通过压缩实现无监督学习。这需要一些背景说明。
Today, we take it for granted that unsupervised learning is this easy thing and you just pre-train on everything and it all does exactly as you'd expect.
如今,我们理所当然地认为无监督学习很简单,你只需在一切数据上进行预训练,结果都会如你所料。
In 2016, unsupervised learning was an unsolved problem in machine learning that no one had any insight into, any clue as to what to do. Yann LeCun would go around and give talks saying that you have this grand challenge in unsupervised learning.
但在2016年,无监督学习是机器学习中尚未解决的问题,没有人对如何做有任何洞见或线索。Yann LeCun当时到处演讲,说无监督学习是一个巨大的挑战。
And I really believed that really good compression of the data will lead to unsupervised learning.
而我真的相信,高效的数据压缩将引领无监督学习。
Now, compression is not language that was commonly used to describe what is really being done, until recently, when it suddenly became apparent to many people that those GPTs actually compress the training data.
过去,"压缩"并不是用来描述这一过程的常用语言,直到最近,很多人突然意识到GPT模型实际上是在压缩训练数据。
You may recall the Ted Chiang New Yorker article, which also alluded to this. But there is a real mathematical sense in which training these autoregressive generative models compresses the data.
你可能还记得Ted Chiang在《纽约客》上的文章,也提到了这一点。从数学上讲,训练这些自回归生成模型确实是在压缩数据。
And intuitively, you can see why that should work. If you compress the data really well, you must extract all the hidden secrets which exist in it. Therefore, that is the key.
直观上,你也能明白为什么这会有效。如果你能很好地压缩数据,就必然提取出其中隐藏的所有秘密。因此,这就是关键。
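For readers who want to see the compression claim made concrete: the cross-entropy a next-token model assigns to a sequence equals the number of bits an ideal arithmetic coder driven by that model would need, so a lower prediction loss literally means a smaller compressed size. Below is a minimal sketch with a made-up toy model; nothing in it is from an actual GPT.

```python
# Minimal sketch: next-token prediction loss equals ideal compressed length in bits.
# `toy_model` is a hypothetical stand-in for a trained autoregressive model.
import math

def toy_model(prefix: str) -> dict:
    """Hypothetical next-character distribution over a two-symbol alphabet."""
    p_a = 0.9 if prefix.endswith("a") else 0.4
    return {"a": p_a, "b": 1.0 - p_a}

def compressed_bits(text: str) -> float:
    bits = 0.0
    for i, ch in enumerate(text):
        probs = toy_model(text[:i])
        bits += -math.log2(probs[ch])  # ideal code length for this character
    return bits

text = "aababaaab"
print(f"{compressed_bits(text):.2f} bits vs {8 * len(text)} bits uncompressed")
```

The better the model predicts, the fewer bits it needs; extracting the "hidden secrets" in the data is exactly what drives that number down.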
So that was the first idea that we're really excited about.
这就是我们当时真正感到兴奋的第一个想法。
And that led to quite a few works in OpenAI, including the sentiment neuron, which I'll mention.
这促成了OpenAI的一些研究成果,包括情感神经元,我会提到它。
Very briefly, this work might not be well known outside of the machine learning field, but it was very influential, especially in our thinking.
简单来说,这项工作在机器学习领域之外可能不太为人所知,但它非常有影响力,尤其是在我们的思考中。
The result there was that when you train a neural network—back then it was not a transformer, it was before the transformer—a small recurrent neural network, LSTM, to those who remember.
当时的结果是,当你训练一个神经网络——那时还没有Transformer,是在Transformer之前的小型循环神经网络,熟悉的人会记得LSTM。
Jensen Huang:
Sequence work that you've done. I mean, this is some of the work that you've done yourself.
黄仁勋:
你在序列建模方面的工作,包括你亲自参与的一些研究。
Ilya Sutskever:
So the same LSTM with a few twists, trained to predict the next token in Amazon reviews, next character.
伊利亚·苏茨克维尔:
我们在LSTM的基础上做了一些改进,用它来预测亚马逊评论中的下一个标记或下一个字符。
And we discovered that if you predict the next character well enough, there will be a neuron inside that LSTM that corresponds to its sentiment.
我们发现,如果你能够很好地预测下一个字符,那么LSTM中就会有一个神经元与评论的情感对应起来。
So that was really cool, because it showed some traction for unsupervised learning.
这非常有趣,因为它表明无监督学习可以取得一些进展。
And it validated the idea that really good next character prediction, next something prediction, compression has the property that it discovers the secrets in the data.
这也验证了一个想法,即通过高质量的下一个字符预测或其他预测,压缩数据能够挖掘出数据中的隐藏秘密。
That's what we see with these GPT models, right?
这正是我们在GPT模型中看到的,对吧?
You train and people say it's just statistical correlation. I mean, at this point, it should be so clear to anyone.
你训练模型时,人们可能会说这只是统计相关性。但现在,这一点对任何人都应该非常清楚了。
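To give readers the flavor of that sentiment-neuron result, here is a minimal sketch, not the original OpenAI setup: a tiny character-level LSTM is trained only to predict the next character of review text, and its hidden units are then probed for correlation with sentiment. The reviews, sizes, and hyperparameters are illustrative assumptions.

```python
# Sketch: next-character LSTM on review text, then probe hidden units for sentiment.
import torch
import torch.nn as nn

reviews = [  # (text, sentiment) pairs; a toy stand-in for Amazon reviews
    ("i loved this product, it works great and arrived fast", 1),
    ("absolutely terrible, broke after one day, waste of money", 0),
    ("fantastic quality, would happily buy again", 1),
    ("awful experience, the item never worked at all", 0),
]
chars = sorted({c for text, _ in reviews for c in text})
stoi = {c: i for i, c in enumerate(chars)}

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, 16)
        self.lstm = nn.LSTM(16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))  # hidden state at every position
        return self.head(h), h

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=3e-3)
loss_fn = nn.CrossEntropyLoss()

# Unsupervised stage: predict the next character (input text[:-1], target text[1:]).
for _ in range(300):
    for text, _ in reviews:
        ids = torch.tensor([[stoi[c] for c in text]])
        logits, _ = model(ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
        opt.zero_grad(); loss.backward(); opt.step()

# Probe: which hidden unit correlates most with the never-trained-on sentiment label?
with torch.no_grad():
    feats, labels = [], []
    for text, label in reviews:
        ids = torch.tensor([[stoi[c] for c in text]])
        _, h = model(ids)
        feats.append(h[0, -1])  # final hidden state summarizing the review
        labels.append(label)
    feats = torch.stack(feats)
    labels = torch.tensor(labels, dtype=torch.float)
    fc, lc = feats - feats.mean(0), labels - labels.mean()
    corr = (fc * lc[:, None]).mean(0) / (fc.std(0) * lc.std() + 1e-8)
    print("most sentiment-correlated unit:", corr.abs().argmax().item())
```

In the original work, trained at far larger scale on Amazon reviews, a single such unit tracked sentiment well enough to serve as a classifier on its own.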
Jensen Huang:
That observation also, you know, for me, intuitively opened up the whole world of where do I get the data for unsupervised learning? Because I do have a whole lot of data.
黄仁勋:
这一观察直观地为我打开了无监督学习的数据来源的世界。因为我确实有大量数据。
If I could just make you predict the next character, and I know what the ground truth is, I know what the answer is, I could train a neural network model with that.
如果我能让模型预测下一个字符,而我知道真实答案是什么,那么我就可以用这些数据来训练神经网络模型。
So that observation, and masking, and other technologies, other approaches, you know, opened my mind about where the world would get all the data for unsupervised learning.
因此,这一观察以及掩码技术和其他方法,让我开阔了思路,思考世界上所有无监督学习的数据可以从哪里获取。
Ilya Sutskever:
Well, I think, so I would phrase it a little differently.
伊利亚·苏茨克维尔:
嗯,我的表达方式可能会有些不同。
I would say that with unsupervised learning, the hard part has been less around where you get the data from, though that part is there as well, especially now.
我认为,对于无监督学习,困难之处不太在于数据的来源,尽管这也是一个问题,尤其是现在。
But it was more about why should you do it in the first place? Why should you bother?
更重要的是,为什么你要一开始就去做这件事?为什么要费这个心思?
The hard part was to realize that training these neural nets to predict the next token is a worthwhile goal at all. That was the goal.
困难在于认识到,训练这些神经网络去预测下一个标记是一个值得追求的目标。这才是关键。
Jensen Huang:
That it would learn a representation. That it would be able to understand.
黄仁勋:
它能够学习一种表示形式,能够理解信息。
Ilya Sutskever:
That's right. That it will be useful.
伊利亚·苏茨克维尔:
没错,这将是有用的。
Jensen Huang:
Use grammar and yeah.
黄仁勋:
掌握语法,是的。
Ilya Sutskever:
But to actually, it just wasn't obvious.
伊利亚·苏茨克维尔:
但实际上,这一点并不显而易见。
Jensen Huang:
Right.
黄仁勋:
是的。
Ilya Sutskever:
So people weren't doing it. But the sentiment neuron work... And you know, I want to call out Alec Radford as the person who really was responsible for many of the advances there.
伊利亚·苏茨克维尔:
所以当时没人这样做。但情感神经元的研究……我想特别提到Alec Radford,他对其中的许多进展负有重要责任。
The sentiment neuron work, which was before GPT-1, was the precursor to GPT-1, and it influenced our thinking a lot.
情感神经元的研究是在GPT-1之前,是GPT-1的前身,对我们的思考产生了很大的影响。
Then the transformer came out, and we immediately went, Oh my God, this is the thing. And we trained GPT-1. Now along the way.
然后Transformer出现了,我们立刻意识到,这就是关键。于是我们训练了GPT-1。在这个过程中……
Jensen Huang:
You've always believed that scaling will improve the performance of these models.
黄仁勋:
你一直相信扩展规模会提升这些模型的性能。
Ilya Sutskever:
Yes.
伊利亚·苏茨克维尔:
是的。
Jensen Huang:
Larger networks, deeper networks, more training data would scale that. There was a very important paper that OpenAI wrote about the scaling laws and the relationship between loss and the size of the model and the size of the data set.
黄仁勋:
更大的网络、更深的网络和更多的训练数据可以实现扩展。OpenAI有一篇关于扩展法则的重要论文,探讨了损失函数与模型规模和数据集规模之间的关系。
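The relationship in that paper is usually summarized as a power law: test loss falls smoothly as a power of parameter count and of dataset size. The sketch below only illustrates the functional form from Kaplan et al. (2020); the constants are rough illustrative values, not exact figures from the paper.

```python
# Sketch of the power-law scaling form: loss ~ (N_c / N)^alpha_N and loss ~ (D_c / D)^alpha_D.
# Constants are approximate, for illustration only.
def loss_from_params(n_params, n_c=8.8e13, alpha_n=0.076):
    return (n_c / n_params) ** alpha_n

def loss_from_tokens(n_tokens, d_c=5.4e13, alpha_d=0.095):
    return (d_c / n_tokens) ** alpha_d

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N={n:.0e}  predicted loss ~ {loss_from_params(n):.2f} nats/token")
```

On curves like these, more parameters, more data, and more compute keep buying lower loss in a predictable way, which is the point Jensen is raising here.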
When Transformers came out, it gave us the opportunity to train very, very large models in a very reasonable amount of time.
Transformer的出现为我们提供了在合理时间内训练非常大模型的机会。
But the intuition about the scaling laws, or the size of models and data, and your journey of GPT-1, 2, 3: which came first? Did you see the evidence of GPT-1 through 3 first, or was it an intuition about the scaling law first?
但关于扩展法则、模型规模和数据的直觉,以及你们在GPT-1、2、3的研究中,哪个先出现?是先从GPT-1到3的证据中发现,还是先有扩展法则的直觉?
Ilya Sutskever:
The intuition, so I would say that the way I'd phrase it is that I had a very strong belief that bigger is better. And that one of the goals that we had at OpenAI is to figure out how to use the scale correctly.
伊利亚·苏茨克维尔:
直觉,我会这样表述:我一直非常坚定地相信,规模越大越好。我们在OpenAI的一个目标就是弄清楚如何正确利用规模。
There was a lot of belief in OpenAI about scale from the very beginning. The question is, what to use it for precisely?
从一开始,OpenAI内部对规模就有很大的信心。问题在于,具体用它来做什么?
Because I'll mention right now we're talking about the GPTs, but there's another very important line of work, which I haven't mentioned, the second big idea. But I think now is a good time to make a detour. And that's reinforcement learning.
因为我们现在谈论的是GPT,但还有另一条非常重要的研究路线,我之前没有提到。这是我们的第二个大想法。现在是一个很好的时机来稍作偏离,那就是强化学习。
That clearly seems important as well. What do we do with it?
强化学习显然也非常重要。我们该如何应用它?
So the first really big project that was done inside OpenAI, it was our effort at solving a real-time strategy game.
OpenAI内部完成的第一个真正大的项目是解决一个实时策略游戏的挑战。
And for context, a real-time strategy game is like a competitive sport. You need to be smart, you need to have a quick reaction time, there's teamwork, and you're competing against another team. And it's pretty involved.
作为背景说明,实时策略游戏就像一项竞技运动。你需要聪明、快速反应、有团队合作,还要与另一支队伍竞争。这非常复杂。
And there is a whole competitive league for that game. The game is called Dota 2.
这个游戏有完整的竞技联赛,名字叫《Dota 2》。
And so we trained a reinforcement learning agent to play against itself, with the goal of reaching a level where it could compete against the best players in the world.
因此,我们训练了一个强化学习代理,让它与自己对战,目标是达到能够与世界顶级玩家竞争的水平。
And that was a major undertaking as well. It was a very different line. It was reinforcement learning.
这也是一个重大项目。这是一条完全不同的研究路线,是强化学习。
Jensen Huang:
Yeah, I remember the day that you guys announced that work. By the way, when I was asking earlier about there's a large body of work that has come out of OpenAI. Some of it seemed like detours.
黄仁勋:
是的,我还记得你们宣布那项工作的那一天。顺便说一下,我之前提到OpenAI的许多研究成果。有些看起来像是岔路。
But in fact, as you're explaining now, they might have been detours, seemingly detours, but they really led up to some of the important work that we're now talking about, ChatGPT.
但实际上,正如你现在解释的那样,这些可能看似岔路,但它们确实引导了我们今天讨论的一些重要工作,比如ChatGPT。
Ilya Sutskever:
Yeah. I mean, there has been real convergence, where the GPTs produced the foundation, and the reinforcement learning from Dota morphed into reinforcement learning from human feedback.
伊利亚·苏茨克维尔:
是的。我是说,这里确实存在一种融合,GPT奠定了基础,而DOTA的强化学习演变成了基于人类反馈的强化学习。
Jensen Huang:
That's right.
黄仁勋:
没错。
Ilya Sutskever:
And that combination gave us ChatGPT.
伊利亚·苏茨克维尔:
这种结合带来了ChatGPT。
Jensen Huang:
You know, there's a misunderstanding that ChatGPT is in itself just one giant large language model. There's a system around it that's fairly complicated.
黄仁勋:
你知道,很多人误以为ChatGPT只是一个巨大的大语言模型。实际上,它的外围系统相当复杂。
Could you explain briefly for the audience the fine-tuning of it, the reinforcement learning of it, the various surrounding systems that allow you to keep it on rails and give it knowledge and so on and so forth?
你能不能简要向观众解释一下它的微调、强化学习,以及围绕它的各种系统如何确保它保持在正确轨道上并提供知识等?
Ilya Sutskever:
Yeah, I can. So the way to think about it is that when we train a large neural network to accurately predict the next word in lots of different texts from the internet, what we are doing is that we are learning a world model.
伊利亚·苏茨克维尔:
是的,我可以解释。我们可以这样理解,当我们训练一个大型神经网络来精确预测互联网中各种文本的下一个词时,我们实际上是在学习一个世界模型。
It looks like we are learning this. It may look on the surface that we are just learning statistical correlations in text.
看起来我们是在学习这些。从表面上看,这似乎只是学习文本中的统计关联。
But it turns out that to just learn the statistical correlations in text, to compress them really well, what the neural network learns is some representation of the process that produced the text.
但事实证明,要真正学习文本中的统计关联并很好地压缩它们,神经网络实际上学习的是生成文本过程的一种表示。
This text is actually a projection of the world. There is a world out there, and it has a projection on this text.
文本实际上是世界的投影。外面有一个世界,它在文本上留下了投影。
And so what the neural network is learning is more and more aspects of the world, of people, of the human conditions, their hopes, dreams and motivations, their interactions and the situations that we are in.
因此,神经网络正在学习的是关于世界、关于人类、关于人类状况的越来越多的方面,包括他们的希望、梦想和动机,他们的互动以及我们所处的情境。
And the neural network learns a compressed, abstract, usable representation of that.
神经网络学习的是一种压缩的、抽象的、可用的表示形式。
This is what's being learned from accurately predicting the next word. And furthermore, the more accurate you are at predicting the next word, the higher the fidelity, the more resolution you get in this process.
这就是通过准确预测下一个词所学习到的内容。而且,你预测得越准确,这个过程中获得的细节和分辨率就越高。
So that's what the pre-training stage does. But what this does not do is specify the desired behavior that we wish our neural network to exhibit.
这就是预训练阶段的作用。但这一阶段无法指定我们希望神经网络展示的目标行为。
You see, a language model, what it really tries to do is to answer the following question:
你看,语言模型实际上在试图回答以下问题:
If I had some random piece of text on the internet, which starts with some prefix, some prompt, what will it complete to? If you just randomly ended up on some text from the internet.
如果我有一段随机的互联网文本,它以某种前缀或提示开始,它会补全成什么?就像你随机地浏览到一些互联网文本一样。
But this is different from, well, I want to have an assistant which will be truthful, that will be helpful, that will follow certain rules and not violate them. That requires additional training.
但这与“我想要一个真实、有帮助、遵循某些规则且不违反它们的助手”是不同的。这需要额外的训练。
This is where the fine-tuning and the reinforcement learning from human teachers and other forms of AI assistance come in.
这就是微调和基于人类教师以及其他形式的AI协助的强化学习的用武之地。
It's not just reinforcement learning from human teachers. It's also reinforcement learning from human and AI collaboration.
这不仅仅是基于人类教师的强化学习,还包括人类与AI协作的强化学习。
Our teachers are working together with an AI to teach our AI to behave.
我们的教师与AI合作,教导AI如何表现。
But here we are not teaching it new knowledge. This is not what's happening.
但在这里,我们并不是在教授它新知识。事实并非如此。
We are teaching it, we are communicating with it. We are communicating to it what it is that we want it to be.
我们在教导它,与它沟通。我们在向它传达我们希望它成为什么样子。
And this process, the second stage, is also extremely important. The better we do the second stage, the more useful, the more reliable this neural network will be.
这一过程,即第二阶段,也极其重要。我们在第二阶段做得越好,这个神经网络就越有用,越可靠。
So the second stage is extremely important too, in addition to the first stage of learning everything, learning as much as you can about the world—from the projection of the world, which is text.
因此,除了第一阶段的“尽可能多地学习世界的知识(从文本作为世界的投影中学习)”之外,第二阶段也极其重要。
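To make the two stages concrete, here is a deliberately tiny, purely illustrative cartoon, not OpenAI's actual training or RLHF pipeline: stage one learns by next-word prediction on raw text; stage two nudges behavior with a scalar reward standing in for human feedback, via a toy policy-gradient update. The corpus, reward, and model are all assumptions.

```python
# Stage 1: pre-train by next-word prediction. Stage 2: adjust behavior from a reward signal.
import torch
import torch.nn as nn

text = "the assistant is helpful . the assistant is honest . "
words = text.split()
vocab = sorted(set(words))
stoi = {w: i for i, w in enumerate(vocab)}
ids = torch.tensor([stoi[w] for w in words])

model = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stage 1: learn the "world model" by predicting the next word.
for _ in range(200):
    logits = model(ids[:-1])
    loss = nn.functional.cross_entropy(logits, ids[1:])
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: communicate desired behavior. A toy "human feedback" reward prefers
# the completion "helpful"; a REINFORCE-style update raises its probability.
def reward(word: str) -> float:
    return 1.0 if word == "helpful" else 0.0

prompt = torch.tensor([stoi["is"]])
for _ in range(100):
    dist = torch.distributions.Categorical(logits=model(prompt))
    sample = dist.sample()
    loss = -reward(vocab[sample.item()]) * dist.log_prob(sample).sum()
    opt.zero_grad(); loss.backward(); opt.step()

probs = torch.softmax(model(prompt), dim=-1)[0]
print("p('helpful' | 'the assistant is') =", round(probs[stoi["helpful"]].item(), 3))
```

No new knowledge is added in stage two; the same network is simply steered toward the behavior the reward encodes, which is the distinction Ilya draws above.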
Jensen Huang:
Now, you could fine-tune it. You could instruct it to perform certain things. Can you instruct it to not perform certain things so that you could give it guardrails about avoiding these types of behavior?
黄仁勋:
现在,你可以对它进行微调,可以指示它执行某些任务。那么,你是否可以指示它不执行某些任务,从而为其设定防护措施,以避免出现这些行为?
You know, give it some kind of a bounding box so that it doesn't wander out of that bounding box and perform things that are, you know, unsafe or otherwise.
你知道,为它设置某种“边界框”,使其不会超出边界框,从而避免执行不安全或其他不当的行为。
Ilya Sutskever:
Yeah. So this second stage of training is indeed where we communicate to the neural network anything we want, which includes the bounding box.
伊利亚·苏茨克维尔:
是的。第二阶段的训练确实是我们向神经网络传达任何我们希望其遵循内容的阶段,包括“边界框”。
And the better we do this training, the higher the fidelity with which we communicate this bounding box.
我们在这一训练阶段做得越好,就越能高保真地传达这些边界限制。
And so, with constant research and innovation, we are able to keep improving this fidelity.
通过持续的研究和创新来提高这一保真度,我们能够不断改进。
And so it becomes more and more reliable and precise in the way in which it follows the intended instructions.
因此,模型在遵循预期指令方面变得越来越可靠和精确。
Jensen Huang:
ChatGPT came out just a few months ago—fastest growing application in the history of humanity. Lots of interpretations about why.
黄仁勋:
ChatGPT仅在几个月前发布,是人类历史上增长最快的应用程序。关于原因有很多解释。
But some of the things that are clear: it is the easiest application that anyone has ever created for anyone to use.
但有一些显而易见的事实:这是有史以来最容易上手的应用程序。
It performs tasks. It does things that are beyond people's expectations. Anyone can use it.
它能够完成任务,表现超出人们的预期。任何人都能使用它。
There are no instruction sets. There are no wrong ways to use it. You just use it.
它没有操作指南,也没有所谓的错误使用方式。你只需使用它即可。
And if your instructions or prompts are ambiguous, the conversation refines the ambiguity until your intent is understood by the AI.
如果你的指令或提示语含糊不清,对话会逐步澄清这些模糊性,直到AI理解你的意图。
The impact, of course, is clearly remarkable.
它的影响显然非同凡响。
Now, yesterday—this is the day after GPT-4, just a few months later.
现在,昨天——GPT-4发布后的第二天,仅几个月后。
The performance of GPT-4 in many areas is astounding. SAT scores, GRE scores, bar exams.
GPT-4在许多领域的表现令人震惊,比如SAT成绩、GRE成绩、律师资格考试。
The number of tests it is able to perform at very capable levels—very capable human levels—is astounding.
它在许多测试中达到非常高水平、接近人类能力的表现,令人惊叹。
What were the major differences between ChatGPT and GPT-4 that led to its improvements in these areas?
ChatGPT与GPT-4之间的主要差异是什么,促使其在这些领域取得了显著提升?
Ilya Sutskever:
So GPT-4 is a pretty substantial improvement on top of ChatGPT across very many dimensions. We trained GPT-4, I would say, more than six months ago, maybe eight months ago—I don't remember exactly.
伊利亚·苏茨克维尔:
GPT-4在许多方面相比ChatGPT有了相当大的改进。我们训练GPT-4是在六个月前,也许是八个月前,我具体记不清了。
The base GPT is the first big difference between ChatGPT and GPT-4. And that perhaps is the most important difference: the base on top of which GPT-4 is built predicts the next word with greater accuracy.
基础的GPT模型是ChatGPT和GPT-4之间的第一个、也许也是最重要的差异:GPT-4所基于的基础模型能够更准确地预测下一个词。
This is really important because the better a neural network can predict the next word in text, the more it understands it.
这一点非常重要,因为神经网络对文本中下一个词预测得越好,它对文本的理解就越深刻。
This claim is now perhaps accepted by many at this point. But it might still not be intuitive or not completely intuitive as to why that is.
这一观点现在可能已经被许多人接受了,但为什么会这样,可能仍然不够直观或完全直观。
So I'd like to take a small detour and give an analogy that will hopefully clarify why more accurate prediction of the next word leads to more understanding, real understanding.
因此,我想稍作解释,用一个类比来澄清为什么更准确地预测下一个词会带来更深层次的理解,真正的理解。
Let's consider an example. Say you read a detective novel. It's like a complicated plot, a storyline, different characters, lots of events, mysteries like clues, it's unclear.
举个例子,假设你在读一本侦探小说。情节复杂,故事线交织,各种人物登场,事件众多,线索模糊,一切不明朗。
Then, let's say that at the last page of the book, the detective has got all the clues, gathered all the people, and says, "Okay, I'm going to reveal the identity of whoever committed the crime. And that person's name is?"
然后,假设在书的最后一页,侦探已经掌握了所有线索,召集了所有人,说:“好了,我要揭示犯罪者的身份。那个人的名字是?”
Jensen Huang:
Predict that word.
黄仁勋:
预测那个词。
Ilya Sutskever:
Predict that word. Exactly.
伊利亚·苏茨克维尔:
预测那个词。完全正确。
Jensen Huang:
My goodness.
黄仁勋:
天哪。
Ilya Sutskever:
Right?
伊利亚·苏茨克维尔:
对吧?
Jensen Huang:
Yeah, right.
黄仁勋:
是的,没错。
Ilya Sutskever:
There are many different words, but by predicting those words better and better, the understanding of the text keeps on increasing. GPT-4 predicts the next word better.
伊利亚·苏茨克维尔:
有很多不同的词,但通过越来越好地预测这些词,对文本的理解不断提高。GPT-4在预测下一个词方面表现得更好。
Jensen Huang:
Ilya, people say that deep learning won't lead to reasoning, that deep learning won't lead to reasoning.
黄仁勋:
伊利亚,人们说深度学习不会带来推理能力,深度学习不会带来推理能力。
But in order to predict that next word, figure out from all of the agents that were there and all of their, you know, strengths or weaknesses or their intentions, and the context, and to be able to predict that word—who was the murderer—that requires some amount of reasoning, a fair amount of reasoning.
但是,要预测下一个词,就必须从所有的角色中分析出他们的优点、弱点或意图,以及上下文,并能够预测出那个词——谁是凶手——这需要一定程度的推理,相当程度的推理。
And so how the hell is it that it’s able to learn reasoning?
那么,它是如何学会推理的?
And if it learned reasoning, you know, one of the things that I was going to ask you is: of all the tests that were taken between ChatGPT and GPT-4, there were some tests that GPT-3 or ChatGPT was already very good at.
如果它学会了推理,有一件事我想问你:在ChatGPT和GPT-4之间进行的所有测试中,有些测试是GPT-3或ChatGPT已经表现得很好的。
There were some tests that GPT-3 or ChatGPT was not as good at, that GPT-4 was much better at.
有些测试是GPT-3或ChatGPT表现不太好,但GPT-4表现得更好。
And there were some tests that neither are good at yet.
还有一些测试两者都还没有做好。
I would love for it, you know, and some of it has to do with reasoning.
我希望能够理解,这其中有些与推理有关。
It seems that, maybe in calculus, it wasn’t able to break the problem down into reasonable steps and solve it.
似乎在微积分中,它无法将问题分解为合理的步骤并加以解决。
But yet, in some areas, it seems to demonstrate reasoning skills.
但在某些领域,它似乎展示了推理能力。
And so is that an area that, in predicting the next word, you're learning reasoning?
那么,在预测下一个词时,是否就在学习推理?
And what are the limitations now of GPT-4 that would enhance its ability to reason even further?
GPT-4目前在推理能力上有哪些限制,如何才能进一步提升?
Ilya Sutskever:
You know, reasoning isn’t this super well-defined concept, but we can try to define it anyway, which is when you maybe go further, where you're able to somehow think about it a little bit and get a better answer because of your reasoning.
伊利亚·苏茨克维尔:
推理并不是一个非常明确定义的概念,但我们仍然可以尝试定义一下,也许可以这样说:推理是当你能够更深入思考时,凭借推理得出更好的答案。
And I’d say that our neural nets, you know, maybe there is some kind of limitation which could be addressed by, for example, asking the neural network to think out loud.
我想说,我们的神经网络可能确实有一些限制,可以通过让神经网络“思考出声”来解决。
This has proven to be extremely effective for reasoning.
这种方法已被证明对提升推理能力非常有效。
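"Thinking out loud" here is what is now usually called chain-of-thought prompting: asking the model to write out its intermediate steps before committing to an answer. The sketch below just contrasts the two prompt styles; ask_model is a hypothetical placeholder for whatever completion API you use, not a real function.

```python
# Sketch: direct prompt vs. "think out loud" (chain-of-thought) prompt.
def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in your own completion call here")

question = ("A train travels 120 km in 1.5 hours. "
            "How far does it travel in 4 hours at the same speed?")

direct_prompt = question + "\nAnswer:"

think_out_loud_prompt = (
    question
    + "\nLet's think step by step, writing out each intermediate calculation, "
    + "and only then state the final answer."
)

# Externalizing the intermediate steps tends to improve multi-step reasoning
# compared with asking for the answer directly.
# answer = ask_model(think_out_loud_prompt)
```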
But I think it also remains to be seen just how far the basic neural network will go.
但我认为,基本神经网络能达到什么程度仍需观察。
I think we have yet to fully tap out its potential.
我认为我们还没有完全挖掘出它的潜力。
But yeah, I mean, there is definitely some sense where reasoning is still not quite at that level as some of the other capabilities of the neural network, though we would like the reasoning capabilities of the neural network to be higher.
不过,是的,在某种程度上,推理能力确实还没有达到神经网络某些其他能力的水平,尽管我们希望神经网络的推理能力能更高。
I think that it’s fairly likely that business as usual will improve the reasoning capabilities of the neural network.
我认为,按照常规研究方式,神经网络的推理能力很可能会得到提升。
I wouldn’t necessarily confidently rule out this possibility.
我不会轻易排除这种可能性。
Jensen Huang:
Yeah, because one of the things that is really cool is you ask ChatGPT a question. But before it answers the question, you can say, "Tell me first what you know, and then answer the question."
黄仁勋:
是的,有一件事很酷,你可以问ChatGPT一个问题,但在它回答之前,你可以说:“先告诉我你知道什么,然后再回答问题。”
You know, usually when somebody answers a question, if you give me the foundational knowledge that you have, or the foundational assumptions that you're making before you answer the question, that really improves my believability of the answer.
通常,当有人回答问题时,如果你先告诉我你掌握的基础知识或你做出的基础假设,这会大大增强我对答案的可信度。
You're also demonstrating some level of reasoning when you do that. And so it seems to me that ChatGPT has this inherent capability embedded in it.
这样做的同时,你也展示了一定程度的推理能力。因此,在我看来,ChatGPT似乎内嵌了这种能力。
Ilya Sutskever:
To some degree. The one way to think about what's happening now is that these neural networks have a lot of these capabilities. They're just not quite very reliable.
伊利亚·苏茨克维尔:
在某种程度上是这样的。可以这样理解,现在的神经网络确实具备很多能力,但它们还不够可靠。
In fact, you could say that reliability is currently the single biggest obstacle for these neural networks being useful, truly useful.
事实上,可以说,可靠性是目前这些神经网络变得真正有用的最大障碍。
It is sometimes still the case that these neural networks hallucinate a little bit, or maybe make some mistakes which are unexpected, which you wouldn't expect a person to make.
有时候,这些神经网络仍然会出现一些幻觉现象,或者犯一些人类不会犯的意外错误。
It is this kind of unreliability that makes them substantially less useful.
正是这种不可靠性使它们的实用性大打折扣。
But I think that perhaps with a little bit more research, with the current ideas that we have, and perhaps a few more of the ambitious research plans, we'll be able to achieve higher reliability as well.
但我认为,也许通过更多的研究,利用我们当前的想法,再加上一些雄心勃勃的研究计划,我们也许能够实现更高的可靠性。
And that will be truly useful. That will allow us to have very accurate guardrails, which are very precise.
那将是非常有用的。这将使我们能够设置非常精确的防护措施。
And it will make it ask for clarification where it's unsure. Or maybe say that it doesn't know something when it doesn't know and do so extremely reliably.
它会在不确定的情况下主动寻求澄清,或者在不知道某些信息时坦诚地说出来,并且做到极为可靠。
So I'd say that these are some of the bottlenecks, really. So it's not about whether it exhibits some particular capability, but more about exactly how reliably it does so.
所以我认为,这些确实是一些瓶颈问题。关键不是它是否具备某种特定能力,而是其能力的可靠性究竟如何。
Jensen Huang:
Yeah, you know, speaking of factualness and hallucination—I saw in one of the videos a demonstration that links to a Wikipedia page.
黄仁勋:
是的,说到事实性和准确性,以及幻觉现象——我在某个视频中看到一个演示,它链接到了维基百科页面。
Does retrieval capability, has that been included in GPT-4? Is it able to retrieve information from a factual place that could augment its response to you?
检索功能是否已包含在GPT-4中?它是否能够从事实来源检索信息来增强对你的回答?
Ilya Sutskever:
So the current GPT-4 as released does not have a built-in retrieval capability.
伊利亚·苏茨克维尔:
目前发布的GPT-4并没有内置检索功能。
It is just a really, really good next-word predictor, which can also consume images, by the way.
它只是一个非常强大的下一个词预测器,顺便说一下,它也可以处理图像。
We haven't spoken about it, but it is really good at images, which is also then fine-tuned with data and various reinforcement learning variants to behave in a particular way.
我们还没有提到这一点,但它在图像处理方面也非常出色,同时通过数据和各种强化学习变体进行了微调,以表现出特定的行为。
It is perhaps, I'm sure someone will, it wouldn't surprise me if some of the people who have access could perhaps request GPT-4 to maybe make some queries and then populate the results inside the context.
我相信某些拥有权限的人可能会让GPT-4进行一些查询,然后将结果填充到上下文中,这一点并不意外。
Because also the context length of GPT-4 is quite a bit longer now.
因为GPT-4的上下文长度现在也大大增加了。
So in short, although GPT-4 does not support built-in retrieval, it is completely correct that it will get better with retrieval.
总之,尽管GPT-4不支持内置检索功能,但可以肯定的是,通过检索功能,它的表现会变得更好。
Jensen Huang:
Multi-modality. GPT-4 has the ability to learn from text and images and respond to input from text and images.
黄仁勋:
多模态。GPT-4具有从文本和图像中学习并对文本和图像输入做出响应的能力。
First of all, the foundation of multimodality learning: of course, Transformers have made it possible for us to learn from multimodality, to tokenize text and images.
首先,多模态学习的基础——当然,Transformer使我们能够从多模态中学习,将文本和图像标记化。
But at the foundational level, help us understand how multimodality enhances the understanding of the world beyond text by itself.
但从基础层面来看,帮助我们理解多模态如何增强对超越文本本身的世界的理解。
And my understanding is that when you do multimodality learning, even when it is just a text prompt, the text understanding could actually be enhanced.
我的理解是,当你进行多模态学习时,即使只是一个文本提示,文本理解实际上也可能得到增强。
Tell us about multimodality at the foundation, why it's so important, and what was the major breakthrough and the characteristic differences as a result?
请告诉我们多模态的基础,为什么它如此重要,以及由此带来的主要突破和特性差异?
Ilya Sutskever:
So there are two dimensions to multimodality, two reasons why it is interesting.
伊利亚·苏茨克维尔:
多模态有两个维度,两个使其有趣的原因。
The first reason is a little bit humble. The first reason is that multimodality is useful.
第一个原因相对朴素。第一个原因是多模态非常有用。
It is useful for a neural network to see vision in particular, because the world is very visual.
让神经网络能够“看”是有用的,特别是视觉,因为世界本质上非常视觉化。
Human beings are very visual animals. I believe that something like a third of the human cortex is dedicated to vision.
人类是非常依赖视觉的动物。我相信人类大脑皮层约有三分之一是专门用于视觉的。
And so by not having vision, the usefulness of our neural networks, though still considerable, is not as big as it could be.
因此,如果没有视觉能力,尽管我们的神经网络仍然相当有用,但它的潜力无法完全发挥。
So it is a very simple usefulness argument. It is simply useful to see. And GPT-4 can see quite well.
所以,这就是一个简单的实用性论点。拥有视觉能力很有用。而GPT-4在“看”方面表现相当好。
There is a second reason to do vision, which is that we learn more about the world by learning from images in addition to learning from text.
第二个原因是,通过从图像中学习,我们可以比仅仅通过文本学习更深入地了解世界。
That is also a powerful argument, though it is not as clear-cut as it may seem.
这也是一个强有力的论点,尽管它看起来并不像表面上那么明确。
I'll give you an example. Or rather, before giving an example, I'll make the general comment.
我举个例子。或者在举例之前,我先做一个总体说明。
For a human being, us human beings, we get to hear about 1 billion words in our entire life.
对于我们人类来说,我们一生中大约听到10亿个词汇。
Jensen Huang:
Only?
黄仁勋:
只有这么多?
Ilya Sutskever:
Only 1 billion words.
伊利亚·苏茨克维尔:
只有10亿个词汇。
Jensen Huang:
That's amazing. Yeah.
黄仁勋:
这真是太神奇了,是的。
Ilya Sutskever:
That's not a lot.
伊利亚·苏茨克维尔:
但这并不多。
Jensen Huang:
Yeah, that's not a lot.
黄仁勋:
是的,这确实不多。
Ilya Sutskever:
So we need to compensate.
伊利亚·苏茨克维尔:
所以我们需要补偿。
Jensen Huang:
Does that include my own words in my own head?
黄仁勋:
这包括我脑海中的自言自语吗?
Ilya Sutskever:
Make it two billion, if you see what I mean. You know, we can see that because a billion seconds is 30 years.
伊利亚·苏茨克维尔:
如果你明白我的意思,可以算作20亿词。你知道,我们可以从时间来推算:10亿秒等于30年。
So you can kind of see like we don't get to see more than a few words a second and then we are asleep half the time.
所以你可以看到,我们每秒只能接触到几个词,而我们有一半时间都在睡觉。
So like a couple billion words is the total we get in our entire life.
因此,我们一生中接触到的总词汇量大概就是二十亿词左右。
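The arithmetic behind that estimate is easy to check; all inputs below are rough assumptions in the same back-of-the-envelope spirit.

```python
# Rough check of the "a billion or two words in a lifetime" estimate.
seconds_per_year = 365 * 24 * 3600   # ~3.15e7 seconds
years = 30                           # ~1e9 seconds, the horizon Ilya mentions
waking_fraction = 0.5                # asleep half the time
words_per_second = 2                 # a few words per waking second

total_words = years * seconds_per_year * waking_fraction * words_per_second
print(f"about {total_words:,.0f} words")   # on the order of a billion
```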
So it becomes really important for us to get as many sources of information as we can.
因此,获取尽可能多的信息来源对我们来说变得非常重要。
And we absolutely learn a lot more from vision.
而且,通过视觉我们确实能学到更多。
The same argument holds true for our neural networks as well, except for the fact that the neural network can learn from so many words.
同样的道理也适用于神经网络,不同之处在于神经网络可以从大量的词汇中学习。
So, things which are hard to learn about the world from text in a few billion words may become easier from trillions of words.
因此,那些通过几十亿词的文本很难学到的世界知识,在面对数万亿词时就可能变得容易得多。
And I'll give you an example. Consider colors. Surely, one needs to see to understand colors.
我举个例子,考虑颜色。显然,要理解颜色是需要“看”的。
And yet, the text-only neural networks have never seen a single photon in their entire life.
然而,仅靠文本的神经网络一辈子都没见过一个光子。
If you ask them which colors are more similar to each other, it will know that red is more similar to orange than to blue.
如果你问它们哪种颜色彼此更相似,它会知道红色比蓝色更接近橙色。
It will know that blue is more similar to purple than to yellow.
它会知道蓝色比黄色更接近紫色。
How does that happen?
这怎么可能呢?
And one answer is that information about the world, even the visual information, slowly leaks in through text, but slowly, not as quickly.
一个答案是,关于世界的信息,包括视觉信息,慢慢地通过文本渗透进来,但很慢,不像视觉学习那么快。
But when you have a lot of text, you can still learn a lot.
但当你拥有大量文本时,仍然可以学到很多东西。
Of course, once you also add vision and learning about the world from vision, you will learn additional things which are not captured in text.
当然,一旦你增加了视觉并通过视觉学习世界,你会学到一些文本中未能捕捉到的额外信息。
But I would not say that it is a binary, that there are things which are impossible to learn from text only.
但我不会说这是非此即彼的情况,也就是说,并不是有些东西仅靠文本就绝对学不到。
I think there's more of an exchange rate.
我认为这更像是一个“交换比率”。
And in particular, if you are like a human being and you want to learn from a billion words or a hundred million words, then of course the other sources of information become far more important.
尤其是,如果你像人类一样,仅能从十亿词或一亿词中学习,那么其他信息来源就变得更加重要。
Jensen Huang:
Yeah, so you learn from images. Is there a sensibility that would suggest that if we wanted to understand also the construction of the world—like, you know, the arm is connected to my shoulder, my elbow is connected, and somehow these things move, the animation of the world, the physics of the world—if I wanted to learn that as well, can I just watch videos and learn that?
黄仁勋:
是的,所以你可以通过图像学习。那么是否有一种可能性表明,如果我们也想了解世界的结构,比如手臂连接到肩膀,肘部连接,并且这些东西如何运动,世界的物理特性,如果我想了解这些,我能通过观看视频学到吗?
Ilya Sutskever:
Yes.
伊利亚·苏茨克维尔:
是的。
Jensen Huang:
And if I wanted to augment all of that with sound, like, for example, if somebody said the meaning of "great," "great" could mean "great," or "great" could mean "great," you know.
黄仁勋:
如果我想通过声音进一步增强学习,比如有人说“great”时,它可以是“很棒”的意思,也可以是“讽刺”的意思,你懂的。
So, one is sarcastic, one is enthusiastic. There are many, many words like that, you know. "That's sick," or "I'm sick"—depending on how people say it.
所以,一个是讽刺,一个是热情。有很多类似的词,比如“That's sick”,可以是“太酷了”,也可以是“我生病了”,这取决于人们怎么说。
Would audio also make a contribution to the learning of the model? And can we put that to good use soon?
声音是否也会对模型的学习有所贡献?我们能否很快将其加以利用?
Ilya Sutskever:
Yes. Yeah, I think it's definitely the case that, well, you know, what can we say about audio? It's useful.
伊利亚·苏茨克维尔:
是的,我认为确实如此。关于音频,我们可以说的是:它很有用。
It's an additional source of information, probably not as much as images or video.
音频是一个额外的信息来源,虽然可能不像图像或视频那样重要。
But there is a case to be made for the usefulness of audio as well, both on the recognition side and on the production side.
但音频在识别和生成方面同样有其独特的价值。
Jensen Huang:
When you—in the context of the scores that I saw—the thing that was really interesting was the data that you guys published, which showed which tests were performed well by GPT-3 and which ones performed substantially better with GPT-4.
黄仁勋:
在我看到的成绩中,有趣的是你们发布的数据,显示了GPT-3在哪些测试中表现良好,而GPT-4在哪些测试中显著提升了表现。
How did multimodality contribute to those tests, do you think?
你认为多模态在这些测试中的贡献是什么?
Ilya Sutskever:
Oh, I mean, in a pretty straightforward way. Anytime there was a test where, to understand the problem, you need to look at a diagram.
伊利亚·苏茨克维尔:
哦,这其实很直接。只要是需要通过查看图表来理解问题的测试,多模态就显得非常重要。
Like, for example, in some math competitions—like there’s a math competition for high school students called AMC 12. There, presumably, many of the problems have a diagram.
比如在一些数学竞赛中,有一个面向高中生的数学竞赛叫AMC 12。其中许多问题都有图表。
So GPT-3.5 does quite badly on the test. GPT-4 with text only does—I think, I don’t remember—but it’s like maybe from 2% to 20% success rate.
因此,GPT-3.5在这类测试中表现不佳。仅使用文本的GPT-4表现稍好,我记得大概从2%到20%的成功率。
But then, when you add vision, it jumps to a 40% success rate. So the vision is really doing a lot of work.
但当加入视觉后,成功率跃升至40%。因此,视觉功能确实起到了很大的作用。
The vision is extremely good. And I think being able to reason visually as well and communicate visually will also be very powerful and very nice, which go beyond just learning about the world.
视觉表现极其出色。我认为能够进行视觉推理和视觉交流将非常强大且有意义,超越了仅仅从文本中学习世界的范畴。
You have several things you can learn about the world. You can then reason about the world visually and communicate visually.
通过多模态学习,你可以学到更多关于世界的知识,随后还能进行视觉推理并通过视觉进行交流。
Where now, in the future perhaps, in some future version, if you ask your neural net, "Hey, explain this to me," rather than just producing four paragraphs, it will produce, "Hey, here's a little diagram which clearly conveys to you exactly what you need to know," and so on.
未来可能会有这样的版本:当你问神经网络“嘿,给我解释一下这个问题”时,它不仅会生成四段文字,还会生成一个小图表,清晰地向你传达你需要知道的内容,等等。
Jensen Huang:
That's incredible. You know, one of the things that you said earlier about an AI generating tests to train another AI—you know, there was a paper that was written about, and I don’t completely know whether it's factual or not,
黄仁勋:
这真是不可思议。你之前提到AI生成测试来训练另一个AI。我看到过一篇文章,说的是——我不完全确定这是否属实——
but that there's a total amount of somewhere between 4 trillion and something like 20 trillion useful language tokens that the world will be able to train on, you know, over some period of time,
但文章提到,全球可以用来训练的有用语言标记总量在4万亿到20万亿之间,这在未来一段时间内可以供训练使用。
and that we're going to run out of tokens to train. And, well, first of all, I wonder if you feel the same way.
文章还说,我们将面临训练数据不足的问题。首先,我想问问你是否也有同样的感觉?
And then, secondarily, whether the AI generating its own data could be used to train the AI itself, which you could argue is a little circular,
其次,AI是否可以生成自己的数据来训练自己,尽管这可能有些循环依赖。
but we train our brain with generated data all the time by self-reflection, working through a problem in our brain, you know.
但我们人类的大脑一直通过自我反思、在脑海中解决问题来“生成数据”进行训练。
I guess neuroscientists suggest that while sleeping, we do a fair amount of developing our neurons.
神经科学家还认为,我们在睡眠时会进行大量神经元的发育。
How do you see this area of synthetic data generation? Is that going to be an important part of the future of training AI and the AI teaching itself?
你如何看待合成数据生成这一领域?这会成为未来训练AI以及AI自我学习的重要部分吗?
Ilya Sutskever:
Well, I wouldn’t underestimate the data that exists out there. I think there’s probably more data than people realize.
伊利亚·苏茨克维尔:
嗯,我不会低估现有数据的量。我认为数据量可能比人们想象的要多得多。
And as to your second question, it's certainly a possibility; it remains to be seen.
至于你的第二个问题,这确实是一个可能性,还有待进一步观察。
Jensen Huang:
Yeah. Yeah, it really does seem that one of these days our AIs are, you know, when we're not using it, maybe generating either adversarial content for itself to learn from or imagine solving problems that it can go off and then improve itself.
黄仁勋:
是的,是的,确实看起来未来某一天,我们的AI可能在闲置时生成对抗性内容来自我学习,或者设想解决问题,从而自我提升。
Tell us, whatever you can, about where we are now and what do you think we'll be in the not-too-distant future—pick your horizon, a year or two.
请告诉我们,不论你能分享什么,你觉得我们现在所处的位置,以及在不远的未来,比如一年或两年后,会达到什么水平?
What do you think this whole language model area would be and some of the areas that you're most excited about?
你认为整个语言模型领域将会如何发展?哪些领域是你最感兴趣的?
Ilya Sutskever:
You know, predictions are hard. And although it's a little difficult to say things which are too specific, I think it's safe to assume that progress will continue and that we will keep on seeing systems which astound us in the things that they can do.
伊利亚·苏茨克维尔:
预测是很困难的。虽然很难具体说出什么,但我认为可以放心地假设,技术进步将持续下去,我们将不断看到这些系统在能力上让我们惊叹。
And the current frontiers will be centered around reliability, around the system being trusted—really getting to a point where we can trust what it produces,
当前的研究前沿将集中在可靠性和系统的可信性上——真正达到一个我们可以信任其生成内容的程度。
really getting to a point where if it doesn't understand something, it asks for clarification, says that it doesn't know something, or says that it needs more information.
真正达到这样一种状态:如果它不理解某件事,它会寻求澄清;如果它不知道某些内容,它会直接说出来,或者表示需要更多信息。
I think those are perhaps the areas where improvement will lead to the biggest impact on the usefulness of these systems. Because right now, that's really what stands in the way.
我认为这些可能是改进会对系统实用性产生最大影响的领域。因为现在,这些正是阻碍其更广泛应用的主要障碍。
You ask a neural net to summarize a long document, and you get a summary. But are you sure that some important detail wasn't omitted?
比如,你让AI对一篇长文档进行总结,结果你得到一个摘要。但你能确定没有遗漏某些重要细节吗?
It’s still a useful summary, but it’s a different story when you know that all the important points have been covered.
这仍然是一个有用的摘要,但如果你知道所有重要点都被涵盖了,那将是完全不同的体验。
At some point—and in particular, it’s okay if there is ambiguity, it’s fine.
在某些情况下,尤其是当存在模糊性时,这没有问题。
But if a point is clearly important, such that anyone else who saw that point would say, “This is really important,” then the neural network will also recognize that reliably. That’s when you know.
但如果某个点显然很重要,以至于其他任何看到这个点的人都会说“这非常重要”,那么神经网络也能够可靠地识别出这一点。那时你才会真正放心。
Same for the guardrail. Same for its ability to clearly follow the intent of the user, of its operator.
防护措施也是如此。同样适用于其清晰遵循用户或操作者意图的能力。
So I think we’ll see a lot of that in the next two years.
因此,我认为在接下来的两年中,我们将在这方面看到许多进展。
Jensen Huang:
Yeah, that’s terrific, because the progress in those two areas will make this technology trusted by people to use and be able to apply for so many things.
黄仁勋:
是的,这太棒了,因为这两个领域的进展将使这项技术被人们信任,并能够应用于许多领域。
I was thinking that was going to be the last question, but I did have another one.
我本以为那是最后一个问题,但我还有一个问题。
Ilya Sutskever:
Sorry about that.
伊利亚·苏茨克维尔:
抱歉,我的错。
Jensen Huang:
So ChatGPT to GPT-4. GPT-4, when you first started using it, what are some of the skills that it demonstrated that surprised even you?
黄仁勋:
从ChatGPT到GPT-4。当你刚开始使用GPT-4时,它展示了哪些技能甚至让你感到惊讶?
Ilya Sutskever:
Well. There were lots of really cool things that it demonstrated, which were quite cool and surprising. It was quite good. So I'll mention two examples.
伊利亚·苏茨克维尔:
嗯,它展示了许多非常棒且令人惊讶的能力。确实很优秀。我来举两个例子。
So let's see. I'm just trying to think about the best way to go about it.
让我想想,我试着找个最好的方式来说。
The short answer is that the level of its reliability was surprising.
简单来说,它的可靠性水平让我感到惊讶。
Where the previous neural networks, if you ask them a question, sometimes they might misunderstand something in a kind of a silly way,
以前的神经网络,有时候当你问它们问题时,它们可能会以一种愚蠢的方式误解问题。
where the GPT-4—that stopped happening. Its ability to solve math problems became far greater.
而GPT-4则不再出现这种情况。它解决数学问题的能力大大提高。
It could really do the derivation, a long, complicated derivation, convert the units, and so on. And that was really cool.
它可以真正进行推导,完成冗长且复杂的推导,转换单位等等。这非常棒。
You know, like many people...
你知道,像许多人一样……
Jensen Huang:
It works through a proof. It's pretty amazing.
黄仁勋:
它能够完成一个证明。这太惊人了。
Ilya Sutskever:
Not all proofs, naturally, but quite a few.
伊利亚·苏茨克维尔:
当然,并不是所有的证明,但已经相当多了。
Or another example would be, like many people noticed that it has the ability to produce poems with, you know, every word starting with the same letter or every word starting with some...
另一个例子是,很多人注意到它可以创作诗歌,比如每个词都以相同的字母开头,或者每个词以某个特定字母开头……
Jensen Huang:
It follows instructions really, really clearly.
黄仁勋:
它非常清晰地遵循指令。
Ilya Sutskever:
Not perfectly still, but much better than before.
伊利亚·苏茨克维尔:
虽然还不算完美,但比以前好得多。
Jensen Huang:
Yeah, really good.
黄仁勋:
是的,非常棒。
Ilya Sutskever:
And on the vision side, I really love how it can explain jokes, can explain memes. You show it a meme and ask it why it's funny, and it will tell you, and it will be correct.
伊利亚·苏茨克维尔:
在视觉方面,我真的很喜欢它能够解释笑话、解释梗图。你给它看一张梗图,问它为什么好笑,它会告诉你,而且是正确的。
The vision part, I think, is very—it’s like really actually seeing it when you can ask follow-up questions about some complicated image with a complicated diagram and get an explanation. That’s really cool.
我认为视觉部分非常有趣——就像真正“看见”了一样。你可以针对一些复杂的图像和图表提出后续问题,并得到解释。这非常酷。
But yeah, overall, I will say, to take a step back, you know, I’ve been in this business for quite some time, actually, like almost exactly 20 years.
不过,总的来说,我想说,回顾一下,我已经在这个领域工作了相当长的时间,实际上差不多刚好20年。
And the thing which I find most surprising is that it actually works.
让我感到最惊讶的是,它竟然真的有效。
It turned out to be the same little thing all along, which is no longer little and a lot more serious and much more intense.
结果发现,始终是同样的基本原理,只不过它现在不再是“小”问题,而是更严肃、更强大的东西。
But it’s the same neural network, just larger, trained on maybe larger datasets in different ways with the same fundamental training algorithm.
但它依然是同样的神经网络,只是规模更大,使用更大的数据集,通过相同的基本训练算法进行训练。
So it’s like, wow, I would say this is what I find the most surprising.
所以我会说,这就是让我感到最惊讶的地方。
Whenever I take a step back, I go, how is it possible that those ideas, those conceptual ideas about, well, the brain has neurons, so maybe artificial neurons are just as good.
每当我退一步思考时,我会想,这些想法是怎么可能的——大脑有神经元,也许人工神经元也一样好。
And so maybe we just need to train them somehow with some learning algorithm, that those arguments turned out to be so incredibly correct. That would be the biggest surprise, I’d say.
所以我们只需要用某种学习算法来训练它们,这些论点竟然如此正确。这是我认为最大的惊喜。
Jensen Huang:
In the 10 years that we've known each other, the models that you've trained and the amount of data and compute you've trained on have grown by about a million times, from what you did on AlexNet to now.
黄仁勋:
在我们认识的这10年里,从你在AlexNet上所做的工作到现在,你训练的模型规模和数据量增加了大约一百万倍。
And no one in the world of computer science would have believed that the amount of computation that was done in that 10 years' time would be a million times larger and that you dedicated your career to go do that.
在计算机科学界,没有人会相信在这10年内计算量会增长一百万倍,而你将整个职业生涯都投入到这个领域。
You've done many more. Your body of work is incredible, but two seminal works: the co-invention of AlexNet in that early work, and now with GPT at OpenAI.
你做了很多工作。你的成果令人惊叹,但有两项开创性的研究:早期与AlexNet的共同发明,以及现在OpenAI的GPT。
It is truly remarkable what you’ve accomplished.
你取得的成就是非凡的。
It’s great to catch up with you again, Ilya, my good friend. It is quite an amazing moment.
能再次见到你真是太好了,伊利亚,我的好朋友。这是一个令人惊叹的时刻。
And today’s talk, the way you break down the problem and describe it, this is one of the best beyond-PhD descriptions of the state of the art of large language models. I really appreciate that.
今天的对话中,你分析和描述问题的方式,是对大型语言模型现状最好的“超博士级”描述之一。我非常感谢。
It’s great to see you. Congratulations.
很高兴见到你,祝贺你。
Ilya Sutskever:
Thank you so much.
伊利亚·苏茨克维尔:
非常感谢。
Jensen Huang:
Yeah, thank you.
黄仁勋:
是的,谢谢你。
Ilya Sutskever:
I had so much fun.
伊利亚·苏茨克维尔:
我非常享受这次对话。
Jensen Huang:
Thank you.
黄仁勋:
谢谢你。