2024-12-14 Ilya Sutskever.NeurIPS 2024 Test of Time Award

Ilya Sutskever:
I want to thank the organizers for choosing our paper for this award. It was very nice. And I also want to thank my incredible co-authors and collaborators, Oriol Vinyals and Quoc Le, who stood right before you a moment ago.

And what you have here is an image, a screenshot, from a similar talk ten years ago at NeurIPS in 2014 in Montreal. And it was a much more innocent time. Here we are shown in the photos. This is the before. Here's the after, by the way.

And now we're more experienced and, hopefully, wiser. But here I'd like to talk a little bit about the work itself, and maybe give a 10-year retrospective on it. Because a lot of the things in this work were correct, but some not so much.

And we can review them and we can see what happened and how it gently flowed to where we are today. So let's begin by talking about what we did. And the way we'll do it is by showing slides from the same talk 10 years ago.

But the summary of what we did is the following three bullet points: it's an autoregressive model trained on text, it's a large neural network, and it's a large data set. And that's it. Now let's dive into the details a little bit more.

So this was a slide 10 years ago. Not too bad. The Deep Learning Hypothesis. And what we said here is that if you have a large neural network with 10 layers, then it can do anything that a human being can do in a fraction of a second.

Why did we have this emphasis on things that human beings can do in a fraction of a second? Why this thing specifically? Well, if you believe the deep learning dogma, so to say, that artificial neurons and biological neurons are similar,

or at least not too different, and you believe that real neurons are slow, then anything that we can do quickly, by we I mean human beings, I even mean just one human in the entire world.

If there is one human in the entire world that can do some task in a fraction of a second, then a 10-layer neural network can do it too, right? It follows.

You just take their connections and you embed them inside your neural net, the artificial one. So this was the motivation. Anything that a human being can do in a fraction of a second, a big 10-layer neural network can do too.

We focused on 10-layer neural networks because those were the neural networks we knew how to train back in the day. If you could somehow get more layers, then you could do more.

But back then, we could only do 10 layers, which is why we emphasized whatever human beings can do in a fraction of a second. A different slide from the talk, a slide which says our main idea.

And you may be able to recognize two things or at least one thing. You might be able to recognize that something autoregressive is going on here. What is it saying really? What does this slide really say?

This slide says that if you have an autoregressive model, and it predicts the next token well enough, then it will in fact grab and capture and grasp the correct distribution over sequences that come next.
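In symbols, the claim is the standard chain-rule factorization that every autoregressive model implements token by token:

\[
p(x_1, \dots, x_T) \;=\; \prod_{t=1}^{T} p(x_t \mid x_1, \dots, x_{t-1})
\]

Training maximizes the log-likelihood of each next token given its prefix, so a model that gets every conditional right has, by construction, captured the entire distribution over sequences.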

And this was a relatively new thing. It wasn't literally the first ever autoregressive neural network.

But I would argue it was the first autoregressive neural network where we really believed that if you train it really well, then you will get whatever you want.

In our case, back then, that was the task of translation: humble today, but incredibly audacious back then. Now I'm going to show you some ancient history that many of you may have never seen before. It's called the LSTM.

To those unfamiliar, an LSTM is the thing that poor deep learning researchers used before transformers. And it's basically a ResNet, but rotated 90 degrees. So that's an LSTM.

It came before the ResNet, and it's kind of like a slightly more complicated ResNet. You can see there is your integrator, which is now called the residual stream, but you've also got some multiplication going on.
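To make the "integrator plus multiplication" point concrete, here is a minimal sketch of a single LSTM step in NumPy; the stacked-gate weight layout is an illustrative assumption, not the 2014 implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. x: (D,) input; h_prev, c_prev: (H,) states.
    W: (4H, D), U: (4H, H), b: (4H,) stack the input, forget, candidate,
    and output gate parameters (an illustrative layout)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0*H:1*H])   # input gate
    f = sigmoid(z[1*H:2*H])   # forget gate
    g = np.tanh(z[2*H:3*H])   # candidate update
    o = sigmoid(z[3*H:4*H])   # output gate
    # The additive update to c is the "integrator" (what we now call the
    # residual stream); the elementwise products are the multiplications.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```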

It's a little bit more complicated, but that's what we did. It was a ResNet rotated 90 degrees. Another cool feature from that old talk that I want to highlight is that we used parallelization. But not just any parallelization.

We used pipelining as witnessed by this one layer per GPU. Was it wise to pipeline? As we now know, pipelining is not wise. But we were not as wise back then. So we used that and we got a 3.5x speedup using eight GPUs.
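As a rough sketch of what one layer per GPU looks like (PyTorch-style; the hidden size and mechanics here are assumptions for illustration, not the paper's exact setup):

```python
import torch.nn as nn

# One LSTM layer per GPU, as in the talk; hidden size 1000 is assumed.
layers = [nn.LSTM(1000, 1000, batch_first=True).to(f"cuda:{i}")
          for i in range(8)]

def forward(x):
    # This naive loop only shows the layer-to-device mapping; run as
    # written, the GPUs take turns and nothing overlaps. The pipelined
    # setup let GPU i start time step t as soon as GPU i-1 produced it,
    # so all eight worked concurrently, with startup and drain bubbles
    # keeping the speedup near 3.5x rather than the ideal 8x.
    for i, layer in enumerate(layers):
        x = x.to(f"cuda:{i}")
        x, _ = layer(x)
    return x
```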

And the conclusion slide in some sense, the conclusion slide from the talk from back then, is the most important slide because it spelled out what could arguably be the beginning of the scaling hypothesis, right?

That if you have a very big data set and you train a very big neural network, then success is guaranteed. And one can argue, if one is charitable, that this indeed has been what's been happening. I want to mention one other idea.

And this is, I claim, the idea that truly stood the test of time. It's the core idea of deep learning itself. It's the idea of connectionism.

It's the idea that if you allow yourself to believe that an artificial neuron is kind of, sort of, like a biological neuron, right, if you believe that one is kind of, sort of, like the other,

then it gives you the confidence to believe that very large neural networks, they don't need to be literally human-brain scale, they might be a little bit smaller, but you could configure them to do pretty much all the things that we human beings do.

There's still a difference. Oh, I forgot to mention that. There is still a difference, because the human brain also figures out how to reconfigure itself.

Whereas we are using the best learning algorithms that we have, which require as many data points as there are parameters. Human beings are still better in this regard. But what this led to, I claim, arguably, is the age of pre-training.

And the age of pre-training is what we might call the GPT-2 model, the GPT-3 model, the scaling laws. And I want to specifically call out my former collaborators Alec Radford, and also Jared Kaplan and Dario Amodei, for really making this work.

But that led to the age of pre-training, and this is what's been the driver of all the progress that we see today: extraordinarily large neural networks trained on huge data sets.

But pre-training as we know it will unquestionably end. Pre-training will end. Why will it end?

Because while compute is growing through better hardware, better algorithms, and larger clusters, right, all those things keep increasing your compute.

The data is not growing, because we have but one internet. We have but one internet. You could even say, you can even go as far as to say, that data is the fossil fuel of AI.

It was created somehow, and now we use it, and we've achieved peak data: there'll be no more. We have to deal with the data that we have. It will still let us go quite far, but there's only one internet.

So here, I'll take a bit of liberty to speculate about what comes next. Actually, I don't need to speculate because many people are speculating too. And I'll mention their speculations. You may have heard the phrase agents. It's common.

And I'm sure that eventually something will happen there; people feel like agents are the future. More concretely, but also a little bit vaguely: synthetic data. But what does synthetic data mean? Figuring this out is a big challenge.

And I'm sure that different people have all kinds of interesting progress there. And inference-time compute, or maybe what's been most recently and most vividly seen in the o1 model; these are all examples

of people trying to figure out what to do after pre-training. And those are all very good things to do. I want to mention one other example from biology, which I think is really cool. And the example is this.

So, many, many years ago, also at this conference, I saw a talk where someone presented this graph. The graph showed the relationship between the size of the body of a mammal and the size of its brain.

In this case, it's in mass. And that talk, I remember vividly: they were saying, look, in biology everything is so messy.

But here you have one rare example where there is a very tight relationship between the size of the body of the animal and the size of its brain. And, totally randomly, I became curious about this graph.

So I went to Google to do some research, to look for this graph. And one of the images in Google Images was this. And the interesting thing in this image is, you see, like, I don't know, is the mouse working?

Oh yeah, the mouse is working, great. So you've got these mammals, right? All the different mammals. Then you've got non-human primates. It's basically the same thing. But then you've got the hominids.

And to my knowledge, hominids are close relatives of humans in evolution, like the Neanderthals; there's a bunch of them, Homo habilis maybe. There's a whole bunch, and they're all here.

And what's interesting is that they have a different slope on their brain-to-body scaling exponent. So that's pretty cool.

What that means is that there is a precedent; there is an example of biology figuring out some kind of different scaling. Something clearly is different. So I think that is cool. And by the way, I want to highlight that this x-axis is log scale.

You see this is a hundred, this is a thousand, ten thousand, a hundred thousand, and likewise in grams. One gram, ten grams, hundred grams, thousand grams. So it is possible for things to be different.
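As an aside on reading such plots: a power law brain = a * body^k is a straight line on log-log axes with slope k, so the exponent can be recovered with a linear fit on the logs. A minimal sketch, with numbers made up purely for illustration:

```python
import numpy as np

# Made-up masses in grams, following an assumed power law with k = 0.75.
body = np.array([10.0, 100.0, 1_000.0, 10_000.0, 100_000.0])
brain = 0.05 * body ** 0.75

# On log-log axes the power law is a line; its slope is the exponent.
k, log_a = np.polyfit(np.log10(body), np.log10(brain), 1)
print(f"fitted scaling exponent: {k:.2f}")   # -> 0.75
```

A different slope for the hominids on the same axes means a genuinely different exponent k, not just a shifted intercept.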

The thing that we are doing, the thing that we've been scaling so far, is actually the first thing that we figured out how to scale. The field, everyone who's working here, will figure out what to do.

But I want to talk here, I want to take a few minutes and speculate about the longer term. The longer term, where are we all headed? Right, we're making all this progress. It's astounding progress.

It's really, I mean, those of you who were in the field 10 years ago remember just how incapable everything was. Even if you kind of say, of course, deep learning,

still, to see it is just unbelievable. I can't convey that feeling to you.

You know, if you joined the field in the last two years, then of course you speak to computers and they talk back to you and they disagree and that's what computers are. But it hasn't always been the case.

But I want to talk to you a little bit about super-intelligence. Just a bit. Because that is obviously where this field is headed. This is obviously what's being built here.

And the thing about super-intelligence is that it will be different qualitatively from what we have.

And my goal in the next minute is to try to give you some concrete intuition of how it will be different, so that you yourself can reason about it. So right now, we have our incredible language models and these unbelievable chatbots.

And they can even do things. But they're also kind of strangely unreliable. And they get confused while also having dramatically superhuman performance on evals. So it's really unclear how to reconcile this.

But eventually, sooner or later, the following will be achieved: those systems are actually going to be agentic in real ways. Whereas right now, the systems are not agents in any meaningful sense; well, that might be too strong.

They're very, very slightly agentic; it's just beginning. These systems will actually reason. And by the way, I want to mention something about reasoning: a system that reasons, the more it reasons, the more unpredictable it becomes.

All the deep learning that we've been used to is very predictable, because we've essentially been working on replicating human intuition; it's like the gut feel.

If you come back to the 0.1 second reaction time, what kind of processing we do in our brains, well, it's our intuition. So we've endowed our AIs with some of that intuition.

But reasoning, and you're seeing some early signs of this, is unpredictable. And one way to see that is that the chess AIs, the really good ones, are unpredictable to the best human chess players.

So, we will have to be dealing with AI systems that are incredibly unpredictable. They will understand things from limited data. They will not get confused, all the things which are really big limitations.

I'm not saying how, by the way, and I'm not saying when. I'm saying that it will. And when all those things will happen together with self-awareness, because why not? Self-awareness is useful.

We ourselves are parts of our own world models. When all those things come together, we will have systems with radically different qualities and properties than those that exist today.

And of course, they will have incredible and amazing capabilities. But the kind of issues that come up with systems like this, and I'll just leave it as an exercise just to imagine, it's very different from what we're used to.

And I would also say that it's definitely impossible to predict the future. Really, all kinds of stuff is possible. But on this uplifting note, I will conclude. Thank you so much.

Unknown Speaker:
Thank you.

Speaker 2:
Now in 2024, are there other biological structures that are part of human cognition that you think are worth exploring in a similar way or that you're interested in anyway?

Ilya Sutskever:
So, the way I'd answer this question is that if you, or someone, is a person who has a specific insight, like: hey, we are all being extremely silly, because clearly the brain does something that we are not doing,

and that's something that can be done, they should pursue it. I personally don't—well, it depends on the level of abstraction you're looking at. Maybe I'll answer it this way.

Like, there's been a lot of desire to make biologically inspired AI. And you could argue, on some level, that biologically inspired AI is incredibly successful, in the sense that all of deep learning is biologically inspired AI.

But on the other hand, the biological inspiration was very, very, very modest. It's like, let's use neurons. This is the full extent of the biological inspiration. Let's use neurons.

And more detailed biological inspiration has been very hard to come by, but I wouldn't rule it out. I think if someone has a special insight, they might be able to see something and that would be useful.

Speaker 3:
I have a question for you about, sort of, autocorrect. So here's the question. You mentioned reasoning as being one of the core aspects of the modeling of the future, and maybe a differentiator.

What we saw in some of the poster sessions is that, with hallucinations in today's models, and maybe you'll correct me, you're the expert on this, the way we analyze whether a model is hallucinating, because we know the dangers of models not being able to reason, is with a statistical analysis, let's say some number of standard deviations away from the mean.

In the future, do you think that a model given reasoning will be able to correct itself, sort of auto-correct itself, and that this will be a core feature of future models, so that there won't be as many hallucinations, because the model will recognize, maybe that's too esoteric of a question, but the model will be able to reason and understand when a hallucination is occurring? Does the question make sense?

Ilya Sutskever:
Yes, and the answer is also yes. I think what you described is extremely highly plausible. I mean, you should check. I wouldn't rule out that it might already be happening with some of the early reasoning models of today. I don't know.

But longer term, why not?

Speaker 3:
Yeah, I mean, it's part of Microsoft Word, like autocorrect. It's a core feature.

Ilya Sutskever:
Yeah, I mean, I think calling it autocorrect is really doing it a disservice. When you say autocorrect, you evoke... it's far grander than autocorrect. But that point aside, the answer is yes.

Speaker 3:
Thank you.

Speaker 2:
Hi Ilya, I loved the ending, mysteriously leaving it out: do they replace us, or are they, you know, superior? Do they need rights? You know, it's a new species of Homo sapiens-spawned intelligence.

So maybe they need rights. I mean, I think the RL guy thinks, you know, we need rights for these things. I have an unrelated question to that.

How do you create the right incentive mechanisms for humanity to actually create it in a way that gives it the freedoms that we have as homo sapiens?

Ilya Sutskever:
You know, I feel like this, in some sense, those are the kind of questions that people should be reflecting on more. But to your question about what incentive structure should we create, I don't feel that I know.

I don't feel confident answering questions like this because it's like you're talking about creating some kind of a top-down structured government thing, I don't know.

Speaker 2:
It could be a cryptocurrency, too.

Ilya Sutskever:
Yeah, I mean...

Speaker 2:
There's BitTensor, you know, there's things.

Ilya Sutskever:
I don't feel like I am the right person to comment on cryptocurrency, but... you know, there is a chance, by the way, that what you're describing will happen. And indeed, you know,

in some sense it's not a bad end result if you have AIs and all they want is to coexist with us and also just to have rights. Maybe that will be fine. But I don't know. I mean, I think things are so incredibly unpredictable.

I hesitate to comment, but I encourage the speculation.

Speaker 2:
Thank you. And yeah, thank you for the talk. It's really awesome. Hi, Ilya. Thank you for the great talk. My name is Shalev Lifshitz from the University of Toronto. I'm working with Sheila. Thanks for all the work you've done.

I wanted to ask, do you think LLMs generalize multi-hop reasoning out of distribution?

Ilya Sutskever:
So, okay, the question assumes that the answer is yes or no, but the question should not be answered with yes or no. Because, what does out-of-distribution generalization mean? What does it mean to be in distribution?

And what does it mean to be out of distribution? Because this is a test-of-time talk, I'll say that long, long ago, before people were using deep learning, they were using things like string matching and n-grams.

For machine translation, people were using statistical phrase tables. Can you imagine? They had tens of thousands of lines of code of complexity, which was, I mean, truly unfathomable.

And back then, generalization meant: is it literally not the same phrasing as in the data set? Now we may say, well, sure, my model achieves this high score on, I don't know, math competitions.

But maybe the math, maybe some discussion in some forum on the internet was about the same ideas, and therefore it's memorized. Well, OK, you could say maybe it's in distribution, maybe it's memorization.

But I also think that our standards for what counts as generalization have increased really quite substantially, dramatically, unimaginably if you keep track.

So I think the answer is, to some degree, probably not as well as human beings. I think it is true that human beings generalize much better.

But at the same time, these models definitely do generalize out of distribution to some degree. I hope that's a useful, if somewhat tautological, answer.

Speaker 2:
Thank you.

Unknown Speaker:
And unfortunately, we're out of time for this session. I have a feeling we could go on for the next six hours. But thank you so much for the talk.

Ilya Sutskever:
Thank you.
