Time to AGI
Dwarkesh Patel
Today I have the pleasure of interviewing Ilya Sutskever, who is the Co-founder and Chief Scientist of OpenAI. Ilya, welcome to The Lunar Society.
Ilya Sutskever
Thank you, happy to be here.
Dwarkesh Patel
First question, and no humility allowed. There are not that many scientists who will make a big breakthrough in their field, and there are far fewer who will make multiple independent breakthroughs that define their field throughout their career. What is the difference? What distinguishes you from other researchers? Why have you been able to make multiple breakthroughs in your field?
Ilya Sutskever
Thank you for the kind words. It's hard to answer that question. I try really hard, I give it everything I've got and that has worked so far. I think that's all there is to it.
Dwarkesh Patel
Got it. What's the explanation for why there aren't more illicit uses of GPT? Why aren't more foreign governments using it to spread propaganda or scam grandmothers?
Ilya Sutskever
Maybe they haven't really gotten to do it a lot. But it also wouldn't surprise me if some of it was going on right now. I can certainly imagine they would be taking some of the open-source models and trying to use them for that purpose. For sure, I would expect this to be something they'd be interested in in the future.
Dwarkesh Patel
It's technically possible they just haven't thought about it enough?
Ilya Sutskever
Or haven't done it at scale using their technology. Or maybe it is happening, which is annoying.
Dwarkesh Patel
Would you be able to track it if it was happening?
Ilya Sutskever
I think large-scale tracking is possible, yes. It requires special operations but it's possible.
Dwarkesh Patel
Now there's some window in which AI is very economically valuable, let’s say on the scale of airplanes, but we haven't reached AGI yet. How big is that window?
Ilya Sutskever
It's hard to give a precise answer and it’s definitely going to be a good multi-year window. It's also a question of definition. Because AI, before it becomes AGI, is going to be increasingly more valuable year after year in an exponential way.
In hindsight, it may feel like there was only one year or two years because those two years were larger than the previous years. But I would say that already, last year, there has been a fair amount of economic value produced by AI. Next year is going to be larger and larger after that. So I think it's going to be a good multi-year chunk of time where that’s going to be true, from now till AGI pretty much.
Dwarkesh Patel
Okay. Because I'm curious, if there's a startup that's using your model, at some point if you have AGI there's only one business in the world: OpenAI. How long a window does any business have where they're actually producing something that AGI can't produce?
Ilya Sutskever
It's the same question as asking how long until AGI. It's a hard question to answer. I hesitate to give you a number. Also because there is this effect where optimistic people who are working on the technology tend to underestimate the time it takes to get there. But the way I ground myself is by thinking about the self-driving car. In particular, there is an analogy where if you look at the size of a Tesla, and if you look at its self-driving behavior, it looks like it does everything. But it's also clear that there is still a long way to go in terms of reliability. And we might be in a similar place with respect to our models where it also looks like we can do everything, and at the same time, we will need to do some more work until we really iron out all the issues and make it really good and really reliable and robust and well behaved.
Dwarkesh Patel
By 2030, what percent of GDP is AI?
Ilya Sutskever
Oh gosh, very hard to answer that question.
Dwarkesh Patel
Give me an over-under.
Ilya Sutskever
The problem is that my error bars are in log scale. I could imagine a huge percentage, I could imagine a really disappointing small percentage at the same time.
Dwarkesh Patel
Okay, so let's take the counterfactual where it is a small percentage. Let's say it's 2030 and not that much economic value has been created by these LLMs. As unlikely as you think this might be, what would be your best explanation right now of why something like this might happen?
Ilya Sutskever
I really don't think that's a likely possibility; that's the preface to the comment. But if I were to take the premise of your question: why were things disappointing in terms of real-world impact? My answer would be reliability. If it somehow ends up being the case that you really want them to be reliable and they end up not being reliable, or if reliability turns out to be harder than we expect.
I really don't think that will be the case. But if I had to pick one and you were telling me — hey, why didn't things work out? It would be reliability. That you still have to look over the answers and double-check everything. That just really puts a damper on the economic value that can be produced by those systems.
Dwarkesh Patel
Got it. They will be technologically mature, it’s just the question of whether they'll be reliable enough.
Ilya Sutskever
Well, in some sense, not reliable means not technologically mature.
What’s after generative models?
Dwarkesh Patel
Yeah, fair enough. What's after generative models? Before, you were working on reinforcement learning. Is this basically it? Is this the paradigm that gets us to AGI? Or is there something after this?
Ilya Sutskever
I think this paradigm is gonna go really, really far and I would not underestimate it. It's quite likely that this exact paradigm is not quite going to be the AGI form factor. I hesitate to say precisely what the next paradigm will be but it will probably involve integration of all the different ideas that came in the past.
Dwarkesh Patel
Is there some specific one you're referring to?
Ilya Sutskever
It's hard to be specific.
Dwarkesh Patel
So you could argue that next-token prediction can only help us match human performance and maybe not surpass it? What would it take to surpass human performance?
Ilya Sutskever
I challenge the claim that next-token prediction cannot surpass human performance. On the surface, it looks like it cannot. It looks like if you just learn to imitate, to predict what people do, it means that you can only copy people. But here is a counter argument for why it might not be quite so. If your base neural net is smart enough, you just ask it — What would a person with great insight, wisdom, and capability do? Maybe such a person doesn't exist, but there's a pretty good chance that the neural net will be able to extrapolate how such a person would behave. Do you see what I mean?
Dwarkesh Patel
Yes, although where would it get that sort of insight about what that person would do? If not from…
Ilya Sutskever
From the data of regular people. Because if you think about it, what does it mean to predict the next token well enough? It's actually a much deeper question than it seems. Predicting the next token well means that you understand the underlying reality that led to the creation of that token. It's not statistics. Like, it is statistics, but what is statistics? In order to understand those statistics, to compress them, you need to understand what it is about the world that creates this set of statistics. And so then you say — Well, I have all those people. What is it about people that creates their behaviors? Well, they have thoughts and their feelings, and they have ideas, and they do things in certain ways. All of those could be deduced from next-token prediction. And I'd argue that this should make it possible, not indefinitely but to a pretty decent degree, to say — Well, can you guess what a person with this characteristic and that characteristic would do? Such a person doesn't exist, but because you're so good at predicting the next token, you should still be able to guess what that person would do. This hypothetical, imaginary person with far greater mental ability than the rest of us.
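A toy illustration of that conditioning argument: even a bigram counter, if it models who produced the text alongside the text itself, gives different next-token predictions for different (possibly hypothetical) personas. The corpus and persona tags below are invented purely for illustration, not a real model.

```python
from collections import defaultdict

# Toy corpus: each "document" is tagged with who wrote it.
# The tags and sentences are made up for illustration.
corpus = [
    ("careful_writer", "the proof is checked and the proof is correct"),
    ("careful_writer", "the claim is checked before the claim is published"),
    ("hasty_writer",   "the proof is fine and the claim is fine"),
]

# Count bigrams *jointly with the author tag*: predicting the next
# token well forces the model to capture who is speaking.
counts = defaultdict(lambda: defaultdict(int))
for author, text in corpus:
    words = text.split()
    for prev, nxt in zip(words, words[1:]):
        counts[(author, prev)][nxt] += 1

def predict(author, prev):
    """Most likely next token given a persona tag and previous token."""
    options = counts[(author, prev)]
    return max(options, key=options.get) if options else None

# The same context token yields different continuations depending on
# which persona we condition on.
print(predict("careful_writer", "is"))  # -> checked
print(predict("hasty_writer", "is"))    # -> fine
```

The same mechanism, scaled up, is what lets a strong next-token predictor answer "what would a person with these characteristics say next", even if no single such person appears in the data.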
Dwarkesh Patel
When we're doing reinforcement learning on these models, how long before most of the data for the reinforcement learning is coming from AI and not humans?
Ilya Sutskever
Already most of the default reinforcement learning is coming from AIs. The humans are being used to train the reward function. But then the reward function and its interaction with the model is automatic, and all the data that's generated during the process of reinforcement learning is created by AI. Look at the current technique/paradigm, which is getting significant attention because of ChatGPT: Reinforcement Learning from Human Feedback (RLHF). The human feedback is used to train the reward function, and then the reward function is used to create the data which trains the model.
The more people use it, the better it performs.
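The division of labor described here can be sketched as a heavily simplified toy pipeline: a small amount of human preference data trains a reward model, and after that the data is generated and scored without humans in the loop. The features, the labels, and the best-of-n selection standing in for the actual RL update are all invented for illustration.

```python
import math
import random

random.seed(0)

# Step 1: the human work — preference labels over pairs of responses.
# Each response is reduced to a toy feature vector (helpful, polite);
# the left response of each pair was preferred by the labeler.
human_preferences = [
    ((1.0, 1.0), (0.0, 1.0)),
    ((1.0, 0.0), (0.0, 0.0)),
    ((1.0, 1.0), (1.0, 0.0)),
]

# Step 2: fit a linear reward model to the labels
# (Bradley-Terry / logistic loss, plain gradient ascent).
w = [0.0, 0.0]
for _ in range(500):
    for good, bad in human_preferences:
        margin = sum(wi * (g - b) for wi, g, b in zip(w, good, bad))
        grad = 1.0 / (1.0 + math.exp(margin))  # sigmoid(-margin)
        w = [wi + 0.1 * grad * (g - b) for wi, g, b in zip(w, good, bad)]

def reward(features):
    return sum(wi * f for wi, f in zip(w, features))

# Step 3: from here on, no humans — the policy samples candidate
# responses and the reward model scores them automatically.
# (Best-of-n selection is a crude stand-in for the real RL update.)
candidates = [(round(random.random()), round(random.random())) for _ in range(8)]
best = max(candidates, key=reward)
print(best)
```

The point of the sketch is the proportion of labor: the human labels are a small, fixed set, while the candidate generation and scoring loop can run indefinitely on its own.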
Dwarkesh Patel
Got it. And is there any hope of just removing a human from the loop and have it improve itself in some sort of AlphaGo way?
Ilya Sutskever
Yeah, definitely. The thing you really want is for the human teachers that teach the AI to collaborate with an AI. You might want to think of it as being in a world where the human teachers do 1% of the work and the AI does 99% of the work. You don't want it to be 100% AI. But you do want it to be a human-machine collaboration, which teaches the next machine.
Dwarkesh Patel
I've had a chance to play around with these models and they seem bad at multi-step reasoning. While they have been getting better, what does it take to really surpass that barrier?
Ilya Sutskever
I think dedicated training will get us there. More and more improvements to the base models will get us there. But fundamentally I also don't feel like they're that bad at multi-step reasoning. I actually think that they are bad at mental multi-step reasoning when they are not allowed to think out loud. But when they are allowed to think out loud, they're quite good. And I expect this to improve significantly, both with better models and with special training.
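"Thinking out loud" here corresponds to what is usually called chain-of-thought prompting: rather than asking for the answer directly, the prompt elicits intermediate steps, and each step becomes context for the next. A minimal sketch of the two prompt styles (the question and templates are illustrative, not any specific API):

```python
question = ("A shop sells pens at $3 each. Tom buys 4 pens and pays "
            "with a $20 bill. How much change does he get?")

# Direct prompting: the model must do all the reasoning "in its head"
# before emitting the answer tokens.
direct_prompt = f"Q: {question}\nA: The answer is"

# Chain-of-thought prompting: the model is allowed to think out loud,
# emitting intermediate steps before committing to an answer.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

# The shape of a chain-of-thought completion: each intermediate
# result becomes visible context for the next step.
expected_steps = [
    "4 pens at $3 each cost 4 * 3 = $12.",
    "Tom pays with $20, so his change is 20 - 12 = $8.",
    "The answer is $8.",
]
print(cot_prompt)
```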
Data, models, and research
Dwarkesh Patel
Are you running out of reasoning tokens on the internet? Are there enough of them?
Ilya Sutskever
So for context on this question, there are claims that at some point we will run out of tokens, in general, to train those models. And yeah, I think this will happen one day and by the time that happens, we need to have other ways of training models, other ways of productively improving their capabilities and sharpening their behavior, making sure they're doing exactly, precisely what you want, without more data.
Dwarkesh Patel
You haven't run out of data yet? There's more?
Ilya Sutskever
Yeah, I would say the data situation is still quite good. There's still lots to go. But at some point the data will run out.
Dwarkesh Patel
What is the most valuable source of data? Is it Reddit, Twitter, books? Where would you go to get more tokens of other varieties?
Ilya Sutskever
Generally speaking, you'd like tokens which are speaking about smarter things, tokens which are more interesting. All the sources which you mentioned are valuable.
Dwarkesh Patel
So maybe not Twitter. But do we need to go multimodal to get more tokens? Or do we still have enough text tokens left?
Ilya Sutskever
I think that you can still go very far in text only but going multimodal seems like a very fruitful direction.
Dwarkesh Patel
If you're comfortable talking about this, where is the place where we haven't scraped the tokens yet?
Ilya Sutskever
Obviously I can't answer that question for us but I'm sure that for everyone there is a different answer to that question.
Dwarkesh Patel
How many orders of magnitude improvement can we get, not from scale or not from data, but just from algorithmic improvements?
Ilya Sutskever
Hard to answer but I'm sure there is some.
Dwarkesh Patel
Is some a lot or some a little?
Ilya Sutskever
There’s only one way to find out.
Dwarkesh Patel
Okay. Let me get your quickfire opinions about these different research directions. Retrieval transformers. So it's storing the data outside of the model itself and retrieving it somehow.
Ilya Sutskever
Seems promising.
Dwarkesh Patel
But do you see that as a path forward?
Ilya Sutskever
It seems promising.
Dwarkesh Patel
Robotics. Was it the right step for OpenAI to leave that behind?
Ilya Sutskever
Yeah, it was. Back then it really wasn't possible to continue working in robotics because there was so little data. Back then if you wanted to work on robotics, you needed to become a robotics company. You needed to have a really giant group of people working on building robots and maintaining them. And even then, if you’re gonna have 100 robots, it's a giant operation already, but you're not going to get that much data. So in a world where most of the progress comes from the combination of compute and data, there was no path to data on robotics. So back in the day, when we made a decision to stop working in robotics, there was no path forward.
Dwarkesh Patel
Is there one now?
Ilya Sutskever
I'd say that now it is possible to create a path forward. But one needs to really commit to the task of robotics. You really need to say — I'm going to build many thousands, tens of thousands, hundreds of thousands of robots, and somehow collect data from them and find a gradual path where the robots are doing something slightly more useful. Then the data that is obtained is used to train the models, and they do something that's slightly more useful. You could imagine this gradual path of improvement, where you build more robots, they do more things, you collect more data, and so on. But you really need to be committed to this path. If you say, I want to make robotics happen, that's what you need to do. I believe that there are companies who are doing exactly that. But you need to really love robots and be really willing to solve all the physical and logistical problems of dealing with them. It's not the same as software at all. I think one could make progress in robotics today, with enough motivation.
It's a much harder business model.
Dwarkesh Patel
What ideas are you excited to try but you can't because they don't work well on current hardware?
Ilya Sutskever
I don't think current hardware is a limitation. It's just not the case.
Dwarkesh Patel
Got it. But anything you want to try you can just spin it up?
Ilya Sutskever
Of course. You might wish that current hardware was cheaper, or maybe that it had higher memory processing bandwidth, let's say. But by and large, hardware is just not an issue.
Alignment
Dwarkesh Patel
Let's talk about alignment. Do you think we'll ever have a mathematical definition of alignment?
Ilya Sutskever
A mathematical definition is unlikely. Rather than achieving one mathematical definition, I think we will achieve multiple definitions that look at alignment from different aspects. And that is how we will get the assurance that we want. By which I mean you can look at the behavior in various tests, at its congruence in various adversarial stress situations, and you can look at how the neural net operates from the inside. You have to look at several of these factors at the same time.
Dwarkesh Patel
And how sure do you have to be before you release a model in the wild? 100%? 95%?
Ilya Sutskever
Depends on how capable the model is. The more capable the model, the more confident we need to be.
Dwarkesh Patel
Alright, so let's say it's something that's almost AGI. Where is AGI?
Ilya Sutskever
Depends on what your AGI can do. Keep in mind that AGI is an ambiguous term. Your average college undergrad is an AGI, right? There's significant ambiguity in terms of what is meant by AGI. Depending on where you put this mark you need to be more or less confident.
Dwarkesh Patel
You mentioned a few of the paths toward alignment earlier, what is the one you think is most promising at this point?
Ilya Sutskever
I think that it will be a combination. I really think that you will not want to have just one approach. People want to have a combination of approaches, where you spend a lot of compute adversarially to find any mismatch between the behavior you want to teach and the behavior that it exhibits. We look into the neural net using another neural net to understand how it operates on the inside. All of them will be necessary. Every approach like this reduces the probability of misalignment. And you also want to be in a world where your degree of alignment keeps increasing faster than the capability of the models.
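The claim that every approach reduces the probability of misalignment has a simple quantitative reading: if each check catches a given failure with some probability, the chance that a failure slips past all of them shrinks multiplicatively. The per-check detection rates below are made-up numbers, and real checks are of course not independent, so this is only a toy calculation.

```python
# Hypothetical probability that each kind of check catches a given
# misaligned behavior. These numbers are invented for illustration.
checks = {
    "behavioral test suite": 0.70,
    "adversarial stress testing": 0.60,
    "interpretability inspection": 0.50,
}

# Under (unrealistic) independence, the chance a failure evades every
# check is the product of the individual miss rates.
p_miss = 1.0
for name, p_catch in checks.items():
    p_miss *= (1.0 - p_catch)

print(f"chance of slipping past every check: {p_miss:.3f}")  # 0.060
```

Each added check multiplies the miss probability by another factor below one, which is why a combination of partial methods can give far more assurance than any single one.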
Dwarkesh Patel
Do you think that the approaches we’ve taken to understand the model today will be applicable to the actual super-powerful models? Or how applicable will they be? Is it the same kind of thing that will work on them as well or?
Ilya Sutskever
It's not guaranteed. I would say that right now, our understanding of our models is still quite rudimentary. We've made some progress, but much more progress is possible. And so I would expect that ultimately, the thing that will really succeed is when we have a small neural net that is well understood, which has been given the task of studying the behavior of a large neural net that is not understood, in order to verify it.
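A caricature of that small-net-verifies-big-net direction: a frozen, opaque "large" network is observed only through its activations, and a tiny linear probe is trained on those activations to predict a property of the large net's behavior. The random network and the probed property are made-up stand-ins for real interpretability work.

```python
import random

random.seed(1)

# "Large" opaque model: a frozen random one-hidden-layer net. We only
# observe its hidden activations, not what its internals mean.
W = [[random.gauss(0, 1) for _ in range(4)] for _ in range(8)]

def hidden(x):
    return [max(0.0, sum(wij * xj for wij, xj in zip(row, x))) for row in W]

def big_model_output(x):
    return sum(hidden(x)) > 4.0  # the behavior we want to verify

# Small, well-understood verifier: a linear probe on the activations,
# trained with the perceptron rule to predict the big net's behavior.
data = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
probe, bias = [0.0] * 8, 0.0
for _ in range(50):
    for x in data:
        h, y = hidden(x), big_model_output(x)
        pred = sum(p * a for p, a in zip(probe, h)) + bias > 0
        if pred != y:
            sign = 1.0 if y else -1.0
            probe = [p + 0.01 * sign * a for p, a in zip(probe, h)]
            bias += 0.01 * sign

agreement = sum(
    (sum(p * a for p, a in zip(probe, hidden(x))) + bias > 0) == big_model_output(x)
    for x in data
) / len(data)
print(f"probe agrees with the large net on {agreement:.0%} of inputs")
```

The probe itself is trivially inspectable (eight weights and a bias), which is the appeal of the direction: understanding is concentrated in the small verifier rather than the large model.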
Dwarkesh Patel
By what point is most of the AI research being done by AI?
Ilya Sutskever
Today when you use Copilot, how do you divide it up? So I expect at some point you ask your descendant of ChatGPT, you say — Hey, I'm thinking about this and this. Can you suggest fruitful ideas I should try? And you would actually get fruitful ideas. I don't think that's gonna make it possible for you to solve problems you couldn't solve before.
Dwarkesh Patel
Got it. But it's somehow just telling the humans giving them ideas faster or something. It's not itself interacting with the research?
Ilya Sutskever
That was one example. You could slice it in a variety of ways. But the bottleneck there is good ideas, good insights and that's something that the neural nets could help us with.
Dwarkesh Patel
If you're designing a billion-dollar prize for some sort of alignment research result or product, what is the concrete criterion you would set for that billion-dollar prize? Is there something that makes sense for such a prize?
Ilya Sutskever
It's funny that you asked, I was actually thinking about this exact question. I haven't come up with the exact criterion yet. Maybe a prize where we could say that two years later, or three years or five years later, we look back and say like that was the main result. So rather than say that there is a prize committee that decides right away, you wait for five years and then award it retroactively.
Dwarkesh Patel
But there's no concrete thing we can identify where, once you solve this particular problem, you've made a lot of progress?
Ilya Sutskever
A lot of progress, yes. I wouldn't say that this would be the full thing.
Dwarkesh Patel
Do you think end-to-end training is the right architecture for bigger and bigger models? Or do we need better ways of just connecting things together?
Ilya Sutskever
End-to-end training is very promising. Connecting things together is very promising.
Dwarkesh Patel
Everything is promising.
Dwarkesh Patel
So OpenAI is projecting revenues of a billion dollars in 2024. That might very well be correct but I'm just curious, when you're talking about a new general-purpose technology, how do you estimate how big a windfall it'll be? Why that particular number?
Ilya Sutskever
We've had a product for quite a while now, back from the GPT-3 days two years ago, through the API, and we've seen how it grew. We've seen how the response to DALL-E has grown as well, and you see what the response to ChatGPT is. All of this gives us information that allows us to make relatively sensible extrapolations. Maybe that would be one answer. You need to have data; you can't come up with those things out of thin air, because otherwise your error bars are going to be like 100x in each direction.
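The kind of extrapolation described is, in its simplest form, a log-linear fit: a few observed growth points pin down the slope, and without them the error bars really are enormous. The yearly figures below are invented placeholders, not OpenAI data.

```python
import math

# Invented yearly revenue/usage observations (arbitrary units).
years = [0, 1, 2]
values = [3.0, 9.5, 28.0]  # roughly 3x growth per year, made up

# Least-squares fit of log(value) = a + b * year.
logs = [math.log(v) for v in values]
n = len(years)
mean_y = sum(years) / n
mean_l = sum(logs) / n
b_num = sum((y - mean_y) * (l - mean_l) for y, l in zip(years, logs))
b_den = sum((y - mean_y) ** 2 for y in years)
b = b_num / b_den
a = mean_l - b * mean_y

growth_factor = math.exp(b)        # implied year-over-year multiple
forecast_y3 = math.exp(a + b * 3)  # extrapolated next point

print(f"fitted yearly growth: {growth_factor:.2f}x")
print(f"extrapolated year-3 value: {forecast_y3:.1f}")
```

With even three observed points, the fitted slope is tightly constrained; with none, any value of `b` is consistent, which is exactly the "100x error bars in each direction" situation.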
Dwarkesh Patel
But most exponentials don't stay exponential especially when they get into bigger and bigger quantities, right? So how do you determine in this case?
Ilya Sutskever
Would you bet against AI?
Post AGI future
Dwarkesh Patel
Not after talking with you. Let's talk about what a post-AGI future looks like. I'm guessing you're working 80-hour weeks towards this grand goal that you're really obsessed with. Are you going to be satisfied in a world where you're basically living in an AI retirement home? What are you personally doing after AGI comes?
Ilya Sutskever
The question of what I'll be doing or what people will be doing after AGI comes is a very tricky question. Where will people find meaning? But I think that that's something that AI could help us with. One thing I imagine is that we will be able to become more enlightened because we interact with an AGI which will help us see the world more correctly, and become better on the inside as a result of interacting. Imagine talking to the best meditation teacher in history, that will be a helpful thing. But I also think that because the world will change a lot, it will be very hard for people to understand what is happening precisely and how to really contribute. One thing that I think some people will choose to do is to become part AI. In order to really expand their minds and understanding and to really be able to solve the hardest problems that society will face then.
Dwarkesh Patel
Are you going to become part AI?
Ilya Sutskever
It is very tempting.
Dwarkesh Patel
Do you think there'll be physically embodied humans in the year 3000?
Ilya Sutskever
3000? How do I know what’s gonna happen in 3000?
Dwarkesh Patel
Like what does it look like? Are there still humans walking around on Earth? Or have you guys thought concretely about what you actually want this world to look like?
Ilya Sutskever
Let me describe to you what I think is not quite right about the question. It implies we get to decide how we want the world to look. I don't think that picture is correct. Change is the only constant. And so of course, even after AGI is built, it doesn't mean that the world will be static. The world will continue to change, the world will continue to evolve. And it will go through all kinds of transformations. I don't think anyone has any idea what the world will look like in 3000. But I do hope that there will be a lot of descendants of human beings who will live happy, fulfilled lives, where they're free to do as they see fit, where they are the ones who are solving their own problems. One world which I would find very unexciting is one where we build this powerful tool, and then the government said — Okay, so the AGI said that society should be run in such a way and now we should run society in such a way. I'd much rather have a world where people are still free to make their own mistakes and suffer their consequences, and gradually evolve morally and progress forward on their own, with the AGI providing more like a base safety net.
Dwarkesh Patel
How much time do you spend thinking about these kinds of things versus just doing the research?
Ilya Sutskever
I do think about those things a fair bit. They are very interesting questions.
Dwarkesh Patel
The capabilities we have today, in what ways have they surpassed where we expected them to be in 2015? And in what ways are they still not where you'd expected them to be by this point?
Ilya Sutskever
In fairness, it's sort of what I expected in 2015. In 2015, my thinking was a lot more — I just don't want to bet against deep learning. I want to make the biggest possible bet on deep learning. I don't know how, but it will figure it out.
Dwarkesh Patel
But is there any specific way in which it's been more than you expected or less than you expected? Like some concrete prediction from 2015 that's been overturned?
Ilya Sutskever
Unfortunately, I don't remember concrete predictions I made in 2015. But I definitely think that overall, in 2015, I just wanted to move to make the biggest bet possible on deep learning, but I didn't know exactly. I didn't have a specific idea of how far things will go in seven years.
Well, no, in 2015 I did have all these bets with people, in 2016, maybe 2017, that things would go really far. But not the specifics. So it's both: it's both the case that it surprised me and that I was making these aggressive predictions. But maybe I believed them only 50% on the inside.
Dwarkesh Patel
德瓦凯什·帕特尔
What do you believe now that even most people at OpenAI would find far-fetched?
你现在有什么看法,即使是OpenAI的大多数人也会觉得难以置信?
Ilya Sutskever
伊利亚·苏茨克维尔
Because we communicate a lot at OpenAI, people have a pretty good sense of what I think, and we've really reached the point at OpenAI where we see eye to eye on all these questions.
因为我们在OpenAI内部交流频繁,大家对我的想法都很清楚,我们已经达到了在所有这些问题上高度一致的程度。
Dwarkesh Patel
德瓦凯什·帕特尔
Google has its custom TPU hardware, it has all this data from all its users, Gmail, and so on. Does it give them an advantage in terms of training bigger models and better models than you?
谷歌拥有定制的TPU硬件,还有来自用户的数据,比如Gmail等。这是否让他们在训练更大、更好的模型方面比你们有优势?
Ilya Sutskever
伊利亚·苏茨克维尔
At first, when the TPU came out, I was really impressed and I thought — wow, this is amazing. But that's because I didn't quite understand hardware back then. What really turned out to be the case is that TPUs and GPUs are almost the same thing.
最初,TPU刚推出时,我印象深刻,觉得——哇,太厉害了。但那是因为我当时对硬件了解不多。事实证明,TPU和GPU几乎是一样的东西。
They are very, very similar. The GPU chip is a little bit bigger, the TPU chip is a little bit smaller, maybe a little bit cheaper. But then they make more GPUs than TPUs, so the GPUs might be cheaper after all.
它们非常相似。GPU芯片稍大一些,TPU芯片稍小一些,也许稍便宜一些。但GPU的产量比TPU更高,所以GPU可能最终更便宜。
But fundamentally, you have a big processor, and you have a lot of memory, and there is a bottleneck between those two. The problem that both the TPU and the GPU are trying to solve is that in the amount of time it takes to move one floating point number from memory to the processor, you can do several hundred floating point operations on the processor, which means that you have to do some kind of batch processing. And in this sense, both of these architectures are the same. So I really feel like, in some sense, the only thing that matters about hardware is cost per FLOP and overall systems cost.
但从根本上说,你有一个大处理器,还有大量内存,而这两者之间存在瓶颈。TPU和GPU试图解决的问题是,从内存到处理器移动一个浮点数所需的时间里,处理器可以完成几百次浮点运算,这意味着你必须进行某种批处理。从这个意义上讲,这两种架构是相同的。所以在某种意义上,我觉得硬件唯一重要的指标是每次浮点运算的成本和整体系统成本。
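The bandwidth-versus-compute imbalance Ilya describes can be made concrete with back-of-the-envelope arithmetic. All the numbers below are illustrative assumptions, not the specs of any particular GPU or TPU:

```python
# Rough sketch of the memory-bandwidth bottleneck described above.
# All numbers are illustrative assumptions, not real chip specs.

flops_per_second = 300e12   # assumed peak compute: 300 TFLOP/s
bytes_per_second = 2e12     # assumed memory bandwidth: 2 TB/s
bytes_per_value = 2         # fp16

# How many floating point operations the processor can do in the time
# it takes to fetch a single value from memory:
values_per_second = bytes_per_second / bytes_per_value
ops_per_fetch = flops_per_second / values_per_second
print(f"~{ops_per_fetch:.0f} FLOPs per value fetched")  # ~300

# A multiply-add spends 2 FLOPs per weight use, so each weight fetched
# must be reused across roughly this many inputs to keep the chip busy:
# hence batch processing, on GPUs and TPUs alike.
required_batch_reuse = ops_per_fetch / 2
print(f"each fetched weight should serve ~{required_batch_reuse:.0f} inputs")
```

Whatever the exact figures for a given chip, the ratio is in the hundreds, which is why both architectures converge on the same batched design.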
Dwarkesh Patel
德瓦凯什·帕特尔
There isn't that much difference?
差别不大?
Ilya Sutskever
伊利亚·苏茨克维尔
Actually, I don't know. I don't know what the TPU costs are, but I would suspect that if anything, TPUs are probably more expensive because there are fewer of them.
其实我不知道。我不知道TPU的具体成本,但我怀疑TPU可能更贵,因为它们的数量较少。
New ideas are overrated 新想法被高估了
Dwarkesh Patel
德瓦凯什·帕特尔
When you are doing your work, how much of the time is spent configuring the right initializations? Making sure the training run goes well and getting the right hyperparameters, and how much is it just coming up with whole new ideas?
在你的工作中,花多少时间用于配置正确的初始化?确保训练过程顺利、设置正确的超参数,又有多少时间用于提出全新的想法?
Ilya Sutskever
伊利亚·苏茨克维尔
I would say it's a combination. Coming up with whole new ideas is a modest part of the work. Certainly coming up with new ideas is important but even more important is to understand the results, to understand the existing ideas, to understand what's going on.
我会说这是两者的结合。提出全新想法是工作的一小部分。毫无疑问,提出新想法很重要,但更重要的是理解结果、理解现有的想法,弄清楚正在发生什么。
A neural net is a very complicated system, right? And you ran it, and you get some behavior, which is hard to understand. What's going on? Understanding the results, figuring out what next experiment to run, a lot of the time is spent on that. Understanding what could be wrong, what could have caused the neural net to produce a result which was not expected.
神经网络是一个非常复杂的系统,对吧?当你运行它时,会得到一些难以理解的行为。这到底是怎么回事?理解结果,确定下一步该做什么实验,很多时间花在这些上。需要弄清楚可能出了什么问题,是什么导致神经网络产生了出乎意料的结果。
I'd say a lot of time is spent coming up with new ideas as well. I don't like this framing as much. It's not that it's false but the main activity is actually understanding.
我会说也花了不少时间在提出新想法上。但我不太喜欢这种表述。并不是说它不正确,但主要的活动其实是理解。
Dwarkesh Patel
德瓦凯什·帕特尔
What do you see as the difference between the two?
你认为这两者有什么区别?
Ilya Sutskever
伊利亚·苏茨克维尔
At least in my mind, when you say come up with new ideas, I'm like — Oh, what happens if it did such and such? Whereas understanding it's more like — What is this whole thing? What are the real underlying phenomena that are going on? What are the underlying effects? Why are we doing things this way and not another way? And of course, this is very adjacent to what can be described as coming up with ideas. But the understanding part is where the real action takes place.
至少在我看来,当你说提出新想法时,我的反应是——哦,如果这么做会发生什么?而理解更像是——这整个系统是什么?真正的底层现象是什么?底层的影响是什么?为什么我们这样做而不是那样做?当然,这与可以描述为提出想法的活动非常接近。但真正重要的行动发生在理解的过程中。
Dwarkesh Patel
德瓦凯什·帕特尔
Does that describe your entire career? If you think back on something like ImageNet, was that more new idea or was that more understanding?
这是否描述了你的整个职业生涯?如果回想像ImageNet这样的项目,那是更多依赖新想法,还是更多依赖理解?
Ilya Sutskever
伊利亚·苏茨克维尔
Well, that was definitely understanding. It was a new understanding of very old things.
嗯,那绝对是理解。是一种对非常古老事物的新理解。
Dwarkesh Patel
德瓦凯什·帕特尔
What has the experience of training on Azure been like?
在Azure上进行训练的体验如何?
Ilya Sutskever
伊利亚·苏茨克维尔
Fantastic. Microsoft has been a very, very good partner for us. They've really helped take Azure and bring it to a point where it's really good for ML and we’re super happy with it.
非常棒。微软是我们的一个非常好的合作伙伴。他们确实帮助提升了Azure,使其在机器学习方面表现得非常出色,我们对此非常满意。
Dwarkesh Patel
德瓦凯什·帕特尔
How vulnerable is the whole AI ecosystem to something that might happen in Taiwan? So let's say there's a tsunami in Taiwan or something, what happens to AI in general?
整个AI生态系统在多大程度上会受到台湾可能发生的事件的影响?比如说,台湾发生了海啸,AI领域会受到什么影响?
Ilya Sutskever
伊利亚·苏茨克维尔
It's definitely going to be a significant setback. No one will be able to get more compute for a few years. But I expect compute will spring up. For example, I believe that Intel has fabs from just a few generations ago. So that means that if Intel wanted to, they could produce something GPU-like from four years ago. But yeah, it's not the best.
这绝对会是一个重大的挫折。几年内没有人能够获得更多的计算能力。但我预计计算能力会重新出现。例如,我相信Intel有几代前的晶圆厂。这意味着如果Intel愿意,他们可以生产出类似四年前的GPU的东西。不过,是的,这并不是最理想的情况。
I'm actually not sure if my statement about Intel is correct, but I do know that there are fabs outside of Taiwan, they're just not as good. But you can still use them and still go very far with them. It's just cost, it’s just a setback.
实际上,我不确定关于Intel的说法是否准确,但我确实知道台湾以外也有晶圆厂,只是性能不如台湾的好。但你仍然可以使用它们,并且仍然可以走得很远。这只是一个成本问题,只是一个挫折。
Cost of models 模型的成本
Dwarkesh Patel
德瓦凯什·帕特尔
Would inference get cost prohibitive as these models get bigger and bigger?
随着模型变得越来越大,推理成本是否会变得高得令人望而却步?
Ilya Sutskever
伊利亚·苏茨克维尔
I have a different way of looking at this question. It's not that inference will become cost prohibitive. Inference of better models will indeed become more expensive. But is it prohibitive? That depends on how useful it is. If it is more useful than it is expensive, then it is not prohibitive.
我对这个问题有不同的看法。并不是说推理会变得高得令人望而却步。更好的模型推理确实会更昂贵。但是否令人望而却步取决于它的实用性。如果它的实用性超过了它的成本,那就不算高不可攀。
To give you an analogy, suppose you want to talk to a lawyer. You have some case or need some advice or something, you're perfectly happy to spend $400 an hour. Right? So if your neural net could give you really reliable legal advice, you'd say — I'm happy to spend $400 for that advice. And suddenly inference becomes very much non-prohibitive. The question is, can a neural net produce an answer good enough at this cost?
打个比方,假设你想咨询律师。你有个案件或需要一些建议,你愿意花每小时400美元,对吧?所以如果你的神经网络可以为你提供非常可靠的法律建议,你会说——我愿意花400美元买这个建议。于是推理成本突然变得完全可以接受。问题是,神经网络能否以这个成本给出足够好的答案?
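The lawyer analogy reduces to simple break-even arithmetic. The token count and price below are made-up assumptions purely for illustration:

```python
# Sketch: inference cost matters only relative to the value of the answer.
# Token counts and prices here are hypothetical.

price_per_1k_tokens = 0.06    # assumed API price for a large model
tokens_per_answer = 20_000    # assumed: long case context + detailed reply

inference_cost = tokens_per_answer / 1000 * price_per_1k_tokens
value_of_answer = 400.0       # one hour of a lawyer's time

print(f"inference cost per answer: ${inference_cost:.2f}")      # $1.20
print(f"cost-prohibitive: {inference_cost > value_of_answer}")  # False
```

Even if the model were a hundred times more expensive to run, the answer would still clear the bar, which is Ilya's point: "prohibitive" is a ratio, not an absolute number.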
Dwarkesh Patel
德瓦凯什·帕特尔
Yes. And you will just have price discrimination in different models?
是的。不同的模型之间会进行价格区分吗?
Ilya Sutskever
伊利亚·苏茨克维尔
It's already the case today. On our product, the API serves multiple neural nets of different sizes and different customers use different neural nets of different sizes depending on their use case.
这已经是现实情况了。在我们的产品中,API提供了不同规模的神经网络,不同客户根据自己的用例使用不同规模的神经网络。
If someone can take a small model and fine-tune it and get something that's satisfactory for them, they'll use that. But if someone wants to do something more complicated and more interesting, they’ll use the biggest model.
如果有人能够使用一个小模型并对其进行微调,从而得到让他们满意的结果,他们就会使用那个小模型。但如果有人想做更复杂、更有趣的事情,他们就会使用最大的模型。
Dwarkesh Patel
德瓦凯什·帕特尔
How do you prevent these models from just becoming commodities where these different companies just bid each other's prices down until it's basically the cost of the GPU run?
如何防止这些模型变成商品化,不同公司相互压低价格,直到价格基本等同于运行GPU的成本?
Ilya Sutskever
伊利亚·苏茨克维尔
Yeah, there's without question a force that's trying to create that. And the answer is you got to keep on making progress. You got to keep improving the models, you gotta keep on coming up with new ideas and making our models better and more reliable, more trustworthy, so you can trust their answers. All those things.
是的,毫无疑问有一种力量在推动这种情况。但答案是,你必须不断进步。你必须持续改进模型,不断提出新想法,让我们的模型变得更好、更可靠、更值得信赖,以便用户可以信任它们的答案。这些都是关键。
Dwarkesh Patel
德瓦凯什·帕特尔
Yeah. But let's say it's 2025 and somebody is offering the model from 2024 at cost. And it's still pretty good. Why would people pay for a new one from 2025 if the one that's just a year older is already good enough?
是的。但假设到了2025年,有人以成本价提供2024年的模型,而它仍然非常不错。那么人们为什么会选择2025年的新模型,而不是仅仅比它旧一年的模型?
Ilya Sutskever
伊利亚·苏茨克维尔
There are several answers there. For some use cases that may be true. There will be a new model for 2025, which will be driving the more interesting use cases. There is also going to be a question of inference cost: if you can do research to serve the same model at less cost, the same model will cost different amounts to serve for different companies. I can also imagine some degree of specialization where some companies may try to specialize in some area and be stronger compared to other companies. And to me that may be a response to commoditization to some degree.
这里有几个答案。对于某些用例,这可能是正确的。但2025年会有一个新模型,它将推动更有趣的用例。此外,还涉及推理成本的问题。如果你可以通过研究以更低的成本运行同一个模型,那么不同公司运行同一模型的成本可能会有所不同。我还可以想象某种程度的专业化,某些公司可能会尝试在某个领域专精,从而在这一领域比其他公司更强。对我来说,这可能在一定程度上是对商品化的一种应对。
Dwarkesh Patel
德瓦凯什·帕特尔
Over time do the research directions of these different companies converge or diverge? Are they doing similar and similar things over time? Or are they branching off into different areas?
随着时间推移,不同公司的研究方向是趋于收敛还是分化?它们是否随着时间的推移在做越来越相似的事情,还是在不同领域分支发展?
Ilya Sutskever
伊利亚·苏茨克维尔
I’d say in the near term, it looks like there is convergence. I expect there's going to be a convergence-divergence-convergence behavior, where there is a lot of convergence on the near-term work, there's going to be some divergence on the longer-term work. But then once the longer-term work starts to bear fruit, there will be convergence again.
我会说,在短期内,看起来是趋于收敛的。我预计会有一种“收敛-分化-收敛”的行为模式,即短期工作上有很多收敛,长期工作上会有一些分化。但一旦长期工作开始产生成果,又会出现收敛。
Dwarkesh Patel
德瓦凯什·帕特尔
Got it. When one of them finds the most promising area, everybody just…
明白了。当其中一个找到最有前景的领域时,大家就会……
Ilya Sutskever
伊利亚·苏茨克维尔
That's right. There is obviously less publishing now so it will take longer before this promising direction gets rediscovered. But that's how I would imagine the thing is going to be. Convergence, divergence, convergence.
没错。现在显然发表的研究成果少了,因此重新发现这个有前景的方向会需要更长的时间。但我想事情会是这样的:收敛、分化、再收敛。
Dwarkesh Patel
德瓦凯什·帕特尔
Yeah. We talked about this a little bit at the beginning. But as foreign governments learn about how capable these models are, are you worried about spies or some sort of attack to get your weights or somehow abuse these models and learn about them?
是的。我们在开头稍微谈到了一点。但随着外国政府了解这些模型的能力,你是否担心间谍或某种攻击企图窃取你们的权重,或者以某种方式滥用这些模型并研究它们?
Ilya Sutskever
伊利亚·苏茨克维尔
Yeah, you absolutely can't discount that. It's something that we try to guard against to the best of our ability, but it's going to be a problem for everyone who's building this.
是的,这绝对不能排除。这是我们尽最大努力防范的事情,但对于所有在构建这些模型的人来说,这都会是个问题。
Dwarkesh Patel
德瓦凯什·帕特尔
How do you prevent your weights from leaking?
你们如何防止权重泄露?
Ilya Sutskever
伊利亚·苏茨克维尔
You have really good security people.
拥有非常优秀的安全团队。
Dwarkesh Patel
德瓦凯什·帕特尔
How many people have the ability to SSH into the machine with the weights?
有多少人可以SSH访问存储权重的机器?
Ilya Sutskever
伊利亚·苏茨克维尔
The security people have done a really good job so I'm really not worried about the weights being leaked.
我们的安全团队做得非常出色,所以我完全不担心权重会泄露。
Dwarkesh Patel
德瓦凯什·帕特尔
What kinds of emergent properties are you expecting from these models at this scale? Is there something that just comes about de novo?
在这个规模下,你期望这些模型会出现哪些新涌现的特性?是否有一些全新的意外特性会自然出现?
Ilya Sutskever
伊利亚·苏茨克维尔
I'm sure really new surprising properties will come up; I would not be surprised. The thing which I'm really excited about, the things which I'd like to see are reliability and controllability. I think that this will be a very, very important class of emergent properties. If you have reliability and controllability, that helps you solve a lot of problems. Reliability means you can trust the model's output, controllability means you can control it. And we'll see, but it will be very cool if those emergent properties did exist.
我确信会出现一些全新的令人惊讶的特性,我对此并不意外。我真正感兴趣并希望看到的是——可靠性和可控性。我认为这将是非常重要的一类涌现特性。如果你有了可靠性和可控性,它们可以帮助你解决许多问题。可靠性意味着你可以信任模型的输出,可控性意味着你可以对其进行控制。我们拭目以待,但如果这些涌现特性真的存在,那将非常酷。
Dwarkesh Patel
德瓦凯什·帕特尔
Is there some way you can predict that in advance? What will happen in this parameter count, what will happen in that parameter count?
有没有什么方法可以提前预测?比如在这个参数数量下会发生什么,在另一个参数数量下又会发生什么?
Ilya Sutskever
伊利亚·苏茨克维尔
I think it's possible to make some predictions about specific capabilities though it's definitely not simple and you can’t do it in a super fine-grained way, at least today. But getting better at that is really important. And anyone who is interested and who has research ideas on how to do that, that can be a valuable contribution.
我认为对某些特定能力进行预测是可能的,尽管这绝对不简单,至少目前还无法做到非常精细的预测。但在这方面的进步非常重要。任何对此感兴趣并有研究想法的人,都可以做出有价值的贡献。
Dwarkesh Patel
德瓦凯什·帕特尔
How seriously do you take these scaling laws? There's a paper that says — You need this many orders of magnitude more to get all the reasoning out? Do you take that seriously or do you think it breaks down at some point?
你对这些缩放定律有多认真?有一篇论文说——需要更多数量级的计算才能完全实现推理能力。你对此认真对待,还是认为它在某些点会失效?
Ilya Sutskever
伊利亚·苏茨克维尔
The thing is that the scaling law tells you what happens to the log of your next-word prediction accuracy, right? There is a whole separate challenge of linking next-word prediction accuracy to reasoning capability. I do believe that there is a link, but this link is complicated. And we may find that there are other things that can give us more reasoning per unit effort. You mentioned reasoning tokens; I think they can be helpful. There can probably be some things that help.
问题在于,缩放定律告诉你下一个词预测准确率的对数变化,对吧?将下一个词预测准确率与推理能力联系起来是一个完全不同的挑战。我确实相信它们之间有联系,但这种联系是复杂的。我们可能会发现,还有其他方法能以更小的努力换取更多的推理能力。你提到过推理词元,我认为它们可能会有帮助。也许还有其他一些有用的东西。
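For context, the scaling laws being referred to are usually expressed as a power law in parameter count (or data, or compute) for next-token loss. A minimal sketch of one common parametric form, with placeholder coefficients rather than fitted values:

```python
# A common parametric form of a neural scaling law: next-token loss
# falls as a power law in parameter count, toward an irreducible floor.
# The exponent and constants below are placeholders, not fitted values.

def predicted_loss(n_params: float,
                   alpha: float = 0.076,
                   n_c: float = 8.8e13,
                   floor: float = 1.69) -> float:
    """L(N) = floor + (N_c / N) ** alpha."""
    return floor + (n_c / n_params) ** alpha

for n in (1e9, 1e10, 1e11):
    print(f"N = {n:.0e} params -> predicted loss {predicted_loss(n):.3f}")

# Note the law predicts *loss*, not downstream abilities: the link from
# next-token loss to reasoning is the separate, harder question here.
```

Each tenfold increase in parameters buys only a modest, smoothly shrinking drop in loss, which is why "how much loss buys how much reasoning" is the real open question.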
Dwarkesh Patel
德瓦凯什·帕特尔
Are you considering just hiring humans to generate tokens for you? Or is it all going to come from stuff that already exists out there?
你们是否考虑雇佣人类来为你们生成词元?还是所有的词元都将来自现有的数据?
Ilya Sutskever
伊利亚·苏茨克维尔
I think that relying on people to teach our models to do things, especially to make sure that they are well-behaved and they don't produce false things is an extremely sensible thing to do.
我认为依靠人类来教我们的模型做事情,尤其是确保它们行为良好、不生成错误内容,是一件非常明智的事情。
Is progress inevitable? 进步是不可避免的吗?
Dwarkesh Patel
德瓦凯什·帕特尔
Isn't it odd that we have the data we needed exactly at the same time as we have the transformer at the exact same time that we have these GPUs? Like is it odd to you that all these things happened at the same time or do you not see it that way?
我们刚好在拥有所需数据、Transformer模型和这些GPU的同时,这是否奇怪?你是否觉得这些事情同时发生很奇怪,还是不这么认为?
Ilya Sutskever
伊利亚·苏茨克维尔
That is definitely an interesting situation. I will say that it is odd, and it is less odd on some level. Here's why it's less odd — what is the driving force behind the fact that the data exists, that the GPUs exist, and that the transformers exist? The data exists because computers became better and cheaper; we've got smaller and smaller transistors. And suddenly, at some point, it became economical for every person to have a personal computer. Once everyone has a personal computer, you really want to connect them to the network, and you get the internet. Once you have the internet, you suddenly have data appearing in great quantities. The GPUs were improving concurrently because you have smaller and smaller transistors and you're looking for things to do with them.
这确实是一个有趣的现象。我会说,从某种程度上看,这既奇怪又不那么奇怪。为什么不那么奇怪呢?因为数据的存在、GPU的存在以及Transformer的存在背后有一个驱动力。数据的存在是因为计算机变得更好、更便宜,晶体管越来越小。突然间,在某个时间点,每个人拥有一台个人电脑变得经济可行。一旦每个人都有个人电脑,你就希望把它们连接到网络上,于是你就有了互联网。一旦有了互联网,就突然有了大量数据。同时,GPU也在改进,因为晶体管越来越小,你需要找到利用它们的方法。
Gaming turned out to be a thing that you could do. And then at some point, Nvidia said — the gaming GPU, I might turn it into a general-purpose GPU computer, maybe someone will find it useful. It turns out it's good for neural nets. It could have been the case that maybe the GPU would have arrived five years later, ten years later. Let's suppose gaming wasn't the thing. It's kind of hard to imagine, what does it mean if gaming isn't a thing? But maybe there was a counterfactual world where GPUs arrived five years after the data or five years before the data, in which case maybe things wouldn’t have been as ready to go as they are now. But that's the picture which I imagine. All this progress in all these dimensions is very intertwined. It's not a coincidence. You don't get to pick and choose in which dimensions things improve.
游戏成为了可以做的一件事。然后在某个时间点,Nvidia说——游戏GPU,我可以把它变成一个通用GPU计算机,也许有人会觉得有用。结果发现它对神经网络很有帮助。可能GPU会晚五年、十年才出现。假设游戏不存在,这很难想象,游戏不存在意味着什么?但或许在另一个反事实的世界里,GPU比数据晚五年或早五年出现,在这种情况下,也许事情就不会像现在这样准备得那么充分。但这就是我想象的画面。所有这些领域的进步是紧密交织的。这并非巧合。你无法选择在哪些领域取得进展。
Dwarkesh Patel
德瓦凯什·帕特尔
How inevitable is this kind of progress? Let's say you and Geoffrey Hinton and a few other pioneers were never born. Does the deep learning revolution happen around the same time? How much is it delayed?
这种进步有多不可避免?假设你和Geoffrey Hinton以及其他一些先驱从未出生,深度学习革命是否会在同一时间发生?会延迟多久?
Ilya Sutskever
伊利亚·苏茨克维尔
Maybe there would have been some delay. Maybe like a year delayed?
可能会有一些延迟。大概延迟一年?
Dwarkesh Patel
德瓦凯什·帕特尔
Really? That’s it?
真的?就一年?
Ilya Sutskever
伊利亚·苏茨克维尔
It's really hard to tell. I hesitate to give a longer answer because — GPUs will keep on improving. I cannot see how someone would not have discovered it. Because here's the other thing. Let's suppose no one has done it, computers keep getting faster and better. It becomes easier and easier to train these neural nets because you have bigger GPUs, so it takes less engineering effort to train one. You don't need to optimize your code as much. When the ImageNet data set came out, it was huge and it was very, very difficult to use. Now imagine you wait for a few years, and it becomes very easy to download and people can just tinker. A modest number of years maximum would be my guess. I hesitate to give a lot longer answer though. You can’t re-run the world you don’t know.
很难说。我不太愿意给出更长的时间,因为——GPU会继续改进。我看不出有人不会发现深度学习。还有另一点,假设没有人研究它,但计算机变得越来越快、越来越好。因为GPU更强大,训练这些神经网络变得越来越容易,所需的工程努力更少。你不需要对代码进行太多优化。当ImageNet数据集问世时,它非常庞大,也非常难以使用。现在想象再等几年,它变得很容易下载,人们就可以随意尝试。我的猜测是最多会有几年的延迟。但我不愿意给出更长的时间回答。毕竟,你不能重新运行这个世界,所以无法确切知道。
Dwarkesh Patel
德瓦凯什·帕特尔
Let's go back to alignment for a second. As somebody who deeply understands these models, what is your intuition of how hard alignment will be?
我们回到对齐问题。作为一个对这些模型有深入理解的人,你直觉认为对齐会有多难?
Ilya Sutskever
伊利亚·苏茨克维尔
At the current level of capabilities, we have a pretty good set of ideas for how to align them. But I would not underestimate the difficulty of alignment of models that are actually smarter than us, of models that are capable of misrepresenting their intentions. It's something to think about a lot and do research. Oftentimes academic researchers ask me what’s the best place where they can contribute. And alignment research is one place where academic researchers can make very meaningful contributions.
在当前的能力水平上,我们有一套不错的对齐方法。但我不会低估对齐比我们聪明的模型的难度,尤其是那些能够误导我们其意图的模型。这需要深入思考和研究。学术研究者经常问我,哪里是他们可以做出贡献的最佳领域。对齐研究就是一个学术研究者可以做出非常有意义贡献的领域。
Dwarkesh Patel
德瓦凯什·帕特尔
Other than that, do you think academia will come up with important insights about actual capabilities or is that going to be just the companies at this point?
除此之外,你认为学术界会在实际能力方面产生重要见解,还是目前这将只由公司来实现?
Ilya Sutskever
伊利亚·苏茨克维尔
The companies will realize the capabilities. It's very possible for academic research to come up with those insights. It doesn't seem to happen that much for some reason but I don't think there's anything fundamental about academia. It's not like academia can't. Maybe they're just not thinking about the right problems or something because maybe it's just easier to see what needs to be done inside these companies.
公司将实现这些能力。学术研究有可能提出这些见解。但出于某种原因,这种情况似乎并不多见。但我不认为这是学术界的根本问题。并不是说学术界做不到。也许他们只是没有考虑正确的问题,或者说也许在这些公司内部更容易看到需要完成的任务。
Dwarkesh Patel
德瓦凯什·帕特尔
I see. But there's a possibility that somebody could just realize…
我明白了。但有可能有人会突然意识到……
Ilya Sutskever
伊利亚·苏茨克维尔
I totally think so. Why would I possibly rule this out?
我完全同意。为什么我会排除这种可能性呢?
Dwarkesh Patel
德瓦凯什·帕特尔
What are the concrete steps by which these language models start actually impacting the world of atoms and not just the world of bits?
这些语言模型实际上如何具体影响物质世界(atoms),而不仅仅是信息世界(bits)?
Ilya Sutskever
伊利亚·苏茨克维尔
I don't think that there is a clean distinction between the world of bits and the world of atoms. Suppose the neural net tells you — hey here's something that you should do, and it's going to improve your life. But you need to rearrange your apartment in a certain way. And then you go and rearrange your apartment as a result. The neural net impacted the world of atoms.
我不认为信息世界(bits)和物质世界(atoms)之间有明确的界限。假设神经网络告诉你——嘿,你应该做某件事,它会改善你的生活。结果你需要以某种方式重新布置你的公寓,然后你去做了。这时神经网络就已经对物质世界产生了影响。
Future breakthroughs 未来的突破
Dwarkesh Patel
德瓦凯什·帕特尔
Fair enough. Do you think it'll take a couple of additional breakthroughs as important as the Transformer to get to superhuman AI? Or do you think we basically got the insights in the books somewhere, and we just need to implement them and connect them?
有道理。你认为需要再有几个像Transformer一样重要的突破才能实现超越人类的AI吗?还是你认为我们基本上已经在某些理论中有了相关见解,只需要实现并连接这些见解?
Ilya Sutskever
伊利亚·苏茨克维尔
I don't really see such a big distinction between those two cases and let me explain why. One of the ways in which progress is taking place in the past is that we've understood that something had a desirable property all along but we didn't realize. Is that a breakthrough? You can say yes, it is. Is that an implementation of something in the books? Also, yes.
我不认为这两种情况之间有很大的区别,让我解释一下为什么。过去进步的一个方式是我们终于认识到某些东西一直具有某种理想特性,但我们之前没有意识到。这算是一个突破吗?可以说是的。这算是书本中某些理论的实现吗?也是的。
My feeling is that a few of those are quite likely to happen. But in hindsight, it will not feel like a breakthrough. Everybody's gonna say — Oh, well, of course. It's totally obvious that such and such a thing can work.
我的感觉是,这种情况很可能会发生几次。但事后看来,这不会让人觉得是一个突破。每个人都会说——哦,当然了,这显然是可以实现的。
The reason the Transformer has been brought up as a specific advance is because it's the kind of thing that was not obvious for almost anyone. So people can say it's not something which they knew about. Let's consider the most fundamental advance of deep learning, that a big neural network trained in backpropagation can do a lot of things. Where's the novelty? Not in the neural network. It's not in the backpropagation. But it was most definitely a giant conceptual breakthrough because for the longest time, people just didn't see that. But then now that everyone sees, everyone’s gonna say — Well, of course, it's totally obvious. Big neural network. Everyone knows that they can do it.
Transformer之所以被认为是一个特定的进步,是因为它几乎对任何人来说都不是显而易见的。所以人们可以说,这不是他们之前知道的东西。让我们考虑深度学习最基础的进展——通过反向传播训练的大型神经网络可以做很多事情。新颖之处在哪里?不在于神经网络,也不在于反向传播。但这绝对是一个巨大的概念性突破,因为很长一段时间里,人们根本没有意识到这一点。但现在每个人都意识到了,每个人都会说——当然了,这显而易见。大型神经网络,大家都知道它们能行。
Dwarkesh Patel
德瓦凯什·帕特尔
What is your opinion of your former advisor’s new forward-forward algorithm?
你对你前导师的新“前向前向算法”怎么看?
Ilya Sutskever
伊利亚·苏茨克维尔
I think that it's an attempt to train a neural network without backpropagation. And that this is especially interesting if you are motivated to try to understand how the brain might be learning its connections. The reason for that is that, as far as I know, neuroscientists are really convinced that the brain cannot implement backpropagation because the signals in the synapses only move in one direction.
我认为这是一次尝试,试图在没有反向传播的情况下训练神经网络。如果你有动机去理解大脑如何学习其连接方式,这就特别有趣。原因是,据我所知,神经科学家坚信大脑无法实现反向传播,因为突触中的信号只能单向传递。
And so if you have a neuroscience motivation, and you want to say — okay, how can I come up with something that tries to approximate the good properties of backpropagation without doing backpropagation? That's what the forward-forward algorithm is trying to do. But if you are trying to just engineer a good system, there is no reason to not use backpropagation. It's the only algorithm.
因此,如果你有神经科学的动机,并想问——好吧,如何在不进行反向传播的情况下尝试逼近反向传播的优良特性?这就是前向前向算法试图解决的问题。但如果你只是想设计一个好的系统,没有理由不用反向传播。这是唯一的算法。
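The local-learning idea can be sketched in a few lines. This is a toy illustration under my own simplifying assumptions (a single ReLU layer, sum-of-squares "goodness", synthetic positive and negative data), not Hinton's actual implementation:

```python
import numpy as np

# Toy sketch of the forward-forward idea: each layer is trained with a
# purely local objective -- push the "goodness" (sum of squared
# activations) above a threshold for real ("positive") inputs and below
# it for fake ("negative") inputs. No backward pass through the network
# is needed; every update uses only this layer's own activations.

rng = np.random.default_rng(0)

def goodness(h):
    return (h ** 2).sum(axis=1)          # per-example sum of squares

def forward(x, W):
    return np.maximum(x @ W, 0.0)        # one ReLU layer

W = rng.normal(scale=0.1, size=(10, 32))
theta, lr = 5.0, 0.003                   # goodness threshold, step size

x_pos = rng.normal(size=(64, 10))        # stand-in for real data
x_neg = 3.0 * rng.normal(size=(64, 10))  # stand-in for corrupted data

for _ in range(200):
    for x, sign in ((x_pos, +1.0), (x_neg, -1.0)):
        h = forward(x, W)
        # Probability the layer puts this batch on the right side of theta:
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        # Local gradient only: d(goodness)/d(pre-activation) = 2*h
        # (the ReLU derivative folds in, since h = 0 where inactive).
        grad_h = sign * (1.0 - p)[:, None] * 2.0 * h
        W += lr * (x.T @ grad_h) / len(x)
```

The key point, matching Ilya's framing, is that the weight update for this layer never sees a signal propagated back from later layers, which is what makes the scheme biologically plausible and, for pure engineering, less attractive than backpropagation.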
Dwarkesh Patel
德瓦凯什·帕特尔
I guess I've heard you in different contexts talk about using humans as the existence proof that AGI is possible. At what point do you take the metaphor less seriously and don't feel the need to pursue the analogy in your research? Because it is important to you as a sort of existence case.
我记得在不同场合听你提到过,将人类作为AGI存在的案例。你在什么时候会不那么认真对待这种类比,不再觉得需要在研究中追随这种类比?因为这种案例对你来说很重要。
Ilya Sutskever
伊利亚·苏茨克维尔
At what point do I stop caring about humans as an existence case of intelligence?
我什么时候会停止将人类视为智能存在的案例?
Dwarkesh Patel
德瓦凯什·帕特尔
Or as an example you want to follow in terms of pursuing intelligence in models.
或者作为你在模型中追求智能时想要遵循的一个例子。
Ilya Sutskever
伊利亚·苏茨克维尔
I think it's good to be inspired by humans, it's good to be inspired by the brain. There is an art into being inspired by humans in the brain correctly, because it's very easy to latch on to a non-essential quality of humans or of the brain. And many people whose research is trying to be inspired by humans and by the brain often get a little bit specific. People get a little bit too — Okay, what cognitive science model should be followed? At the same time, consider the idea of the neural network itself, the idea of the artificial neuron. This too is inspired by the brain but it turned out to be extremely fruitful.
我认为受到人类和大脑的启发是好事。正确地从人类和大脑中获得灵感是一门艺术,因为很容易抓住人类或大脑的非本质特性。许多试图从人类和大脑中获得灵感的研究者往往会变得过于具体。人们会过于追问——应该遵循哪种认知科学模型?同时,考虑神经网络本身的想法,即人工神经元的概念。这也是受大脑启发的,但事实证明它非常有用。
So how do you do this correctly? You ask: which behaviors of human beings are essential, the ones that prove to us this is possible? And which are non-essential, actually an emergent phenomenon of something more basic, where we just need to focus on getting our own basics right? One can and should be inspired by human intelligence, but with care.
那么该如何处理呢?哪些人类行为是本质的,让你觉得这是证明其可能性的东西?哪些是非本质的?或许它实际上是某种更基础事物的涌现现象,我们只需专注于把自己的基础打好。可以,也应该从人类智能中获得灵感,但要小心。
Dwarkesh Patel
德瓦凯什·帕特尔
Final question. Why is there, in your case, such a strong correlation between being first to the deep learning revolution and still being one of the top researchers? You would think that these two things wouldn't be that correlated. But why is there that correlation?
最后一个问题。为什么在你的情况下,第一个参与深度学习革命的人和仍然是顶尖研究者之间有如此强的相关性?你会觉得这两者之间应该没有太大关系,但为什么会有这种相关性?
Ilya Sutskever
伊利亚·苏茨克维尔
I don't think those things are super correlated. Honestly, it's hard to answer the question. I just kept trying really hard and it turned out to have sufficed thus far.
我不认为这两者之间有很强的相关性。老实说,这个问题很难回答。我只是一直非常努力,而这恰好到目前为止已经足够了。
Dwarkesh Patel
德瓦凯什·帕特尔
So it's perseverance.
所以是坚持不懈?
Ilya Sutskever
伊利亚·苏茨克维尔
It's a necessary but not a sufficient condition. Many things need to come together in order to really figure something out. You need to really go for it and also need to have the right way of looking at things. It's hard to give a really meaningful answer to this question.
这是一种必要但不足够的条件。真正弄明白一些事情需要许多因素的结合。你需要全力以赴,同时也需要有正确的看待问题的方式。很难对这个问题给出一个真正有意义的答案。
Dwarkesh Patel
德瓦凯什·帕特尔
Ilya, it has been a true pleasure. Thank you so much for coming to The Lunar Society. I appreciate you bringing us to the offices. Thank you.
伊利亚,这次访谈非常愉快。非常感谢你来《月球社会》做客,也感谢你邀请我们到办公室。谢谢。
Ilya Sutskever
伊利亚·苏茨克维尔
Yeah, I really enjoyed it. Thank you very much.
是的,我非常享受这次访谈。非常感谢。