2025-06-17 Andrej Karpathy.Software Is Changing (Again)

2025-06-17 Andrej Karpathy.Software Is Changing (Again)


Unknown Speaker: Please welcome former director of AI Tesla Andrej Karpathy.
未知主持人:请欢迎特斯拉前 AI 总监安德烈·卡帕西。

Andrej Karpathy: Wow, a lot of people here. Hello. Okay, yeah, so I'm excited to be here today to talk to you about software in the era of AI. And I'm told that many of you are students, like bachelors, masters, PhD and so on,  and you're about to enter the industry. And I think it's actually like an extremely unique and very interesting time to enter the industry right now. And I think fundamentally the reason for that is that software is changing. Again.
安德烈·卡帕西:哇,这里有这么多人。大家好。好的,我今天非常兴奋来到这里和大家聊聊 AI 时代的软件。据说在座的很多人都是学生——本科、硕士、博士等等——你们即将进入行业。我认为此刻正是进入业界的极其特殊和有趣的时机。而最根本的原因是:软件正在发生变化,再次发生变化。

And I say again because I actually gave this talk already. But the problem is that software keeps changing. So I actually have a lot of material to create new talks. And I think it's changing quite fundamentally.
之所以说“再次”,是因为我以前就做过类似的演讲。但问题在于软件不断变化,所以我总有大量新材料可以讲。我认为这次的变化非常根本。

I think, roughly speaking, software has not changed much on such a fundamental level for 70 years. And then it's changed, I think, about twice quite rapidly in the last few years.
大体来说,过去 70 年里软件在根本层面上并没有太大变化,而在最近几年,它却迅速发生了两次重大的变革。

And so there's just a huge amount of work to do, a huge amount of software to write and rewrite. So let's take a look at maybe the realm of software. So if we kind of think of this as like the map of software,
因此,现在有大量工作要做,大量软件需要编写和重写。让我们先看看软件的版图吧。如果把这想象成一张软件的地图——

this is a really cool tool called Map of GitHub. This is kind of like all the software that's written. These are instructions to the computer for carrying out tasks in the digital space.
这是一款非常酷的工具,叫做 GitHub 地图。它展示了几乎所有已编写的软件——也就是在数字世界中指导计算机执行任务的所有指令。

So if you zoom in here, these are all different kinds of repositories,  and this is all the code that has been written. And a few years ago, I kind of observed that software was kind of changing,  and there was kind of like a new type of software around,  and I called this Software 2.0 at the time. And the idea here was that Software 1.0 is the code you write for the computer.
当你放大地图时,会看到各种仓库,也就是所有已经写出的代码。几年前,我注意到软件正在发生变化,出现了一种新型的软件,我当时称之为“软件 2.0”。在这个框架里,软件 1.0 是我们写给计算机的代码。

Software 2.0 are basically neural networks and in particular the weights of a neural network. And you're not writing this code directly.
软件 2.0 基本上是指神经网络,尤其是神经网络的权重,而这些“代码”并不是靠你直接手写完成的。

You are more kind of like tuning the data sets and then you're running an optimizer to create the parameters of this neural net. And I think like at the time neural nets were kind of seen as like just a different kind of classifier,  like a decision tree or something like that. And so I think it was kind of like I think this framing was a lot more appropriate. And now, actually,
你更多是在调优数据集,然后运行优化器来产生神经网络的参数。当时大家往往把神经网络视作另一种分类器,比如决策树之类的。所以用这种框架来描述更为贴切。而现在——

what we have is kind of like an equivalent of GitHub in the realm of Software 2.0. And I think the hugging face is basically equivalent of GitHub in Software 2.0. And there's also Model Atlas,  and you can visualize all the code written there. In case you're curious, by the way, the giant circle, the point in the middle,  these are the parameters of Flux, the image generator.
在软件 2.0 领域里,我们也有了类似 GitHub 的东西。我认为 Hugging Face 就是软件 2.0 的 GitHub,另外还有 Model Atlas,你可以在那儿可视化所有“代码”。顺便说一句,中间那个巨大圆点代表的是图像生成器 Flux 的参数。

And so anytime someone tunes a LoRa on top of a Flux model,  you basically create a Git commit in this space and you create a different kind of image generator.
因此,每当有人在 Flux 模型上微调一个 LoRa,你就相当于在这片空间里做了一次 Git 提交,并诞生了一个新的图像生成器。

So basically what we have is Software 1.0 is the computer code that programs a computer. Software 2.0 are the weights which program neural networks. And here's an example of AlexNet image recognizer neural network.
综上,软件 1.0 是编程计算机的代码;软件 2.0 是编程神经网络的权重。这里举例的是图像识别网络 AlexNet。

Now so far all of the neural networks that we've been familiar with until recently were kind of like fixed function computers. Image to categories or something like that.
到目前为止,我们熟悉的大多数神经网络都像是“固定功能”计算机,比如把图像映射到类别等。

And I think what's changed and I think is a quite fundamental change is that neural networks became programmable with large language models. And so I see this as quite new, unique. It's a new kind of a computer.
我认为发生的变化,而且是非常根本的变化,是神经网络因为大型语言模型而变得可编程。因此在我看来,这是非常新颖、独特的,它是一种全新的计算机。

And so in my mind, it's worth giving it a new designation of Software 3.0. And basically,  your prompts are now programs that program the LLM. And remarkably, these prompts are written in English.
在我看来,这值得赋予它一个新的称谓——软件 3.0。本质上,你的提示词现在就是用来编程 LLM 的程序。令人惊讶的是,这些提示词是用英语书写的。

So it's kind of a very interesting programming language. So maybe to summarize the difference, if you're doing sentiment classification, for example,  you can imagine writing some amount of Python to basically do sentiment classification,
因此它算是一种非常有趣的编程语言。举例来说,如果你要做情感分类,你可以想象写一些 Python 代码来完成情感分类,

or you can train a neural net, or you can prompt a large language model. So here, this is a few-shot prompt, and you can imagine changing it and programming the computer in a slightly different way.
或者你可以训练一个神经网络,或者你可以提示一个大型语言模型。这里是一个 few-shot 提示,你可以改变它,以稍微不同的方式来编程这台“计算机”。

So basically we have Software 1.0, Software 2.0 and I think we're seeing,  maybe you've seen a lot of GitHub code is not just like code anymore. There's a bunch of like English interspersed with code and so I think kind of.
所以基本上我们有软件 1.0、软件 2.0,而我认为我们正在看到——也许你已经注意到 GitHub 上的很多代码不再只是代码,还夹杂了大量英语文本——我觉得这算是……

There's a growing category of new kind of code. So not only is it a new programming paradigm,  it's also remarkable to me that it's in our native language of English. And so when this blew my mind a few, I guess, years ago now,
这是一类不断壮大的新型代码。这不仅是一种新的编程范式,而且令我吃惊的是,它使用的是我们的母语——英语。我记得几年前当这一幕震撼到我时,

I tweeted this and I think it captured the attention of a lot of people. And this is my currently pinned tweet, is that remarkably we're now programming computers in English.
我发了条推文,我想它引起了很多人的注意。那条推文现在仍置顶,上面写着:令人惊讶的是,我们如今正用英语来编程计算机。

Now, when I was at Tesla, we were working on the autopilot and we were trying to get the car to drive.
当我在特斯拉时,我们一直在研发自动驾驶,试图让汽车自行行驶。

And I sort of showed the slide at the time where you can imagine that the inputs to the car are on the bottom and they're going through a software stack to produce the steering and acceleration.
那时我展示过一张幻灯片,你可以想象车辆底层输入经过一个软件栈,最终输出方向盘转角和加速度。

And I made the observation at the time that there was a ton of C++ code around in the autopilot,  which was the software 1.0 code. And then there was some neural nets in there doing image recognition.
当时我指出,自动驾驶系统里有大量 C++ 代码,那是典型的软件 1.0;同时也有一些神经网络在做图像识别。

And I kind of observed that over time as we made the autopilot better basically the neural network grew in capability and size and. In addition to that,
我还观察到,随着我们不断改进自动驾驶,神经网络的能力和规模都在增长。除此之外,

all the C++ code was being deleted and a lot of the capabilities and functionality that was originally written in 1.0 was migrated to 2.0. So as an example,
大量 C++ 代码被删除,原本由 1.0 代码实现的功能被迁移到 2.0。举个例子,

a lot of the stitching up of information across images from the different cameras and across time was done by a neural network and we were able to delete a lot of code.
将不同摄像头、不同时间帧的信息拼接在一起的工作由神经网络完成,我们因而删掉了大量代码。

Software 2.0 stack quite literally ate through the software stack of the autopilot. So I thought this was really remarkable at the time. And I think we're seeing the same thing again.
软件 2.0 的栈字面意义上“吞噬”了自动驾驶的软件栈。我当时觉得这非常了不起。而且我认为现在我们又在见证同样的事情。

Where basically we have a new kind of software and it's eating through the stack. We have three completely different programming paradigms. And I think if you're entering the industry,
也就是我们拥有了一种新的软件,它正在吞噬整个栈。我们如今有三种完全不同的编程范式。我认为如果你即将进入行业,

it's a very good idea to be fluent in all of them because they all have slight pros and cons and you may want to program some functionality in 1.0 or 2.0 or 3.0. Are you going to train a neural net? Are you going to just prompt an LLM?
最好熟练掌握这三种范式,因为它们各有优缺点,你可能会把某些功能写成 1.0,或者 2.0,或者 3.0。你会去训练一个神经网络吗?还是仅用提示词操控 LLM?

Should this be a piece of code that's explicit, etc.? So we'll have to make these decisions and actually potentially fluidly transition between these paradigms. So what I want to get into now is.
或者这部分应该显式写代码?因此我们需要做出这些决定,并且在不同范式之间灵活切换。接下来我想讲的是——

First I want to in the first part talk about LLMs and how to kind of like think of this new paradigm and the ecosystem and what that looks like. Like what are what is this new computer?
首先,我想在第一部分谈谈 LLM,以及如何理解这种新范式及其生态,它长什么样?这台新“计算机”究竟是什么?

What does it look like and what is the ecosystem look like? I was struck by this quote from Enduring actually many years ago now, I think. And I think Endure is going to be speaking right after me.
它的形态如何?生态如何?多年前我被安德鲁·吴的一句话震撼到,我想之后安德鲁也会演讲。

But he said at the time, AI is the new electricity. And I do think that it kind of captures something very interesting in that LLMs certainly feel like they have properties of utilities right now. LLM labs like OpenAI, Gemini, Entropi, etc.
他当时说,“AI 是新的电力”。我觉得这句话抓住了很有趣的一点:LLM 现在确实像是一种“公共事业”。像 OpenAI、Gemini、Anthropic 等这些 LLM 实验室……

They spend CAPEX to train the LLMs and this is kind of equivalent to building out a grid. And then there's OPEX to serve that intelligence over APIs to all of us.
他们投入资本支出(CAPEX)来训练这些大型语言模型,这有点类似于建设电网。接着还需要运营支出(OPEX)通过 API 向我们所有人提供这种智能。

And this is done through metered access where we pay per million tokens or something like that. And we have a lot of demands that are very utility-like demands out of this API. We demand low latency, high uptime, consistent quality, etc.
这种模式通过计量访问实现,我们按百万 token 之类的方式付费。对于这一 API,我们提了很多类似公共事业的要求——低延迟、高可用、质量稳定,等等。

In electricity,  you would have a transfer switch so you can transfer your electricity source from like grid and solar or battery or generator.
在电力系统中,你会有切换开关,可在电网、太阳能、电池或发电机等电源之间切换。

In LLMs, we have maybe open router and easily switch between the different types of LLMs that exist. Because the LLMs are software, they don't compete for physical space.
在 LLM 领域,我们或许有 open router 之类的工具,可轻松在不同 LLM 之间切换。由于 LLM 属于软件,它们并不在物理空间上竞争。

So it's okay to have basically like six electricity providers and you can switch between them, right? Because they don't compete in such a direct way. And I think what's also a little fascinating,
所以就算有六家电力供应商让你自由切换也没问题,对吧?因为它们并不会直接对打。我觉得还有一点挺迷人——

and we saw this in the last few days actually,  a lot of the LLMs went down and people were kind of like stuck and unable to work. And I think it's kind of fascinating to me that when the state-of-the-art LLMs go down,
就在过去几天里,我们看到很多 LLM 宕机,人们陷入停滞、无法工作。我觉得很有意思——当最先进的 LLM 宕机时,

it's actually kind of like an intelligence brownout in the world. It's kind of like when the voltage is unreliable in the grid and the planet just gets dumber,  the more reliance we have on these models,
这就像全球出现了一次“智能褐停”。就好比电网电压不稳时一样,随着我们对这些模型依赖度越高,整个星球就会变“笨”,

which already is like really dramatic and I think will continue to grow. But LLMs don't only have properties of utilities. I think it's also fair to say that they have some properties of fabs.
这种影响已经相当惊人,而且还会继续放大。但 LLM 并非只具备公共事业属性,它们也有一些类似“晶圆厂”的特征。

And the reason for this is that the capex required for building LLMs is actually quite large. It's not just like building some power station or something like that, right?
原因在于,构建 LLM 所需的资本支出非常巨大,并不像建座电站那样简单,对吧?

You're investing a huge amount of money and I think the tech tree and for the technology is growing quite rapidly. So we're in a world where we have sort of deep tech trees,
你要投入巨额资金,而且技术树正在飞速扩张。我们正处于一个技术栈极其深厚的时代,

research and development secrets that are centralizing inside the LLM labs. But I think the analogy muddies a little bit also because, as I mentioned, this is software. And software is a bit less defensible because it is so malleable.
研发机密正集中在这些 LLM 实验室。但这个类比也有些混淆,因为如我所说,这毕竟是软件;软件由于高度可塑,护城河并不深。

And so I think it's just an interesting kind of thing to think about potentially. There's many analogies you can make. Like a four nanometer process node maybe is something like a cluster with certain max flops.
因此我觉得这值得我们深入思考,可以做很多类比。比如说,4 nm 工艺节点也许相当于一台具备特定峰值 FLOPS 的集群。

You can think about when you're using NVIDIA GPUs and you're only doing the software and you're not doing the hardware,  that's kind of like the fabless model.
如果你只用 NVIDIA GPU,自己只做软件不做硬件,那就有点像“无厂”模式(fabless)。

But if you're actually also building your own hardware and you're training on TPUs if you're Google,  that's kind of like the Intel model where you own your fab. So I think there's some analogies here that make sense.
但如果你像 Google 那样自研硬件并在 TPU 上训练,那就类似英特尔模式——你拥有自己的晶圆厂。所以这些类比还是讲得通的。

But actually I think the analogy that makes the most sense perhaps is that in my mind,  LLMs have very strong kind of analogies to operating systems. In that this is not just electricity or water.
不过我认为最贴切的类比可能是:在我看来,LLM 与操作系统的对应关系非常强。它们并不仅仅是电力或自来水。

It's not something that comes out of the tap as a commodity. These are now increasingly complex software ecosystems. So they're not just like simple commodities like electricity.
它们不是拧开水龙头就来的商品,而是日益复杂的软件生态系统,因此并不像电力那样简单。

And it's kind of interesting to me that the ecosystem is shaping in a very similar kind of way where you have a few closed source providers like Windows or Mac OS and then you have an open source alternative like Linux.
我觉得很有意思的是,这个生态正在以类似的方式成形:有少数闭源提供商,比如 Windows 或 macOS,也有开源替代方案,比如 Linux。

And I think for LLMs as well,  we have kind of a few competing closed source providers and then maybe the LLAMA ecosystem is currently like maybe a close approximation to something that may grow into something like Linux.
我认为在 LLM 领域也是如此——有几家竞争的闭源厂商,而 LLAMA 生态或许就是目前最接近未来“LLM 版 Linux”雏形的开源方案。
Warning
不一定,LLM的难度更低,一个模型+一堆芯片就能搭建,接口统一,Windows 或 macOS的差异性要大的多。
Again, I think it's still very early because these are just simple LLMs, but we're starting to see that these are going to get a lot more complicated. It's not just about the LLM itself. It's about all the tool use and the multi-modalities and how all of that works. And so when I sort of had this realization a while back, I tried to sketch it out, and it kind of seemed to me like LLMs are kind of like a new operating system, right? So the LLM is a new kind of a computer. It's kind of like the CPU equivalent. The context windows are kind of like the memory.
再强调一次,现在仍然只是早期阶段,因为现有的大模型还很“简单”,但我们已经开始看到它们会变得更复杂——不仅仅是模型本身,还包括各种工具调用、多模态处理,以及它们的协同方式。早前我意识到这一点后,曾尝试把它画出来:在我看来,LLM 就像一种新的操作系统——LLM 是一种全新的“计算机”,有点类似于 CPU,而上下文窗口就像内存。

And then the LLM is orchestrating memory and compute for problem solving using all of these capabilities here. And so definitely if you look at it, it looks very much like operating system from that perspective. A few more analogies.
随后,LLM 利用这些能力对“内存”和“计算”进行编排来解决问题。从这个角度看,它的确很像操作系统。再举几个类比。

For example, if you want to download an app, say I go to VS Code and I go to download, you can download VS Code and you can run it on Windows, Linux or Mac in the same way as you can take an LLM app like Cursor and you can run it on GPT or Claude or Gemini series, right? It's just a dropdown. So it's kind of like similar in that way as well.
例如,如果你要下载一个应用,比如 VS Code,你可以下载后在 Windows、Linux 或 Mac 上运行;同样地,你也可以把像 Cursor 这样的 LLM 应用“运行”在 GPT、Claude 或 Gemini 等模型之上——只是下拉菜单的选择而已。这两者方式非常相似。

More analogies that I think strike me is that we're kind of like in this 1960s-ish era where LLM compute is still very expensive for this new kind of a computer and that forces the LLMs to be centralized in the cloud and we're all just sort of thin clients that interact with it over the network.
还有一个让我印象深刻的类比:我们如今有点像 1960 年代——这种新型计算机(LLM)的算力仍然昂贵,因此模型被迫集中部署在云端,我们都只是通过网络与之交互的“瘦客户端”。

And none of us have full utilization of these computers and therefore it makes sense to use time sharing where we're all just, you know, a dimension of the batch when they're running the computer in the cloud.
没有个人能够独占整台“计算机”,因此使用时间共享模式最合理:当云端运行这些模型时,我们只是批处理中的一个维度。

And this is very much what computers used to look like during this time. The operating systems were in the cloud. Everything was streamed around and there was batching.
这与当年计算机的形态非常相似:操作系统位于云端,一切通过流式方式传输,并采用批处理。

And so the personal computing revolution hasn't happened yet because it's just not economical. It doesn't make sense, but I think some people are trying and it turns out that Mac minis, for example, are a very good fit for some of the LLMs because it's all if you're doing batch-one inference. This is all super memory-bound and this actually works.
因此,“个人 LLM 计算机”革命尚未到来,因为成本不经济、不划算。但有人已经开始尝试,比如发现 Mac mini 对某些小型 LLM 很适合,用于 batch-1 推理时主要受内存限制,效果还不错。

And I think these are some early indications maybe of personal computing, but this hasn't really happened yet. It's not clear what this looks like. Maybe some of you get to invent what this is or how it works or what this should be.
我认为这或许是“个人计算”早期迹象,但真正的形态尚未出现,还不清楚未来会怎样——也许你们中的某些人将去发明它、实现它、定义它。

Maybe one more analogy that I'll mention is whenever I talk to ChatGPT or some LLM directly in text, I feel like I'm talking to an operating system through the terminal. It's text, it's direct access to the operating system, and I think a GUI hasn't yet really been invented in a general way. Should ChatGPT have a GUI, different than just the tech bubbles? Certainly some of the apps that we're gonna go into in a bit have GUI, but there's no —
再补一个类比:每当我用文本与 ChatGPT 或其他 LLM 交流时,我感觉自己像在用终端与操作系统对话——纯文本、直达系统核心。而真正意义上的 GUI 似乎尚未诞生:ChatGPT 是否需要一种不同于聊天气泡的 GUI?当然,接下来会提到的一些应用已经有 GUI,但目前还没有普遍的……

There are some ways in which LLMs are different from operating systems in some fairly unique way and from early computing. And I wrote about this one particular property that strikes me as very different.
LLM 在某些方面又和操作系统、早期计算机截然不同。我写过一篇文章,提到一个让我印象深刻的差异。

This time around, it's that LLMs like flip, they flip the direction of technology diffusion that is usually present in technology.
这一轮变革中,LLM 颠倒了技术扩散的方向——与通常技术普及路径正好相反。

So for example, with electricity, cryptography, computing, flight, internet, GPS, lots of new transformative technologies that have not been around.
例如电力、密码学、计算机、航空、互联网、GPS 等这些颠覆性新技术——

Typically, it is the government and corporations that are the first users because it's new and expensive, etc. And it only later diffuses to consumer. But I feel like LLMs are kind of like flipped around.
往往最先由政府和大型企业使用,因为新技术昂贵而复杂,随后才向消费者普及。但在 LLM 上,这个顺序似乎被颠倒了。

So maybe with early computers, it was all about ballistics and military use. But with LLMs, it's all about how do you boil an egg or something like that. This is certainly like a lot of my use. And so it's really fascinating to me that we have a new magical computer and it's like helping me boil an egg. It's not helping the government do something really crazy like some military ballistics or some special technology.
也许在早期计算机时代,一切都围绕弹道学和军事用途展开;但在大型语言模型时代,人们关心的却是如何把鸡蛋煮熟之类的事情。这至少反映了我个人的许多用法。让我着迷的是,我们拥有了一台“魔法”般的新计算机,它却在帮我煮鸡蛋,而不是帮助政府完成军用弹道或其他尖端技术的疯狂任务。

Indeed, corporations or governments are lagging behind the adoption of all of us, of all of these technologies. So it's just backwards.
的确,在采纳这些技术方面,企业和政府的步伐落后于普通大众。这一切完全倒置了。

And I think it informs maybe some of the uses of how we want to use this technology or like what are some of the first apps and so on. So, in summary so far, LLM Labs fab LLMs, I think it's accurate language to use,  but LLMs are complicated operating systems. They're circa 1960s in computing and we're redoing computing all over again. And they're currently available via timesharing and distributed like a utility.
这也启发我们思考如何利用这项技术,以及首批应用可能是什么。总结到目前为止,大型模型实验室打造大型模型,用“制造(fab)LLM”这个说法很贴切;但 LLM 也是复杂的操作系统,它们的计算模式类似 20 世纪 60 年代,我们正在重走一次计算史。目前它们通过时间共享方式提供服务,像公共事业那样分发。

What is new and unprecedented is that they're not in the hands of a few governments and corporations. They're in the hands of all of us because we all have a computer and it's all just software and ChatGPT was beamed down to our computers, like to billions of people, like instantly and overnight. And this is insane. And it's kind of insane to me that this is the case And now it is our time to enter the industry and program these computers. This is crazy.
前所未有的新现象在于,这些技术并不掌握在少数政府或企业手中,而是落入了我们每个人手里。因为我们都有电脑,而这些只是软件;ChatGPT 以近乎一夜之间的速度“空降”到全球数十亿台电脑中。这太疯狂了,而这确实正在发生。现在轮到我们进入行业、为这些新计算机编程了——这同样疯狂。

So I think this is quite remarkable. Before we program LLMs, we have to kind of like spend some time to think about what these things are. And I especially like to kind of talk about their psychology.
这一切非常值得注意。在编程 LLM 之前,我们得花些时间思考它们到底是什么。我个人尤其想探讨的是它们的“心理”。

So the way I like to think about LLMs is that they're kind of like people spirits. They are stochastic simulations of people. And the simulator in this case happens to be an autoregressive transformer. So a transformer is a neural net.
我喜欢把 LLM 想象成人类精神的化身——它们是人类的随机模拟器,而这个模拟器恰好是自回归 Transformer。Transformer 本质上是一种神经网络。

It's and it just kind of like it goes on the level of tokens it goes chunk chunk chunk chunk chunk and there's an almost equal amount of compute for every single chunk. And. This simulator, of course,  is basically there's some weights involved and we fit it to all the texts that we have on the internet and so on. And you end up with this kind of a simulator.
它以 token 为单位一次处理一块、一块、一块,对每一块几乎耗费相同的计算量。当然,这个模拟器背后是大量权重,它们通过互联网上的海量文本训练而成,最终得到这样一种模拟器。

And because it is trained on humans, it's got this emergent psychology that is human-like. So the first thing you'll notice is, of course, LLMs have encyclopedic knowledge and memory.
由于训练语料来自人类文本,它产生了类人的“涌现心理”。首先你会发现,LLM 拥有百科全书般的知识和记忆。

And they can remember lots of things, a lot more than any single individual human can because they've read so many things. It actually kind of reminds me of this movie, Rain Man, which I actually really recommend people watch.
它们能够记住大量信息,比任何单个人类都多,因为它们“阅读”了海量内容。这让我想起电影《雨人》,我非常推荐大家去看。

It's an amazing movie. I love this movie. And Dustin Hoffman here is an autistic savant who has almost perfect memory. So he can read like a phone book and remember all of the names and phone numbers.
这是一部惊艳的电影,我非常喜欢。影片中的达斯汀·霍夫曼饰演自闭症学者,他几乎拥有完美记忆——能看一遍电话簿就记住所有姓名和号码。

And I kind of feel like LLMs are kind of like very similar. They can remember SHA hashes and lots of different kinds of things very, very easily.
我觉得 LLM 与此十分相似,它们可以轻而易举记住 SHA 哈希值以及各种各样的信息。

So they certainly have superpowers in some respects, but they also have a bunch of, I would say, cognitive deficits.
因此,它们在某些方面确实具备超能力,但也存在不少“认知缺陷”。

So they hallucinate quite a bit and they kind of make up stuff and don't have a very good sort of internal model of self-knowledge, not sufficient at least. And this has gotten better, but not perfect. They display jagged intelligence.
例如,它们经常“幻觉”,胡编乱造,对自我知识的内部模型并不完善(至少不够完善)。这一点有所改善,但仍非完美。它们的智能表现参差不齐。

So they're going to be superhuman in some problem-solving domains. And then they're going to make mistakes that basically no human will make.
它们在某些解题领域超越人类,却又会犯下人类绝不会犯的错误。

Like, you know, they will insist that 9.11 is greater than 9.9 or that there are two Rs in strawberry. These are some famous examples. But basically there are rough edges that you can trip on. So that's kind of, I think, also kind of unique.
比如坚称 9.11 大于 9.9,或者说 strawberry 里有两个字母 R——这些都是著名例子。总之,它们有很多“毛刺”会让人踩坑,这也算是它们独特的一面。

They also kind of suffer from anterograde amnesia. So, and I think I'm alluding to the fact that if you have a co-worker who joins your organization,
他们也有点像是患上前向性遗忘症。我想说的是,如果有一位同事刚加入你的组织,

this co-worker will over time learn your organization and they will understand and gain like a huge amount of context on the organization and they go home and they sleep and they consolidate knowledge and they develop expertise over time.
随着时间推移,这位同事会慢慢了解你的组织,吸收并积累大量上下文信息;他们回家、休息、巩固所学,久而久之就能变得驾轻就熟。

LLMs don't natively do this and this is not something that has really been solved in the R\&D of LLMs, I think.
大语言模型 (LLM) 并不会天然具备这种能力,这在 LLM 的研发中仍未得到真正解决。

And so context windows are really kind of like working memory and you have to sort of program the working memory quite directly because they don't just kind of like get smarter by default.
因此,所谓“上下文窗口”其实就是工作记忆;你必须直接去“编程”这块工作记忆,因为它们不会自己无缘无故就变得更聪明。

And I think a lot of people get tripped up by the analogies in this way. In popular culture, I recommend people watch these two movies, Memento and Fifty First States.
我想很多人正是被这种类比绊倒。在流行文化里,我推荐大家去看两部电影:《记忆碎片》和《初恋 50 次》。

In both of these movies, the protagonists, their weights are fixed and their context windows gets wiped every single morning. And it's really problematic to go to work or have relationships when this happens.
在这两部电影里,主人公的“权重”是固定的,每天早晨他们的“上下文窗口”都会被清零。这样一来,无论是工作还是维系关系都成了难题。

And this happens to all of us all the time. I guess one more thing I would point to is security kind of related limitations of the use of LLMs. So for example, LLMs are quite gullible. They are susceptible to prompt injection risks.
而这恰恰时时刻刻发生在 LLM 身上。我还想补充一点与安全相关的限制:LLM 很容易轻信输入,容易受到提示词注入攻击。

They might leak your data, et cetera. And there's many other considerations security related.
它们可能泄露你的数据,等等;安全层面的考量还有很多。

So basically long story short you have to load your you have to load your you have to simultaneously think through this superhuman thing that has a bunch of cognitive deficits and issues how do we and yet they are extremely like useful and so how do we program them and how do we work around their deficits and enjoy their superhuman powers.
长话短说,你得在使用时“加载”这些模型,并同时考虑到它们具备超能力却也存在种种认知缺陷:我们如何编程、如何绕开缺陷、又如何充分发挥它们的超人能力?

So what I want to switch to now is talk about the opportunities of how do we use these models and where are some of the biggest opportunities.
接下来我想转换话题,讨论如何利用这些模型以及它们最大的机遇在哪里。

This is not a comprehensive list, just some of the things that I thought were interesting for this talk. The first thing I'm kind of excited about is what I would call partial autonomy apps.
以下并非完整清单,只是我觉得有趣的一些方向。首先令我兴奋的是所谓“部分自治应用”。

So for example, let's work with the example of coding. You can certainly go to ChatGPT directly and you can start copy pasting code around and copy pasting bug reports and stuff around and getting code and copy pasting everything around.
举例来说,以编程场景为例:你当然可以直接去用 ChatGPT,复制粘贴代码、粘贴 Bug 报告、拿到代码再复制粘贴回 IDE。

Why would you do that? Why would you go directly to the operating system? It makes a lot more sense to have an app dedicated for this. And so I think many of you use Cursor. I do as well. Cursor is kind of like the thing you want instead.
可为什么要这么做?为什么要直接面对“操作系统”?更合理的做法是使用专门的应用。因此,很多人(包括我)用 Cursor——这才是更合适的工具。

You don't want to just directly go through the ChatGPT. And I think Cursor is a very good example of an early LLM app that has a bunch of properties that I think are useful across all the LLM apps. So in particular,
你并不想每次都直面 ChatGPT。我认为 Cursor 是早期 LLM 应用的一个优秀范例,它具备许多对所有 LLM 应用都很有价值的特性。尤其是,

you will notice that we have a traditional interface that allows a human to go in and do all the work manually just as before. But in addition to that, we now have this LLM integration that allows us to go in bigger chunks.
它保留了传统界面,让人类可以像以前一样手动完成所有工作;除此之外,还集成了 LLM,使我们可以“一大块一大块”地处理问题。

And so some of the properties of LLM apps that I think are shared and useful to point out. Number one, the LLMs basically do a ton of the context management. Number two, they orchestrate multiple calls to LLMs, right?
因此,值得指出的 LLM 应用通用特性包括:第一,LLM 负责大量上下文管理;第二,它们会编排对多个模型的调用。

So in the case of Cursor, there's under the hood embedding models for all your files, the actual chat models, models that apply diffs to the code, and this is all orchestrated for you.
拿 Cursor 举例:后台会对所有文件做嵌入处理,还有用于聊天的模型,以及对代码打补丁的模型——这一切都被自动编排。

A really big one that I think also maybe not fully appreciated always is application-specific GUI and the importance of it. Because you don't just want to talk to the operating system directly in text.
我认为还有一点常被忽视却非常重要:应用专属的 GUI。毕竟人们并不总想通过纯文本去和“操作系统”对话。

Text is very hard to read, interpret, understand and also like you don't want to take some of these actions natively in text.
纯文本既难阅读、难解析、难理解,而且你也不想直接通过文本去执行某些操作。

So it's much better to just see a diff as like red and green change and you can see what's being added or subtracted. It's much easier to just do command Y to accept or command N to reject. I shouldn't have to type it in text, right?
所以最好能直接看到 diff,用红绿高亮展示增删内容;按 Command-Y 接受、Command-N 拒绝就行了。我不该还要再手动键入文本,对吧?

So GUI allows a human to audit the work of these fallible systems and to go faster. I'm going to come back to this point a little bit later as well.
因此 GUI 让人类可以审核这些易犯错系统的输出,并且提升效率。我稍后还会回到这个观点。

And the last kind of feature I want to point out is that there's what I call the autonomy slider. So, for example, in Cursor, you can just do tab completion. You're mostly in charge.
我想指出的最后一个特性是所谓的“自治滑杆”。比如在 Cursor 里,你可以只用 Tab 补全——主要还是你掌控全局。

You can select a chunk of code and Command-K to change just that chunk of code. You can do Command-L to change the entire file. Or you can do Command-I, which just, you know, let it rip, do whatever you want in the entire repo.
你可以选中一段代码按 Command-K 只改这一段,也可以按 Command-L 改整个文件;或者干脆按 Command-I,让模型在整个仓库里随意发挥。

And that's the sort of full autonomy agentic version. And so you are in charge of the autonomy slider. And depending on the complexity of the task at hand, you can tune the amount of autonomy that you're willing to give up for that task.
那就是完全自治、代理化的模式。你来控制这根自治滑杆,根据任务复杂度决定愿意放权到什么程度。

Maybe to show one more example of a fairly successful LLM app, Perplexity. It also has very similar features to what I just pointed out in Cursor. It packages up a lot of the information. It orchestrates multiple LLMs.
再举一个成功的 LLM 应用——Perplexity。它和 Cursor 有着非常相似的特性:把大量信息打包,并调度多种 LLM。

It's got a GUI that allows you to audit some of its work. So for example, it will cite sources and you can imagine inspecting them. And it's got an autonomy slider.
它提供 GUI 供你审核其输出,例如列出引用来源供你检查;同时也有自治滑杆。

You can either just do a quick search or you can do research or you can do deep research and come back 10 minutes later. So this is all just varying levels of autonomy that you give up to the tool.
你可以仅做快速搜索,也可以做一般研究,或者深度研究后 10 分钟再回来——这反映了你向工具让渡的不同自治级别。

So I guess my question is, I feel like a lot of software will become partially autonomous. And I'm trying to think through like, what does that look like? And for many of you who maintain products and services,
所以我的问题是:我感觉大量软件都会变成“部分自治”。那会是什么样?对在座许多维护产品和服务的人来说,

how are you going to make your products and services partially autonomous? Can an LLM see everything that a human can see? Can an LLM act in all the ways that a human could act? And can humans supervise and stay in the loop of this activity?
你们将如何让自己的产品和服务实现部分自治?LLM 能看到人类能看到的一切吗?LLM 能以人类能够的所有方式行动吗?人类能否保持监督并持续参与?

Because again, these are fallible systems that aren't yet perfect. And what does a diff look like in Photoshop or something like that, you know? And also a lot of the traditional software right now,
毕竟这些系统尚不完美、容易出错。那么在 Photoshop 这类场景里,diff 应该怎么呈现?此外,许多传统软件如今

it has all these switches and all this kind of stuff. It's all designed for human. All of this has to change and become accessible to LLMs.
拥有各种开关和设置,完全为人设计;这一切都得改变,使 LLM 能够访问和操控。

So one thing I want to stress with a lot of these LLM apps that I'm not sure gets as much attention as it should is we're now kind of like cooperating with AIs and usually they are doing the generation and we as humans are doing the verification.
我想强调的一点是,许多 LLM 应用的核心在于:我们正与 AI 协同工作——通常 AI 负责生成,而人类负责验证;这一点并未得到足够重视。

It is in our interest to make this loop go as fast as possible so we're getting a lot of work done. There are two major ways that I think this can be done. Number one, you can speed up verification a lot.
我们有必要让这条循环尽可能快,以完成更多工作。我认为有两大途径:第一,大幅加快验证环节。

And I think GUIs, for example, are extremely important to this because a GUI utilizes your computer vision GPU in all of our head. Reading text is effortful and it's not fun,
GUI 在此极其重要,因为 GUI 可调用我们每个人大脑里的“计算机视觉 GPU”。阅读纯文本既费力又不有趣,

but looking at stuff is fun and it's just kind of like a highway to your brain. So I think GUIs are very useful for auditing systems and visual representations in general. And number two, I would say, is We have to keep the AI on the leash.
但看图是快乐的,信息能直达大脑高速通道。因此 GUI 对于审核系统、做可视化都非常有用。第二点是:我们必须给 AI 系上“缰绳”。

I think a lot of people are getting way overexcited with AI agents. And it's not useful to me to get a diff of 1,000 lines of code to my repo. I'm still the bottleneck, right?
我觉得很多人对 AI 代理过于兴奋。一次性向我的仓库提交 1000 行 diff 对我并没用,我仍是瓶颈,对吧?

Even though that 1,000 lines come out instantly, I have to make sure that this thing is not introducing bugs. And that it's doing the correct thing, right? And that there's no security issues and so on.
即便这 1000 行代码瞬间生成,我仍需确保它没引入 Bug、逻辑正确、没有安全隐患等等。

So I think that Yeah, basically, we have to sort of like, it's in our interest to make the flow of these two go very, very fast and we have to somehow keep the AI on the leash because it gets way too overreactive. It's kind of like this.
所以,我认为是的,本质上,我们需要让这两步循环尽可能快,同时必须给 AI 系上“缰绳”,因为它过于积极反应,大概就像这样。

This is how I feel when I do AI-assisted coding. If I'm just vibe coding, everything is nice and great, but if I'm actually trying to get work done, it's not so great to have an overreactive agent doing all this kind of stuff.
这就是我在 AI 辅助编程时的感受:如果只是随便写点代码,感觉一切都挺好;但当我真正想完成工作时,一个过度活跃的代理会让我很头大。

So, this slide is not very good, I'm sorry, but I guess I'm trying to develop, like many of you, some ways of utilizing these agents in my coding workflow and to do AI-assisted coding.
这一页做得不太好,抱歉。不过,就像在座各位一样,我正在摸索把这些代理整合进自己的编程流程,开展 AI 辅助编程。

And in my own work, I'm always scared to get way too big diffs. I always go in small incremental chunks. I want to make sure that everything is good. I want to spin this loop very, very fast.
在我的实践中,我最怕 diff 过大。我习惯小步快跑,确保一切正常,让验证-生成循环尽可能快。

And I sort of work on small chunks of single concrete thing. And so I think many of you probably are developing similar ways of working with LLMs.
我总是聚焦于一个个具体的小任务。我想你们很多人也在探索与 LLM 协作的类似方法。

I also saw a number of blog posts that try to develop these best practices for working with LLMs. And here's one that I read recently and I thought was quite good.
我也读到不少博客,总结了与 LLM 协作的最佳实践。最近读到的一篇就写得很好。

And it kind of discussed some techniques and some of them have to do with how you keep the AI on the leash. And so as an example, if you are prompting, if your prompt is vague, then the AI might not do exactly what you wanted.
文中谈到一些技巧,其中几条就是如何“拴住”AI。例如,如果你的提示太模糊,AI 可能无法按你真正想要的去做。

And in that case, verification will fail. You're going to ask for something else. If a verification fails, then you're going to start spinning. So it makes a lot more sense to spend a bit more time to be more concrete in your prompts,
那样一来,验证就会失败,你得另行请求;验证失败就会陷入反复。因此,与其事后兜圈子,不如事先多花点时间,把提示写得更具体。

which increases the probability of successful verification and you can move forward. And so I think a lot of us are going to end up finding techniques like this. I think in my own work as well,
这会提高一次性通过验证的概率,让你继续推进。我想我们都会形成类似的技巧;我自己的工作中也是如此。

I'm currently interested in what education looks like in together with kind of like now that we have AI and LLMs, what does education look like? And I think a large amount of thought for me goes into how we keep AI on the leash.
我现在很好奇,有了 AI、LLM 之后,教育会变成什么样。我花了很多心思思考如何给 AI 套上“缰绳”。

I don't think it just works to go to ChatGPT and be like, hey, teach me physics. I don't think this works because the AI is like gets lost in the woods. And so for me, this is actually two separate apps.
我不认为直接对 ChatGPT 说“教我物理”就能奏效,因为 AI 很容易在“森林”里迷路。对我而言,这实际上需要两个独立的应用。

For example, there's an app for a teacher that creates courses. And then there's an app that takes courses and serves them to students. And in both cases, we now have this intermediate artifact of a course that is auditable,
比如,一个应用面向老师,用来制作课程;另一个应用把课程呈现给学生。两种场景里都有一个可审计的中间产物——课程,

and we can make sure it's good. We can make sure it's consistent. And the AI is kept on the leash with respect to a certain syllabus, a certain progression of projects, and so on. And so this is one way of keeping the AI on the leash,
这样我们才能确保课程优质且一致。AI 会被限定在教学大纲、项目进度等“缰绳”内运行,这是把 AI 拴住的一种方法,

and I think has a much higher likelihood of working. And the AI is not getting lost in the woods. One more kind of analogy I wanted to sort of allude to is I'm no stranger to partial autonomy,
成功率更高,也不至于让 AI 迷失方向。我还想再举个类比:我对“部分自治”并不陌生,

and I've kind of worked on this, I think, for five years at Tesla. And this is also a partial autonomy product and shares a lot of the features. Like, for example, right there in the instrument panel is the GUI of the autopilot.
在特斯拉时我搞了约五年,这也是部分自治产品,并拥有许多相同特征。例如,仪表盘中就有自动驾驶的 GUI,

So it's showing me what the neural network sees and so on. And we have the autonomy slider, where over the course of my tenure there, we did more and more autonomous tasks for the user.
展示神经网络所“看到”的内容。我们也有自治滑杆,在我任期内,滑杆逐步调高,车辆替用户执行的自主任务越来越多。

And maybe the story that I wanted to tell very briefly is actually the first time I drove a self-driving vehicle was in 2013. And I had a friend who worked at Waymo and he offered to give me a drive around Palo Alto.
我想顺便分享个小故事:我第一次体验自动驾驶是在 2013 年。当时一个在 Waymo 的朋友带我在帕洛阿尔托兜了一圈。

I took this picture using Google Glass at the time. And many of you are so young that you might not even know what that is. But yeah, this was like all the rage at the time.
那时我用 Google Glass 拍下了这张照片。你们很多人可能太年轻,甚至不知道那是什么。但当年它可是风靡一时的。

And we got into this car and we went for about a 30-minute drive around Palo Alto,  highways, streets and so on. And this drive was perfect. There was zero interventions. And this was 2013, which is now 12 years ago.
我们坐上那辆车,在帕洛阿尔托的高速公路、城市街道等地行驶了大约 30 分钟。整个过程完美无缺,没有一次人工接管。那是 2013 年,也就是 12 年前。

And it kind of struck me because at the time when I had this perfect drive,  this perfect demo, I felt like Wow, self-driving is imminent because this just worked. This is incredible.
这让我大为震撼——当时那趟完美的试驾、完美的演示让我觉得,哇,自动驾驶要来了,因为它确实奏效了,简直不可思议。

But here we are 12 years later and we are still working on autonomy. We are still working on driving agents. And even now, we haven't actually fully solved the problem. You may see Waymos going around and they look driverless,
然而 12 年过去了,我们仍在攻克自动驾驶,仍在研发驾驶代理。即便到今天,这个问题依旧没有彻底解决。你或许会看到 Waymo 无人车在路上行驶,看上去没有司机,

but there's still a lot of teleoperation and a lot of human in the loop of a lot of this driving. So we still haven't even declared success, but I think it's definitely going to succeed at this point,  but it just took a long time.
可背后依然有大量远程操控和人工干预。所以我们还远未宣布成功,但我相信最终一定会成功,只是过程非常漫长。

And so I think software is really tricky, I think, in the same way that driving is tricky. And so when I see things like 2025 is the year of agents,  I get very concerned and I kind of feel like,
因此我认为,软件和驾驶一样棘手。当我看到诸如“2025 是智能代理元年”之类的说法时,我会非常担忧,并且感觉——

This is the decade of agents and this is going to be quite some time. We need humans in the loop. We need to do this carefully. This is software. Let's be serious here. One more kind of analogy that I always think through is the Iron Man suit.
这是“智能代理的十年”,且还将持续相当长的时间。我们必须把人置于闭环之中,谨慎推进。这毕竟是软件,必须严肃以待。我经常想到的另一个类比是钢铁侠战衣。

I always love Iron Man. I think it's so correct in a bunch of ways with respect to technology and how it will play out.
我一直很喜欢《钢铁侠》。在我看来,它在技术及其未来走向上有很多地方都非常贴切。

And what I love about the Iron Man suit is that it's both an augmentation and Tony Stark can drive it And it's also an agent.
我喜欢钢铁侠战衣的一点在于:它既是对人的增强装置,托尼·斯塔克也能亲自驾驶,同时它本身又是一个智能代理。

And in some of the movies, the Iron Man suit is quite autonomous and can fly around and find Tony and all this kind of stuff. And so this is the autonomy slider is we can build augmentations or we can build agents.
在某些电影里,钢铁侠战衣具有相当高的自主性,能自己飞行并找到托尼等。因此,这就像一根自治滑杆:我们既可以做增强装置,也可以做智能代理。

And we kind of want to do a bit of both. But at this stage, I would say working with fallible LLMs and so on,  I would say it's less Iron Man robots and more Iron Man suits that you want to build.
而我们多少想两者兼顾。但在目前这个阶段,面对仍会出错的大模型,我认为大家更应该打造“钢铁侠战衣”而非“钢铁侠机器人”。

It's less like building flashy demos of autonomous agents and more building partial autonomy products. And these products have custom GUIs and UI UX.
与其开发华而不实的全自动演示代理,不如做部分自治的产品,而这些产品需要定制化的 GUI 和交互体验。

And we're trying to, and this is done so that the generation verification loop of the human is very,  very fast. But we are not losing the sight of the fact that it is in principle possible to automate this work.
我们这样做,是为了让人类的“生成-验证”循环尽可能迅速;同时也牢记,这些工作在原理上终究可以被完全自动化。

And there should be an autonomy slider in your product. And you should be thinking about how you can slide that autonomy slider and make your product sort of more autonomous over time.
你的产品里应该有一根“自治滑杆”,并思考如何逐步调高它,让产品随着时间推移变得更加自主。

But this is kind of how I think there's lots of opportunities in these kinds of products. I want to now switch gears a little bit and talk about one other dimension that I think is very unique.
我认为这类产品中蕴含大量机会。现在我想稍微转换一下话题,谈谈另一个我认为非常独特的维度。

Not only is there a new type of programming language that allows for autonomy in software,  but also, as I mentioned, it's programmed in English, which is this natural interface.
如今不仅出现了一种全新的编程语言,使软件能够自治,而且正如我之前所说,它是用英语编程——也就是天然的交互接口。

And suddenly everyone is a programmer because everyone speaks natural language like English. So this is extremely bullish and very interesting to me and also completely unprecedented, I would say.
于是,人人都成了程序员,因为人人都会说自然语言(如英语)。这让我既看好又兴奋,也可谓史无前例。

It used to be the case that you need to spend five to ten years studying something to be able to do something in software. This is not the case anymore. So I don't know if by any chance anyone has heard of wipe coding.
过去要想做成一件软件上的事,你得花五到十年专门学习;如今不再如此。不知道大家是否听说过“wipe coding”这个说法。

This is the tweet that kind of like introduced this,  but I'm told that this is now like a major meme.
正是这条推文引入了该词,如今它据说已经成了一个流行梗。

Fun story about this is that I've been on Twitter for like 15 years or something like that at this point and I still have no clue which tweet will become viral and which tweet like fizzles and no one cares.
有趣的是,我在推特上混了大概十五年,到现在还看不出哪条推文会火、哪条会石沉大海。

And I thought that this tweet was going to be the latter. I don't know, it was just like a shower of thoughts. But this became like a total meme and I really just can't tell.
我原以为那条推文会属于后者,只是我随手发的思绪罢了;结果它却彻底火成了梗,真是难以预料。

But I guess like it struck a chord and gave a name to something that everyone was feeling but couldn't quite say in words. So now there's a Wikipedia page and everything. This is like.
但我想,它击中了人们的心声,并给大家说不出口的感觉起了个名字。现在甚至连维基百科条目都出现了。就像是这样。

Yeah, this is like a major contribution now or something like that, so. So Tom Wolfe from Hugging Face shared this beautiful video that I really love. These are kids vibe coding. And I find that this is such a wholesome video.
是的,这现在简直成了一项重要“贡献”之类的东西。所以 Hugging Face 的 Tom Wolfe 分享了一个我非常喜欢的精彩视频——一群孩子在 vibe coding。我觉得这个视频特别治愈。

Like, I love this video. Like, how can you look at this video and feel bad about the future? The future is great. I think this will end up being like a gateway drug to software development. I'm not a doomer about the future of the generation.
真的,我太喜欢这段视频了。看了它,你怎么可能对未来感到悲观?未来很美好。我觉得这会成为孩子们踏入软件开发的“入门药”。我并不唱衰下一代的未来。

And I think, yeah, I love this video. So, I tried byte coding a little bit as well because it's so fun. So,
我真的很爱这个视频。所以我也试着玩了下 byte coding,实在太有趣了。

byte coding is so great when you want to build something super-duper custom that doesn't appear to exist and you just want to wing it because it's a Saturday or something like that.
当你想随手做点市面上完全没有、超级定制化的东西时,byte coding 简直太爽,尤其是周六这种想随性发挥的时候。

So, I built this iOS app and I don't, I can't actually program in Swift, but I was really shocked that I was able to build like a super basic app and I'm not going to explain it, it's really dumb.
于是我做了个 iOS 应用,可我其实不会写 Swift。让我震惊的是,我竟然能用它做出一个非常基础的 App,虽然功能很傻,我就不细说了。

But I kind of like this was just like a day of work and this was running on my phone like later that day and I was like, wow, this is amazing.
但这只花了我一天时间,当天晚上就能在手机上跑起来,我当时心想:哇,太神奇了。

I didn't have to like read through Swift or like five days or something like that to like get started. I also back-coded this app called MenuGen. And this is live. You can try it in MenuGen.app.
我无需花五天阅读 Swift 文档就能上手。我还 back-coded 了一个叫 MenuGen 的应用,它已经上线,大家可以去 MenuGen.app 体验。

And I basically have this problem where I show up at a restaurant, I read through the menu, and I have no idea what any of the things are. And I need pictures. So this doesn't exist. So I was like, hey, I'm going to back-code it.
我经常去餐馆时读菜单却不知道那些菜是什么样,特别需要配图。但市面上没有这种工具,所以我想:好吧,我自己 back-code 一个。

So this is what it looks like. You go to MenuGen.app. And you take a picture of a menu. And then MenuGen generates the images. And everyone gets \$5 in credits for free when you sign up. And therefore, this is a major cost center in my life.
它就是这样用的:访问 MenuGen.app,给菜单拍张照,MenuGen 就生成对应菜品图片。注册还能免费获得 5 美元额度。所以,这成了我生活里的“大型成本中心”。

So, this is a negative revenue app for me right now. I've lost a huge amount of money on MenuGen. Okay. But the fascinating thing about MenuGen for me is that The code of the Vibe coding part,
也就是说,目前这款应用对我来说是负收益的,MenuGen 已经让我亏了不少钱。不过 MenuGen 最有趣的一点在于,它的 vibe coding 部分代码,

the code was actually the easy part of Vibe coding MenuGen. And most of it actually was when I tried to make it real so that you can actually have authentication and payments and the domain name and a versatile deployment.
写代码反而是最简单的。真正花时间的是把它做成可用产品:要加身份验证、支付、域名、灵活部署等等。

This was really hard and all of this was not code. All of this DevOps stuff was me in the browser clicking stuff. And this was extreme slog and took another week.
这真的很难,而且这些都不是代码。所有这些 DevOps 工作都是我在浏览器里点来点去完成的。这事极其繁琐,又拖了一整周。

So it was really fascinating that I had the MenuGen Basically demo working on my laptop in a few hours. And then it took me a week because I was trying to make it real. And the reason for this is this was just really annoying.
有意思的是,MenuGen 的演示版我几小时就能在笔记本上跑通,可为了把它做成真正可用的产品,却又花了一周时间,因为过程实在太烦人了。

So for example, if you try to add Google login to your web page,  I know this is very small, but just a huge amount of instructions of this clerk library telling me how to integrate this. And this is crazy.
举例来说,如果你想在网页里接入 Google 登录,虽然听起来是个小功能,但 clerk 库会给你一长串指导说明,这太疯狂了。

Like it's telling me, go to this URL, click on this dropdown, choose this,  go to this, and click on that. And it's like telling me what to do. Like a computer is telling me the actions I should be taking. Like, you do it. Why am I doing this?
它让我去这个链接,点那个下拉框,选这个,再去另一个页面再点那个——就像一台电脑在指挥我干活。明明电脑能做,为什么非要我来?

What the hell? I had to follow all these instructions. This was crazy. So I think the last part of my talk, therefore, focuses on, can we just build for agents? I don't want to do this work. Can agents do this? Thank you.
搞什么嘛?我得照着这些指令一步步来,太离谱了。所以我演讲的最后想探讨:我们能不能直接为代理来构建?我不想干这些活儿,让代理来行不行?谢谢。

Unknown Speaker: OK.
未知发言者:好的。

Andrej Karpathy: So roughly speaking, I think there's a new category of consumer and manipulator of digital information. It used to be just humans through GUIs or computers through APIs. And now we have a completely new thing.
Andrej Karpathy:大体来说,我认为出现了一类新的数字信息消费者与操作者。过去只有人类通过 GUI,或者计算机通过 API,而现在出现了全新的角色。

And agents are their computers, but they are human-like, kind of, right? They're people spirits. There's people spirits on the internet and they need to interact with our software infrastructure. What can we build for them? It's a new thing.
这些代理也是计算机,但又有点像人——像人的“灵魂”。互联网中存在这些“人形灵体”,它们需要与我们的软件基础设施交互。我们能为它们构建什么?这是一件新鲜事。

So as an example, you can have robots.txt on your domain and you can instruct or like advise,  I suppose, web crawlers on how to behave on your website. In the same way,
比如,你可以在域名下放一个 robots.txt,指示或建议网络爬虫在你的网站上该如何行为。同理,

you can have maybe lms.txt file which is just a simple markdown that's telling LLMs what this domain is about. And this is very readable to an LLM. If it had to instead get the HTML of your webpage and try to parse it,
你也可以放一个 lms.txt 文件,用简单的 Markdown 告诉大模型这个域名是干什么的;LLM 读它毫无压力。若让模型去抓网页 HTML 再解析,

this is very error prone and difficult and we'll screw it up and it's not going to work. So we can just directly speak to the LLM. It's worth it. A huge amount of documentation is currently written for people.
这既容易出错又麻烦,最后还可能失败。因此,我们可以直接“跟 LLM 说话”,这很值得。如今大量文档是写给人看的。

So you will see things like lists and bold and pictures. And this is not directly accessible by an LLM. So I see some of the services now are transitioning a lot of their docs to be specifically for LLMs.
因此,你会看到诸如列表、加粗文本、图片等内容,而这些格式并不能被大模型直接读取。于是我注意到,一些服务正在把大量文档迁移成专门面向 LLM 的版本。

So Vercel and Stripe, as an example, are early movers here. But there are a few more that I've seen already. And they offer their documentation in Markdown. Markdown is super easy for LLMs to understand. This is great.
比如 Vercel 和 Stripe 就是较早行动的公司,我还看到其他几家也在跟进。他们把文档改成了 Markdown 格式,而 Markdown 对大模型来说非常友好,这太棒了。

Maybe one simple example from my experience as well. Maybe some of you know 3Blue1Brown. He makes beautiful animation videos on YouTube. Yeah, I love this library so that he wrote Manon. And I wanted to make my own.
我自己的经历也有一个简单例子:大家可能知道 3Blue1Brown,他在 YouTube 做了很多漂亮的数学动画视频。我很喜欢他写的动画库 Manim,也想用它做些东西。

And there's extensive documentations on how to use Manin. And so I didn't want to actually read through it. So I copy pasted the whole thing to an LLM. And I described what I wanted. And it just worked out of the box.
Manim 有非常详细的使用文档,但我不想从头到尾读完,于是直接把整份文档复制给大模型,并说明我的需求,结果马上就跑通了。

Like LLM just bi-coded me an animation exactly what I wanted. And I was like, wow, this is amazing. So if we can make docs legible to LLMs, it's going to unlock a huge amount of kind of use.
大模型“即兴编码”出了正好符合我需求的动画,我惊呼太神奇了。如果文档能让 LLM 轻松阅读,就能解锁大量全新的用法。

And I think this is wonderful and should happen more. The other thing I wanted to point out is that you do unfortunately have to, it's not just about taking your docs and making them appear in Markdown. That's the easy part.
我认为这非常棒,应当更广泛推广。但需要指出的是,工作不只是把文档换成 Markdown 这么简单,这只是轻而易举的第一步。

We actually have to change the docs because anytime your docs say click, this is bad. An LLM will not be able to natively take this action right now.
我们得真正改写文档——只要出现“点击按钮”之类的说明,就会出问题,因为 LLM 目前无法原生执行“点击”这种操作。

So Brazil, for example, is replacing every occurrence of click with the equivalent curl command that your LLM agent could take on your behalf. And so I think this is very interesting.
例如 Vercel 正在把文档中所有“点击”操作改成等价的 curl 命令,方便 LLM 代理直接代为执行。我觉得这非常有意思。

And then, of course, there's a model context protocol from Anthropic. And this is also another way. It's a protocol of speaking directly to agents as this new consumer and manipulator of digital information.
此外,Anthropic 还提出了 Model Context Protocol,这是一种直接面向智能代理的交互协议,把代理视为新的数字信息消费者与操作者。

So I'm very bullish on these ideas. The other thing I really like is a number of little tools here and there that are helping ingest data in, like, very LLM-friendly formats. So, for example, when I go to a GitHub repo, like my NanoGPT repo,
我对这些想法非常看好。我还喜欢一些小工具,它们把数据转换成 LLM 极易读取的格式。例如访问 GitHub 仓库(比如我的 NanoGPT 仓库)时——

I can't feed this to an LLM and ask questions about it because it's, you know, this is a human interface on GitHub. So when you just change the URL from GitHub to Git and Jest,
直接把 GitHub 网页丢给 LLM 并询问行不通,因为页面是给人看的。但如果把网址里的 github 改成 gitenj ——

then this will actually concatenate all the files into a single giant text and it will create a directory structure, etc. And this is ready to be copy-pasted into your favorite LLM and you can do stuff.
工具就会把仓库里的所有文件拼成一段大文本,并附带目录结构,马上就能复制到你喜欢的 LLM 里进行操作。

Maybe even more dramatic example of this is DeepWiki,  where it's not just the raw content of these files,  This is from Devin, but also like they have Devin basically do analysis of the GitHub repo and Devin basically builds up a whole docs pages just for your repo and you can imagine that this is even more helpful to copy paste into your LLM.
更极端的例子也许是 DeepWiki。它不仅仅提供文件的原始内容——这些都出自 Devin——而且还让 Devin 自动分析整个 GitHub 仓库,为你的仓库生成完整的文档页面。可以想见,把这些内容复制进 LLM 会更加有用。

So I love all the little tools that basically where you just change the URL and it makes something accessible to an LLM. So this is all well and great and I think there should be a lot more of it.
因此我非常喜欢那些“只改个 URL 就能让资源对 LLM 可读”的小工具。这些都很棒,我认为此类工具应该更多才对。

One more note I wanted to make is that It is absolutely possible that in the future,  LLMs will be able to, this is not even future, this is today,  they'll be able to go around and they'll be able to click stuff and so on.
我还想补充一点:未来——甚至说不是未来,而是现在——LLM 完全可能自己到处浏览并执行点击等操作。

But I still think it's very worth Basically meeting LLM's halfway and making it easier for them to access all this information because this is still fairly expensive,  I would say, to use and a lot more difficult.
但我依旧认为,我们有必要“向 LLM 走近一半”,让它们更容易获取信息,因为让模型自己去点击、爬取既昂贵又复杂。

And so I do think that lots of software,  there will be a long tail where it won't adapt because these are not like live player repositories or digital infrastructure and we will need these tools.
我相信会有大量软件处于长尾地带,迟迟不会适配 LLM——它们并非“活跃”仓库或核心基础设施——这就需要上述工具来桥接。

But I think for everyone else, I think it's very worth meeting in some middle point. So I'm bullish on both, if that makes sense. So in summary, what an amazing time to get into the industry. We need to rewrite a ton of code.
至于其他场景,我认为把握好中间点最有价值。所以二者我都看好。总之,现在入行真是绝佳时机——我们得重写大量代码。

A ton of code will be written by professionals and by coders. These LLMs are kind of like utilities, kind of like fabs, but they're kind of especially like operating systems. But it's so early. It's like 1960s of operating systems.
海量代码将由专业人士和开发者撰写。LLM 有点像公用事业、有点像晶圆厂,但更像操作系统——而当下仍处于“操作系统的 1960 年代”早期阶段。

And I think a lot of the analogies cross over. And these LLMs are kind of like these fallible people spirits that we have to learn to work with. And in order to do that properly, we need to adjust our infrastructure towards it.
许多类比是相通的。LLM 像是不完美的“人形灵体”,我们必须学会与之协作;要做到这一点,就得让基础设施随之调整。

So when you're building these LLM apps,  I describe some of the ways of working effectively with these LLMs and some of the tools that make that kind of possible and how you can spin this loop very,  very quickly and basically create partial autonomy products And then, yeah,  a lot of code has to also be written for the agents more directly. But in any case, going back to the Iron Man suit analogy,
因此,在构建 LLM 应用时,我分享了若干高效协作方式和工具,说明如何快速迭代“生成-验证”环路,打造部分自治产品。当然,还需要大量专为代理编写的代码。回到“钢铁侠战衣”的类比——

I think what we'll see over the next decade,  roughly, is we're going to take the slider from left to right. And I'm very interesting. It's going to be very interesting to see what that looks like.
我认为未来十年里,我们会逐步把那根“自治滑杆”从左拨到右。我对即将呈现的景象充满兴趣——一定非常精彩。

And I can't wait to build it with all of you. Thank you.
我迫不及待想与各位一起打造这一切。谢谢大家。


    热门主题

      • Recent Articles

      • 2003-02-21 Warren Buffett's Letters to Berkshire Shareholders

        Refer To:《2003-02-21 Warren Buffett's Letters to Berkshire Shareholders》。 To the Shareholders of Berkshire Hathaway Inc.: Our gain in net worth during 2002 was $6.1 billion, which increased the per-share book value of both our Class A and Class B ...
      • 2004-02-27 Warren Buffett's Letters to Berkshire Shareholders

        Refer To:《2004-02-27 Warren Buffett's Letters to Berkshire Shareholders》。 To the Shareholders of Berkshire Hathaway Inc.: Our gain in net worth during 2003 was $13.6 billion, which increased the per-share book value of both our Class A and Class B ...
      • 2005-02-28 Warren Buffett's Letters to Berkshire Shareholders

        Refer To:《2005-02-28 Warren Buffett's Letters to Berkshire Shareholders》。 To the Shareholders of Berkshire Hathaway Inc.: Our gain in net worth during 2004 was $8.3 billion, which increased the per-share book value of both our Class A and Class B ...
      • 2006-02-28 Warren Buffett's Letters to Berkshire Shareholders

        Refer To:《2006-02-28 Warren Buffett's Letters to Berkshire Shareholders》。 To the Shareholders of Berkshire Hathaway Inc.: Our gain in net worth during 2005 was $5.6 billion, which increased the per-share book value of both our Class A and Class B ...
      • 2007-02-28 Warren Buffett's Letters to Berkshire Shareholders

        Refer To:《2007-02-28 Warren Buffett's Letters to Berkshire Shareholders》。 To the Shareholders of Berkshire Hathaway Inc.: Our gain in net worth during 2006 was $16.9 billion, which increased the per-share book value of both our Class A and Class B ...