2024-01-21 Andrej Karpathy. Self-driving as a case study for AGI

Refer to: "Self-driving as a case study for AGI".

Sparked by progress in Large Language Models (LLMs), there’s a lot of chatter recently about AGI, its timelines, and what it might look like. Some of it is hopeful and optimistic, but a lot of it is fearful and doomy, to put it mildly. Unfortunately, a lot of it is also very abstract, which causes people to talk past each other in circles. Therefore, I’m always on the lookout for concrete analogies and historical precedents that can help explore the topic in more grounded terms. In particular, when I am asked about what I think AGI will look like, I personally like to point to self-driving. In this post, I’d like to explain why. Let’s start with one common definition of AGI:

AGI: An autonomous system that surpasses human capabilities in the majority of economically valuable work.

Note that there are two specific requirements in this definition. First, it is a system that has full autonomy, i.e. it operates on its own with very little to no human supervision. Second, it operates autonomously across the majority of economically valuable work. To make this part concrete, I personally like to refer to the U.S. Bureau of Labor Statistics index of occupations. A system that has both of these properties we would call an AGI.

Economic value is itself a fuzzy notion, for instance when the people being served do not themselves have economic value.

What I would like to suggest in this post is that recent developments in our ability to automate driving are a very good early case study of the societal dynamics of increasing automation, and by extension of what AGI in general will look and feel like. I think this is because of a few features of this space that loosely just say that “it is a big deal”: Self-driving is very accessible and visible to society (cars with no drivers on the streets!), it is a large part of the economy by size, it presently employs a large human workforce (e.g. think Uber/Lyft drivers), and driving is a sufficiently difficult problem to automate, but automate it we did (ahead of many other sectors of the economy), and society has noticed and is responding to it. There are of course other industries that have also been dramatically automated, but either I am personally less familiar with them, or they fall short of some of the properties above.

partial automation

As a “sufficiently difficult” problem in AI, automation of driving did not pop into existence out of nowhere; it is the result of a gradual process of automating the driving task, with a lot of “tool AI” intermediates. In vehicle autonomy, many cars are now manufactured with a “Level 2” driver assist - an AI that collaborates with a human to get from point A to point B. It is not fully autonomous but it handles a lot of the low-level details of driving. Sometimes it automates entire maneuvers, e.g. the car might park for you. The human primarily acts as the supervisor of this activity, but can in principle take over at any time and perform the driving task, or issue a high-level command (e.g. request a lane change). In some cases (e.g. lane following and quick decision making), the AI outperforms human capability, but it can still fall short of it in rare scenarios. This is analogous to a lot of tool AIs that we are starting to see deployed in other industries, especially with the recent capability unlock due to Large Language Models (LLMs). For example, as a programmer, when I use GitHub Copilot to auto-complete a block of code, or GPT-4 to write a bigger function, I am handing off low-level details to the automation, but in the exact same way, I can also step in with an “intervention” should the need arise. That is, Copilot and GPT-4 are Level 2 programming. There are many Level 2 automations across the industry, not all of them necessarily based on LLMs - from TurboTax, to robots in Amazon warehouses, to many other “tool AIs” in translation, writing, art, legal, marketing, etc.

full automation

At some point, these systems cross the threshold of reliability and become what looks like Waymo today. They creep into the realm of full autonomy. In San Francisco today, you can open up an app and call a Waymo instead of an Uber. A driverless car will pull up and take you, a paying customer, to your destination. This is amazing. You need not know how to drive, you need not pay attention, you can lean back and take a nap, while the system transports you from A to B. Like many others I’ve talked to, I personally prefer to take a Waymo over an Uber and I’ve switched to it almost exclusively for within-city transportation. You get a much more low-variance, reproducible experience, the driving is smooth, you can play music, and you can chat with friends without spending mental resources on what the driver is thinking while listening to you.

the mixed economy of full automation

And yet, even though autonomous driving technology now exists, there are still plenty of people calling an Uber alongside. How come? Well first, many people simply don’t even know that you can call a Waymo. But even if they do, many people don’t fully trust the automated system just yet and prefer to have a human drive them. But even if they did, many people might just prefer a human driver, and e.g. enjoy the talk and banter and getting to know other people. Beyond preferences alone, judging by the increasing wait times in the app today, Waymo is supply constrained. There are not enough cars to meet the demand. A part of this may be that Waymo is being very careful to manage and monitor risk and public opinion. Another part is that Waymo, I believe (?), has a quota from regulators on how many cars it is allowed to have deployed on the streets. Another rate-limiter is that Waymos can’t just replace all the Ubers right away at the snap of a finger. They have to build out the infrastructure, build the cars, scale their operations. I posit that all kinds of automations in other sectors of the economy will look identical - some people/companies will use them immediately, but a lot of people 1) won’t know about them, 2) if they do, won’t trust them, 3) if they did, would still prefer to employ and work with a human. But on top of that, demand is greater than supply and AGI would be constrained in exactly all of these ways, for exactly all of the same reasons - some amount of self-restraint from the developers, some amount of regulation, and some amount of simple, straight-up resource shortage, e.g. needing to build out more GPU datacenters.
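
To make the adoption funnel above a bit more concrete, here is a toy calculation. It is only a sketch under loud assumptions: the function name, its parameters, and every number are hypothetical and do not come from the post or from any Waymo data. It treats awareness, trust, and preference as independent fractions of total demand and supply as a hard cap.

```python
def automated_rides(total_demand, p_aware, p_trust, p_prefer, supply_cap):
    """Toy model (hypothetical): rides served by the automated option.

    total_demand: rides requested per day across all providers
    p_aware:      fraction of riders who know the automated option exists
    p_trust:      fraction of aware riders who trust it
    p_prefer:     fraction of trusting riders who prefer it over a human driver
    supply_cap:   maximum rides the automated fleet can serve per day
    """
    # Demand shrinks at each stage of the funnel: know -> trust -> prefer.
    willing = total_demand * p_aware * p_trust * p_prefer
    # Whatever demand remains is still limited by available supply.
    return min(willing, supply_cap)


if __name__ == "__main__":
    # Made-up numbers: half know, half of those trust, half of those prefer it.
    print(automated_rides(1000, 0.5, 0.5, 0.5, supply_cap=1000))
    # Even with full awareness, trust, and preference, supply can be the limit.
    print(automated_rides(1000, 1.0, 1.0, 1.0, supply_cap=200))
```

In this toy picture either the funnel or the supply cap can be the binding constraint, which is the point of the paragraph: removing one limiter only exposes the next.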

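This leaves out people's own needs: people still grow vegetables on their balconies, almost every home still has a kitchen, and so on.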

the globalization of full automation

As I already hinted at with resource constraints, the full globalization of this technology is still very expensive, work-intensive, and rate-limited. Today, Waymo can only drive in San Francisco and Phoenix, but the approach itself is fairly general and scalable, so the company might e.g. soon expand to LA, Austin, etc. The product may also still be constrained by other environmental factors, e.g. driving in heavy snow. And in some rare cases, it might even need rescue from a human operator. The expansion of capability does not come “for free”. For example, Waymo has to expend resources to enter a new city. They have to establish a presence, map the streets, and adjust the perception and planner/controller to some unique situations, or to local rules or regulations specific to that area. In our working analogy, many jobs may have full autonomy only in some settings or conditions, and expanding the coverage will require work and effort. In both cases, the approach itself is general and scalable and the frontier will expand, but it can only do so over time.

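What is missing is the economic incentive that large-scale rollout requires; without it, progress is hard to sustain, and the automotive business is a capital-intensive one.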

society reacts

Another aspect that I find fascinating about the ongoing introduction of self-driving to society is that just a few years ago, there was a ton of commentary and FUD everywhere about oh “will it”, “won’t it” work, is it even possible or not, and it was a whole thing. And now self-driving is actually here. Not as a research prototype but as a product – I can exchange money for fully automated transportation. In its present operating range, the industry has reached full autonomy. And yet, overall it’s almost like no one cares. Most people I talk to (even in tech!) don’t even know that this happened. When your Waymo is driving through the streets of SF, you’ll see many people look at it as an oddity. First they are surprised and stare. Then they seem to move on with their lives. When full autonomy gets introduced in other industries, maybe the world doesn’t just blow up in a storm. The majority of people may not even realize it at first. When they do, they might stare and then shrug, in a way that ranges anywhere from denial to acceptance. Some people get really upset about it, and do the equivalent of putting cones on Waymos in protest, whatever the equivalent of that may be. Of course, we’ve come nowhere close to seeing this aspect fully play out just yet, but when it does I expect it to be broadly predictive.

Wait until it has truly scaled before judging; the spread of e-commerce is an example, and the public ended up accepting it well.

economic impact

Let’s turn to jobs. Certainly, and visibly, Waymo has deleted the driver of the car. But it has also created a lot of other jobs that were not there before and are a lot less visible – the human labeler helping to collect training data for neural networks, the support agent who remotely connects to vehicles that run into trouble, the people building and maintaining the car fleet, the maps, etc. An entire new industry of various sensors and related infrastructure is created to assemble these highly-instrumented, high-tech cars in the first place. In the same way with work more generally, many jobs will change, some jobs will disappear, but many new jobs will appear, too. It is a lot more a refactoring of work than a direct deletion, even if that deletion is the most prominent part. It’s hard to argue that the overall numbers won’t trend down at some point and over time, but this happens significantly slower than a person naively looking at the situation might think.

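There is a cognitive bias at work here: one sees only the side one wants to see. Technological progress is like image resolution: the higher the resolution, the more detail there is and the more there is to do. That smart people have more work than fools is a similar analogy.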

competitive landscape

The final aspect I’d like to consider is the competitive landscape. A few years ago there were many, many self-driving car companies. Today, in recognition of the difficulty of this problem (which I think is only just barely possible to automate given the current state of the art in AI and computing more generally), the ecosystem has significantly consolidated and Waymo has reached the first feature-complete demonstration of the self-driving future. However, a number of companies are in pursuit, including e.g. Cruise, Zoox, and of course, my personal favorite :), Tesla. A brief note here given my specific history and involvement with this space. As I see it, the ultimate goal of the self-driving industry is to achieve full autonomy globally. Waymo has taken the strategy of first going for autonomy and then scaling globally, while Tesla has taken the strategy of first going globally and then scaling autonomy. Today, I am a happy customer of the products of both companies and, personally, I cheer for the technology overall first. However, one company has a lot of primarily software work remaining while the other has a lot of primarily hardware work remaining. I have my bets on which one goes faster. All that said, in the same way, many other sectors of the economy may go through a time of rapid growth and expansion (think ~2015 era of self-driving), but if the analogy holds, only to later consolidate into a few companies battling it out. And in the midst of it all, there will be a lot of actively used Tool AIs (think: today’s Level 2 ADAS features), and even some open platforms (think: Comma).

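Has anyone asked whether people actually need this? Nobody has asked whether people would want a robot to take out their penis for them when they urinate; that happens more often than a daily drive, so shouldn't it be automated first?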

AGI

So these are the broad strokes of what I think AGI will look like. Now just copy-paste this across the economy in your mind, happening at different rates, and with all kinds of difficult-to-predict interactions and second-order effects. I don’t expect it to hold perfectly, but I expect it to be a useful model to have in mind and to draw on. On a kind of memetic spectrum, it looks a lot less like a recursively self-improving superintelligence that escapes our control into cyberspace to manufacture deadly pathogens or nanobots that turn the galaxy into gray goo. And it looks a lot more like self-driving, the part of our economy that is currently speed-running the development of a major, society-altering automation. It has a gradual progression, it has society as an observer and a participant, and its expansion is rate-limited in a large variety of ways, including regulation and the resources of an educated human workforce, information, material, and energy. The world doesn’t explode; it adapts, changes and refactors. In self-driving specifically, the automation of transportation will make it a lot safer, cities will become a lot less smoggy and congested, and parking lots and parked cars will disappear from the sides of our roads to make more space for people. I personally very much look forward to what all the equivalents of that might be with AGI.
