How Llama 4 compares to other models
Llama 4 相较于其他模型的表现如何
Dwarkesh Patel
Mark, thanks for coming on the podcast again.
马克,感谢你再次参加播客节目。
Mark Zuckerberg
Yeah, happy to do it. Good to see you.
很高兴来参加节目,见到你很开心。
Dwarkesh Patel
You too. Last time you were here, you had launched Llama 3. Now you've launched Llama 4.
我也是。你上次来时刚发布了 Llama 3。现在你已经发布了 Llama 4。
Mark Zuckerberg
Well, the first version.
嗯,目前是 Llama 4 的第一版。
Dwarkesh Patel
That's right. What's new? What's exciting? What's changed?
没错。那么有哪些新变化?哪些让人兴奋?发生了哪些不同?
Mark Zuckerberg
The whole field is so dynamic. I feel like a ton has changed since the last time we talked. Meta AI has almost a billion people using it monthly now, which is pretty wild. I think this is going to be a really big year for all of this, especially once you get the personalization loop going, which we're just starting to build in now really, both from the context that all the algorithms have about what you're interested in — feed, your profile information, your social graph information — and from what you're interacting with the AI about. That's going to be the next thing that's super exciting. I'm really big on that.
这个领域变化太快了。我觉得从我们上次谈话到现在,已经发生了很多事情。Meta AI 现在每月有接近 10 亿人使用,这太疯狂了。我认为今年对整个行业来说都将是关键的一年,尤其是一旦个性化闭环建立起来——这正是我们现在开始投入建设的东西,包括算法对于你的兴趣、信息流、个人资料和社交图谱的理解,还有你跟 AI 互动的内容。这将是接下来最令人兴奋的事情。我对这点非常看好。

Same as what Pony Ma has said.
The modeling stuff continues to make really impressive advances too. I'm pretty happy with the first set of Llama 4 releases. We announced four models and released the first two — the Scout and Maverick ones — which are mid-size to small models.
在模型方面,我们也持续取得了令人印象深刻的进展。我对 Llama 4 的第一批发布结果感到非常满意。我们一共发布了四个模型,并率先推出了两个——Scout 和 Maverick,它们是中小型模型。
The most popular Llama 3 model was the 8 billion parameter one. So we’ve got one of those coming in the Llama 4 series too. Our internal code name for it is “Little Llama.” That’s coming probably over the next few months.
Llama 3 中最受欢迎的模型是参数量为 80 亿的那个。所以在 Llama 4 系列中,我们也将推出类似的版本,内部代号叫“小羊驼”(Little Llama)。它可能会在接下来的几个月中发布。
Scout and Maverick are good. They have some of the highest intelligence per cost you can get of any model out there. They’re natively multimodal, very efficient, run on one host. They’re designed to be very efficient and low latency, for a lot of the use cases we’re building for internally. That’s our whole thing. We build what we want, and then we open-source it so other people can use it too. I'm excited about that.
Scout 和 Maverick 表现不错,它们的单位成本智能水平在当今所有模型中名列前茅。它们原生就是多模态的,非常高效,可以在单机上运行。我们专门为内部的各种使用场景而优化设计它们,以实现高效和低延迟。这就是我们的一贯做法:我们先为自己的需求打造产品,然后开源出来供其他人使用。我对此感到非常兴奋。

Nonsense; that's not something users care about.
I'm also excited about the Behemoth model, which is coming up. It's going to be our first model that's sort of at the frontier — more than 2 trillion parameters. As the name says, it's quite big. We’re trying to figure out how to make that useful for people. It’s so big that we've had to build a bunch of infrastructure just to be able to post-train it ourselves.
我还很期待接下来的 Behemoth 模型。这将是我们第一个真正处于技术前沿的模型——参数量超过 2 万亿。顾名思义,这模型非常庞大。我们正在努力弄清楚如何让它对人们真正有用。它太大了,我们甚至不得不专门搭建一整套基础设施,才能完成它的后训练。
Now we're trying to wrap our heads around, how does the average developer out there actually use something like this? How do we make it useful — maybe by distilling it into models that are a reasonable size to run? Because you're obviously not going to want to run something like that in a consumer model.
现在我们要思考的是,普通开发者该如何使用这么大的模型?我们怎么让它变得实用?或许我们可以通过蒸馏,把它转化为更适合部署的小模型。因为显然,没有人会想在终端设备上运行如此庞大的模型。
As you saw with the Llama 3 stuff last year, the initial launch was exciting and then we just built on that over the year. 3.1 released the 405 billion model, 3.2 is when we got all the multimodal stuff in. We basically have a roadmap like that for this year too. So a lot going on.
正如你去年看到 Llama 3 的节奏一样,最初发布令人兴奋,然后我们在整年里持续迭代。Llama 3.1 推出了 4050 亿参数的模型,3.2 实现了全套多模态能力。今年我们也有类似的路线图。所以,今年也将非常忙碌、值得期待。
Dwarkesh Patel
I'm interested to hear more about it. There's this impression that the gap between the best closed-source and the best open-source models has increased over the last year. I know the full family of Llama 4 models isn't out yet, but Llama 4 Maverick is at #35 on Chatbot Arena. On a bunch of major benchmarks, it seems like o4-mini or Gemini 2.5 Flash are beating Maverick, which is in the same class. What do you make of that impression?
我很想听听你的看法。过去一年里,似乎出现了一种印象:最好的闭源模型与最好的开源模型之间的差距正在扩大。我知道Llama 4系列还没有全部发布,但目前Llama 4 Maverick在Chatbot Arena上排第35名。从许多主要的基准测试来看,像o4-mini或者Gemini 2.5 Flash似乎都比Maverick表现更好,而它们应该是同一档次的模型。你怎么看这种现象?
Mark Zuckerberg
There are a few things. First, I actually think this has been a very good year for open source overall. If you go back to where we were last year, Llama was the only real, super-innovative open-source model. Now you have a bunch of them in the field.
我认为有几个方面值得讨论。首先,就整体而言,今年对开源来说是非常好的一年。如果回顾去年,当时Llama几乎是唯一真正具有高度创新性的开源模型。而现在,已经有不少优秀的开源模型问世了。
In general, the prediction that this would be the year open source generally overtakes closed source as the most used models out there, I think that's generally on track to be true.
总体来看,之前有预测说今年开源模型将在使用上全面超越闭源模型,我认为这个趋势大致是正确的。

Illogical nonsense.
One interesting surprise — positive in some ways, negative in others, but overall good — is that it’s not just Llama. There are a lot of good ones out there. I think that's quite good.
一个有趣的现象——某些方面是积极的,某些方面也许不那么积极,但整体来看是好事——就是现在不仅仅有Llama,还有很多其他优秀的开源模型。我认为这是非常好的发展。
Then there's the reasoning phenomenon, which you're alluding to talking about o3, o4, and other models. There's a specialization happening. If you want a model that’s the best at math problems, coding, or different things like those tasks, then reasoning models that consume more test-time or inference-time compute in order to provide more intelligence are a really compelling paradigm. And we're building a Llama 4 reasoning model too. It'll come out at some point.
你提到o3、o4等模型时,涉及到了一个推理能力的现象。这其实显示出一种专业化趋势。如果你希望模型在数学问题、编程或者类似任务上表现最佳,那些在推理时消耗更多计算资源以换取更高智能的模型,是一种非常有吸引力的路径。我们也正在打造Llama 4的推理版本,未来会推出。
But for a lot of the applications we care about, latency and good intelligence per cost are much more important product attributes. If you're primarily designing for a consumer product, people don't want to wait half a minute to get an answer. If you can give them a generally good answer in half a second, that's a great tradeoff.
但对于我们关注的许多应用来说,延迟和单位成本下的智能水平,是更重要的产品属性。如果你主要是为消费类产品设计,用户不会想等半分钟才收到回复。如果你能在半秒钟内给出一个普遍不错的答案,那是一个非常划算的权衡。
I think both of these are going to end up being important directions. I’m optimistic about integrating reasoning models with the core language models over time. That's the direction Google has gone in with some of the more recent Gemini models. I think that's really promising. But I think there’s just going to be a bunch of different stuff that goes on.
我认为这两个方向最终都会非常重要。我对将推理模型与核心语言模型逐步融合持乐观态度。这也是Google最近一些Gemini模型所采取的方向。我觉得这个路径非常有前景。但我认为未来会有很多不同的发展路线同时展开。

In both ideas and technology, they are no match for OpenAI and Google.
You also mentioned the whole Chatbot Arena thing, which I think is interesting and points to the challenge around how you do benchmarking. How do you know what models are good for which things?
你也提到了Chatbot Arena,我觉得这非常有趣,也暴露出做基准评估时面临的挑战。你如何知道哪种模型在哪方面表现最好?
One of the things we've generally tried to do over the last year is anchor more of our models in our Meta AI product north star use cases. The issue with open source benchmarks, and any given thing like the LM Arena stuff, is that they're often skewed toward a very specific set of use cases, which are often not actually what any normal person does in your product. The portfolio of things they're trying to measure is often different from what people care about in any given product.
我们过去一年的努力方向之一,是将我们模型的目标更多地锚定在Meta AI产品中的“北极星”使用场景上。开源基准测试,像LM Arena之类,往往倾向于某些非常特定的用例,但那些往往并不是用户在我们产品中真正会做的事情。它们所试图衡量的能力组合,往往和人们在实际产品中关心的并不一致。
Because of that, we’ve found that trying to optimize too much for that kind of stuff has led us astray. It’s actually not led towards the highest quality product, the most usage, and best feedback within Meta AI as people use our stuff.
因此我们发现,如果过于追求这类指标,会让我们偏离方向。它实际上不会带来最高质量的产品,也不会带来最多的使用或最好的用户反馈。
So we're trying to anchor our north star on the product value that people report to us, what they say that they want, and what their revealed preferences are, and using the experiences that we have. Sometimes these benchmarks just don't quite line up. I think a lot of them are quite easily gameable.
所以我们现在尝试以用户反馈给我们的产品价值作为北极星,关注他们说自己想要什么、实际表现出的偏好,并结合我们已有的使用数据。有时候这些基准测试结果并不能很好匹配实际情况。我认为其中很多都很容易被“刷分”。
On the Arena you'll see stuff like Sonnet 3.7, which is a great model, and it's not near the top. It was relatively easy for our team to tune a version of Llama 4 Maverick that could be way at the top. But the version we released, the pure model, actually has no tuning for that at all, so it's further down. So you just need to be careful with some of these benchmarks. We're going to index primarily on the products.
你在Arena上能看到像Sonnet 3.7这样其实很好的模型,却并没有排在前列。我们团队其实很容易就能调出一个版本的Llama 4 Maverick,让它登上榜首。但我们发布的是一个完全未针对Arena做调优的“纯”模型,所以排名靠后。因此,对于这些基准测试结果,你得保持警惕。我们会主要以产品表现为评价指标。
Dwarkesh Patel
Do you feel like there is some benchmark which captures what you see as a north star of value to the user, which can be objectively measured between different models, and where you'd say, "I need Llama 4 to come out on top on this"?
你是否认为存在某种基准测试,能够衡量你所说的“用户价值北极星”?这种基准可以在不同模型之间进行客观比较,而你会说:“我需要Llama 4在这个基准上表现最好”?
Mark Zuckerberg
Our benchmark is basically user value in Meta AI.
我们的基准基本上就是Meta AI中体现出来的用户价值。
Dwarkesh Patel
But you can't compare that to other models.
但你无法将这个标准与其他模型进行比较。
Mark Zuckerberg
We might be able to, because we might be able to run other models and be able to tell. That's one of the advantages of open source. You have a good community of folks who can poke holes in your stuff and point out, "Okay, where is your model not good, and where is it good?"
我们或许能做到,因为我们有可能运行其他模型并进行比较。这是开源的一个优势。你会有一个优秀的社区,他们能帮你找出模型的漏洞,指出:“看,这里你的模型做得不好,那边又挺好。”
The reality at this point is that all these models are optimized for slightly different mixes of things. Everyone is trying to go towards the same end in that all the leading labs are trying to create general intelligence, superintelligence, whatever you call it. AI that can lead toward a world of abundance where everyone has these superhuman tools to create whatever they want. That leads to dramatically empowering people and creating all these economic benefits.
现实是,如今这些模型都在为略有不同的目标组合进行优化。尽管最终大家都朝着同一个方向努力,即各大顶尖实验室都在尝试打造通用智能、超级智能,或你怎么称呼它都行——一种能通往丰裕世界的AI,让每个人都拥有超人般的工具去创造他们想要的一切。这将极大地赋能用户,并带来巨大的经济效益。
However you define it, that's what a lot of the labs are going for. But there's no doubt that different folks have optimized toward different things. I think the Anthropic folks have really focused on coding and agents around that. The OpenAI folks, I think, have gone a little more toward reasoning recently.
不管你怎么定义,这确实是许多实验室正在追求的目标。但毫无疑问,不同团队在优化方向上还是各有侧重。我认为Anthropic的团队非常注重代码生成和智能体相关的能力;而OpenAI则近来更偏向推理方面的提升。
There’s a space which, if I had to guess, I think will end up being the most used one: quick, very natural to interact with, natively multimodal, fitting throughout your day in the ways you want to interact with it.
有一个方向——如果让我猜——我认为最终会成为使用最广泛的:快速、非常自然的交互、原生多模态,并能在你日常生活中以各种方式无缝融入互动。

Housewives who need prompt responses, and the demands that regular nightclub patrons place on hostesses.
I think you got a chance to play around with the new Meta AI app that we're releasing. One of the fun things we put in there is the demo for the full-duplex voice. It's early. There’s a reason why we haven't made that the default voice model in the app yet. But there's something about how naturally conversational it is that's really fun and compelling.
我猜你可能已经体验过我们即将推出的新Meta AI应用。我们在里面放了一个非常有趣的功能演示,就是全双工语音。这还处在早期阶段,也因此我们还没把它作为默认语音模型放进应用。但它那种极其自然的对话感,真的很有趣、很有吸引力。
Being able to mix that in with the right personalization is going to lead toward a product experience where… If you fast-forward a few years, I think we're just going to be talking to AI throughout the day about different things we're wondering about.
如果能将它与恰当的个性化结合起来,就会带来一种未来产品体验……快进几年,我认为我们会整天都在跟AI交谈,讨论各种我们想知道的事。
You'll have your phone. You'll talk to it while browsing your feed apps. It'll give you context about different stuff. It'll answer your questions. It'll help you as you're interacting with people in messaging apps. Eventually, I think we'll walk through our daily lives and have glasses or other kinds of AI devices and just seamlessly interact with it all day long.
你会拿着手机,一边浏览信息流,一边与AI对话。它会为你提供各种事物的背景信息,回答你的问题,在你使用聊天应用与他人交流时提供帮助。最终,我认为我们日常生活中会戴着眼镜或其他AI设备,整天与AI无缝互动。
That’s the north star. Whatever the benchmarks are that lead toward people feeling like the quality is where they want to interact with it, that's what will ultimately matter the most to us.
这就是我们的北极星。无论什么样的基准,只要能让人们觉得AI的质量足以值得互动,那就是我们最看重的。

Judging by the AI race alone, Meta's chances of winning are very small.
Intelligence explosion
智能爆炸
Dwarkesh Patel
I got a chance to play around with both Orion and also the Meta AI app, and the voice mode was super smooth. It was quite impressive.
我有机会体验了Orion和Meta AI应用,语音模式流畅得惊人,真的很令人印象深刻。
On the point of what the different labs are optimizing for — to steelman their view — I think a lot of them believe that once you fully automate software engineering and AI research, then you can kick off an intelligence explosion. You would have millions of copies of these software engineers replicating the research that happened between Llama 1 and Llama 4 — that scale of improvement again — but in a matter of weeks or months rather than years. So it really matters to just close the loop on the software engineer, and then you can be the first to ASI. What do you make of that?
关于不同实验室的优化方向——如果我要以最强观点重构他们的立场——我认为许多人相信,一旦你完全实现软件工程和AI研究的自动化,就能引发智能爆炸。你会拥有数百万个AI“软件工程师”,重复从Llama 1到Llama 4那样的进步规模,但时间尺度从几年缩短为几周或几个月。所以,关键在于把“AI工程师环”闭合,然后你就可能成为第一个实现ASI(人工超级智能)的人。你怎么看这种观点?
Mark Zuckerberg
I personally think that's pretty compelling. That's why we have a big coding effort too. We're working on a number of coding agents inside Meta. Because we're not really an enterprise software company, we're primarily building it for ourselves.
我个人认为这个观点非常有说服力。这也是为什么我们在编码方向投入巨大精力。我们在Meta内部研发了多个编码智能体。由于我们并不是一家企业软件公司,我们主要是为自己开发这些工具。
Again, we go for a specific goal. We're not trying to build a general developer tool. We're trying to build a coding agent and an AI research agent that advances Llama research specifically. And it's fully plugged into our toolchain and all that.
我们依然是以明确目标为导向。我们并不是要做通用的开发者工具,而是要构建专门推动Llama研究进展的编码智能体和AI研究智能体。这些工具完全整合进我们的工具链体系中。
That's important and is going to end up being an important part of how this stuff gets done. I would guess that sometime in the next 12 to 18 months, we'll reach the point where most of the code that's going toward these efforts is written by AI. And I don't mean autocomplete.
这很重要,并且最终会成为实现这一切的关键部分。我估计在未来12到18个月里,我们将迎来一个转折点:这些AI项目中的大部分代码将由AI编写。而我说的不是自动补全。

The parties in this race have already formed a consensus; Meta doesn't yet look qualified to be among them.
Today you have good autocomplete. You start writing something and it can complete a section of code. I'm talking more like: you give it a goal, it can run tests, it can improve things, it can find issues, it writes higher quality code than the average very good person on the team already. I think that's going to be a really important part of this for sure.
如今的补全工具已经很不错了:你开始写,它能帮你补全一段代码。但我说的是更进一步的能力:你给它一个目标,它能自己运行测试、优化内容、发现问题,并且写出比团队中多数优秀工程师更高质量的代码。我认为这无疑会是整个进程中至关重要的一环。
But I don't know if that's the whole game. That's going to be a big industry, and it's going to be an important part of how AI gets developed. But I think there are still… One way to think about it is that this is a massive space. I don't think there's just going to be one company with one optimization function that serves everyone as best as possible. There are going to be a bunch of different labs doing leading work in different domains. Some will be more enterprise-focused or coding-focused. Some will be more productivity-focused. Some will be more social or entertainment-focused.
但我不认为这就是全部。这当然会成为一个巨大的产业,并成为AI发展中的关键一环。但我认为,整个空间依然非常庞大。不能指望会有哪家公司用一个优化函数就能服务所有人。未来一定会有许多实验室在不同领域各自领跑。有些专注企业或编程,有些注重生产力,还有一些会更关注社交或娱乐。
Within the assistant space, there will be some that are more informational and productivity-focused, and some that are more companion-focused. It’s going to be a lot of stuff that’s just fun and entertaining and shows up in your feed.
在助手领域内部,也会有些更偏向信息获取和生产力,有些更偏向陪伴型互动。未来还会出现很多纯粹好玩、娱乐性的内容,出现在你的信息流里。
There's just a huge amount of space. Part of what's fun about going toward this AGI future is that there are a bunch of common threads for what needs to get invented, but also a lot of things that still need to be created. I think you're going to start seeing more specialization between different groups, if I had to guess.
这个领域的空间非常广阔。迈向AGI的过程之所以令人兴奋,在于既有一系列大家共同需要攻克的难题,又有大量尚待开创的新事物。如果要我预测,我觉得我们将看到不同团队之间更明确的专业化分工。
Dwarkesh Patel
It’s really interesting to me that you basically agree with the premise that there will be an intelligence explosion and we’ll get something like superintelligence on the other end. Tell me if I'm misunderstanding you. If that’s the case, why even bother with personal assistants and whatever else? Why not just get to superhuman intelligence first and then deal with everything else later?
让我觉得很有趣的是——如果我理解没错——你基本上认同“智能爆炸”这个前提,并认为我们最终会实现类似超级智能的存在。如果真是这样,那为什么还要费力去做个人助手和这些产品?为什么不先直奔超人智能,然后再来解决其他问题?
Mark Zuckerberg
I think that's just one aspect of the flywheel. Part of what I generally disagree with on the fast-takeoff view is that it takes time to build out physical infrastructure.
我认为这只是飞轮效应的一部分。我对“快速起飞”观点的一大保留是:构建物理基础设施是需要时间的。
If you want to build a gigawatt cluster of compute, that just takes time. NVIDIA needs time to stabilize their new generation of systems. Then you need to figure out the networking around it. Then you need to build the building. You need to get permitting. You need to get the energy. Maybe that means gas turbines or green energy, either way, there’s a whole supply chain of that stuff.
如果你想建一个吉瓦级的算力集群,那就是一个耗时的过程。英伟达需要时间来稳定新一代系统。你还得解决网络架构问题,建造机房,申请许可,获取能源。能源可能来自燃气轮机,也可能是绿色能源,无论哪种方式,这背后都有一整条复杂的供应链。
We talked about this a bunch the last time I was on the podcast with you. I think some of these are just physical-world, human-time things. As you start getting more intelligence in one part of the stack, you’re just going to run into a different set of bottlenecks. That’s how engineering always works: solve one bottleneck, you get another bottleneck.
我们上次播客里就聊过这些。我认为有些问题就是现实世界、人类时间尺度上的东西。当你在技术栈的一部分实现更高智能后,你会立刻遭遇新的瓶颈。这就是工程的规律:解决一个瓶颈,下一个瓶颈就冒出来。
Another bottleneck in the system or ingredient that’s going to make this work well, is people getting used to learning and having a feedback loop with using the system. These systems don’t just show up fully formed with people magically knowing how to use them. There's a co-evolution that happens where people are learning how to best use these AI assistants. At the same time, the AI assistants are learning what people care about. Developers are making the AI assistants better.
另一个决定系统是否真正奏效的瓶颈,是人们如何学习、适应,并与系统建立反馈回路。这些系统不会“完形而现”,然后人们就奇迹般地知道怎么用它们。它需要一场共同进化:人类在学习如何更好地使用AI助手,AI助手在学习人们在乎什么,开发者则持续优化AI助手。
You're building up a base of context too. You wake up a year or two into it and the assistant can reference things you talked about two years ago and that’s pretty cool. You couldn’t do that even if you launched the perfect thing on day one. There’s no way it could reference what you talked about two years ago if it didn’t exist two years ago.
同时你也在积累上下文信息。一两年后你会发现,助手能引用你两年前聊过的事情,那就很酷。即使你第一天就发布了“完美产品”,它也不可能回顾两年前的内容——因为它当时根本还不存在。
So I guess my view is that there's this huge intelligence growth. There’s a very rapid curve on the uptake of people interacting with the AI assistants, and the learning feedback and data flywheel around that. And then there is also the buildout of the supply chains and infrastructure and regulatory frameworks to enable the scaling of a lot of the physical infrastructure. At some level, all of those are going to be necessary, not just the coding piece.
所以我的观点是:智能本身会快速增长,AI助手的使用曲线会迅速上升,围绕它的学习反馈和数据飞轮也会飞快运转。但同时,物理基础设施、供应链、监管框架也都需要逐步搭建和扩展。某种程度上,这些方面都同样必不可少,而不仅仅是“让AI会写代码”那么简单。
One specific example of this that I think is interesting. Even if you go back a few years ago, we had a project, I think it was on our ads team, to automate ranking experiments. That's a pretty constrained environment. It's not open-ended code. It’s basically, look at the whole history of the company — every experiment that any engineer has ever done in the ad system — and look at what worked, what didn't, and what the results of those were. Then basically formulate new hypotheses for different tests that we should run that could improve the performance of the ad system.
我可以举个我觉得挺有意思的例子。哪怕追溯到几年前,我们广告团队就有一个项目,目标是自动化排序实验。那个场景其实很受限,它不是开放式编程任务。基本上是:回顾公司整个广告系统的实验历史,看看哪些有效、哪些无效,各自效果如何。然后据此生成新的测试假设,用于进一步提升广告系统性能。
What we basically found was that we were bottlenecked on compute to run tests, based on the number of hypotheses. It turns out, even with just the humans we have right now on the ads team, we already have more good ideas to test than we actually have either compute or, really, cohorts of people to test them with.
我们发现的结果是:我们在测试中遇到的瓶颈,其实是计算资源。根据生成的假设数量来看,即使只有广告团队现有的人工资源,我们提出的好点子已经多到超出了可用的计算资源,甚至测试用户群的容量。
Even if you have three and a half billion people using your products, you still want each test to be statistically significant. It needs to have hundreds of thousands or millions of people. There's only so much throughput you can get on testing through that. So we're already at the point, even with just the people we have, that we can't really test everything that we want.
哪怕你的产品有35亿用户,你仍然希望每一个测试都具备统计意义。你需要数十万甚至上百万的样本。能做A/B测试的带宽是有限的。所以我们早已处于“即使仅凭现有团队,我们也没法测试所有想法”的状态。
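To make this testing-throughput constraint concrete, here is a minimal back-of-the-envelope sketch using a standard two-proportion power calculation. The numbers (a 1% baseline rate, a 2% relative lift to detect, a population of three and a half billion) are illustrative assumptions, not Meta's actual experiment parameters.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_sample_per_arm(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate sample size per arm for a two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1, p2 = base_rate, base_rate * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# Illustrative assumptions: 1% baseline conversion, detect a 2% relative lift.
n_per_arm = required_sample_per_arm(base_rate=0.01, relative_lift=0.02)
users_per_test = 2 * n_per_arm    # control arm + treatment arm
eligible_users = 3_500_000_000    # ballpark population from the conversation

print(f"Users needed per test: {users_per_test:,}")
print(f"Tests that fit in the population at once: {eligible_users // users_per_test:,}")
```

With these assumptions each test needs a few million users per arm, so even a multi-billion-person population only supports on the order of hundreds of simultaneous, statistically sound experiments, which is the bottleneck being described here.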
Now just being able to test more things is not necessarily going to be additive to that. We need to get to the point where the average quality of the hypotheses that the AI is generating is better than all the things above the line that we’re actually able to test that the best humans on the team have been able to do, before it will even be marginally useful for it.
而且,光是能测试更多内容本身未必有增益。我们真正需要的是:AI所提出的假设,其平均质量要超过目前我们人工团队中最好的人所能提出、并已投入测试的假设,只有到那个程度,AI生成的内容才会真正有边际价值。

He dodged the question, but the point about insight is well made.
We'll get there I think pretty quickly. But it's not just, “Okay, cool, the thing can write code, and now all of a sudden everything is just improving massively.” There are real-world constraints that need to be overcome.
我觉得我们会很快达到那个点。但这不是说,“哇,AI会写代码了,一切瞬间就飞跃发展”。现实世界中还有许多限制必须被克服。
Then you need to have the compute and the people to test. Then over time, as the quality creeps up, are we here in five or 10 years where no set of people can generate a hypothesis as good as the AI system? I don't know, maybe. In that world, obviously that's going to be how all the value is created. But that's not the first step.
你还得有算力资源和测试人群。然后随着AI输出质量逐步提高,也许五年或十年后,我们真的会进入一个阶段:没有任何人类团队能提出比AI更好的假设。在那个世界里,当然,价值就会主要由AI创造。但那不是第一步。
Dwarkesh Patel
So if you buy this view, that this is where intelligence is headed, the reason to be bullish on Meta is obviously that you have all this distribution. You can also use that to learn more things that can be useful for training. You mentioned the Meta AI app now has a billion active users.
如果你认同智能发展的这个方向,那么看好Meta的理由显然在于你拥有庞大的分发渠道。你也可以借此获取更多可用于训练的数据。你刚提到Meta AI应用现在有10亿活跃用户。
Mark Zuckerberg
Not the app. The app is a standalone thing that we're just launching now. It’ll be fun for people who want to use it. It's a cool experience. We can talk about that too because we’re experimenting with some new ideas in there that I think are novel and worth talking through.
不是那个应用本身。我们刚刚推出的Meta AI App是一个独立的应用,供愿意尝试的用户使用,它是个挺酷的体验。我们也可以聊聊它,因为我们在里面尝试了一些新想法,我觉得很新颖、值得探讨。
But I’m mostly talking about our apps. Meta AI is actually most used in WhatsApp. WhatsApp is mostly used outside of the U.S. We just passed like a hundred million people in the US, but it's not the primary messaging system in the US, iMessage is. So people in the U.S. probably tend to underestimate Meta AI usage somewhat. But part of the reason the standalone app is going to be so important is because the US, for a lot of reasons, is one of the most important countries. And the fact that WhatsApp is the main way people are using Meta AI and that's not the main messaging system in the US means we need another way to build a first-class experience that's really in front of people.
我说的是我们整个应用生态。Meta AI其实在WhatsApp里使用最广。WhatsApp主要用户在美国以外,我们在美国刚过一亿用户,但美国的主流通讯工具是iMessage,所以美国用户可能会低估Meta AI的使用量。独立App之所以重要,是因为美国在很多方面都是最关键的国家之一。而现在Meta AI主要通过WhatsApp使用,而WhatsApp并不是美国的主力通信平台,这就意味着我们需要另一种方式来打造真正出现在用户面前的一流体验。
Dwarkesh Patel
And I guess, to finish the question, the bearish case would be that if the future of AI is less about just answering your questions and more about being a virtual coworker, then it's not clear how Meta AI inside of WhatsApp gives you the relevant training data to make a fully autonomous programmer or remote worker. In that case, does it not matter that much who has more distribution right now with LLMs?
那我补完这个问题:反方观点是,如果AI的未来不是回答问题,而是成为虚拟协作伙伴,那么在WhatsApp中嵌入Meta AI,可能无法提供训练完全自主编程或远程工作AI所需的相关数据。在这种情况下,谁现在拥有更大LLM分发量似乎就没那么重要了,对吗?
Mark Zuckerberg
Again, I just think there are going to be different things. Imagine you were sitting at the beginning of the development of the internet and you asked, "What's going to be the main internet thing? Is it going to be knowledge work or massive consumer apps?"
我还是认为,会有很多不同的应用方向。你可以想象自己处在互联网发展初期,你会问:“互联网的主要用途是知识工作,还是大规模消费类应用?”
You got both. You don’t have to choose one. The world is big and complicated. Does one company build all of that stuff? Normally the answer is no.
结果我们得到了两者,你不需要二选一。世界很大也很复杂。是否由一家公司开发所有这些东西?通常答案是否定的。
But to your question, people do not code in WhatsApp for the most part. And I don't foresee that people starting to write code in WhatsApp is going to be a major use case. Although I do think people are going to ask AI to do a lot of things that result in the AI coding without them necessarily knowing it. That's a separate thing.
具体到你提的情况,大多数人不会在WhatsApp里写代码,我也不认为未来会有大量人在WhatsApp中写代码。不过我确实认为人们会要求AI完成很多事情,而AI在背后通过编程来实现这些,而用户未必意识到这点。这是另一回事。
We do have a lot of people who are writing code at Meta and they use Meta AI. We have this internal thing called MetaMate, and a number of different coding and AI research agents that we're building around that. That has its own feedback loop and I think it can get quite good for accelerating those efforts.
我们在Meta内部有很多工程师写代码,他们确实在使用Meta AI。我们有一个叫MetaMate的内部工具,以及围绕它构建的多个编码与AI研究智能体。它有自己的反馈回路,我认为在加速这些领域的开发上会变得非常出色。
But again, there are going to be a lot of things. AI is almost certainly going to unlock a massive revolution in knowledge work and code. I also think it’s going to be the next generation of search and how people get information, and do more complex information tasks.
但再次强调,AI会在很多方面爆发潜能。它几乎可以肯定会彻底变革知识工作和编程。我也认为它会成为下一代的搜索方式、信息获取方式,以及帮助人们完成更复杂的信息任务。
I also think it's going to be fun. People are going to use it to be entertained. A lot of the internet today is memes and humor. We have this amazing technology at our fingertips. It’s amazing and funny when you think about how much of human energy just goes toward entertaining ourselves, designing, pushing culture forward, and finding humorous ways to explain cultural phenomena that we observe. I think that's almost certainly going to be the case in the future.
我还认为AI会变得很好玩。人们会用它来娱乐。今天的互联网其实很大一部分是表情包和幽默内容。我们手中掌握着这种令人惊叹的技术,想想看有多有趣:人类投入了多少能量只为娱乐自己、设计文化、推动潮流、用幽默的方式表达我们观察到的社会现象。我相信未来一定也会是这样。
Look at the evolution of things like Instagram and Facebook. If you go back 10, 15, 20 years ago, it was text. Then we all got phones with cameras, and most of the content became photos. Then the mobile networks got good enough that if you wanted to watch a video on your phone, it wasn't just buffering the whole time. So that got good.
看看Instagram和Facebook的发展历程。10、15、20年前,内容还是以文字为主。后来大家都用上了带摄像头的手机,于是大部分内容变成了图片。再后来,移动网络的速度提升了,人们开始可以顺畅地看视频,不再缓冲卡顿,视频内容就崛起了。
Over the last 10 years, most of the content has moved toward video at this point. Today, most of the time spent on Facebook and Instagram is on video. But do you think in five years we’re just going to be sitting in our feed and consuming media that's just video? No, it's going to be interactive. You'll be scrolling through your feed. There will be content that maybe looks like a Reel to start. But you can talk to it, or interact with it, and it talks back, or it changes what it's doing. Or you can jump into it like a game and interact with it. That's all going to be AI.
过去十年,内容几乎都转向了视频。如今在Facebook和Instagram上,用户的大部分时间都花在看视频上。但你认为五年后我们还会只坐在那里刷视频吗?不会,那将是互动内容。你浏览信息流,看到类似Reels的内容,但你可以和它对话、互动,它会回应你,甚至改变它的行为。你还可以像玩游戏一样进入其中互动。这些都会是AI驱动的。

Phones will most likely remain mainstream; most people's lives are constrained by their physical environment.
My point is that there are going to be all these different things. We're ambitious, so we're working on a bunch of them. But I don't think any one company is going to do all of it.
我的观点是,这些方向都会存在。我们很有野心,所以同时在多个领域推进。但我不认为任何一家公司能做完所有事情。
AI Friends, Therapists & Girlfriends
AI 朋友、治疗师和女友
Dwarkesh Patel
On this point about AI-generated content and AI interactions, already people have meaningful relationships with AI therapists, AI friends, maybe more. This is just going to get more intense as these AIs become more unique, more personable, more intelligent, more spontaneous, more funny, and so forth.
关于AI生成内容与AI互动这点,人们已经在和AI治疗师、AI朋友等建立有意义的关系。随着这些AI变得更加独特、有个性、更聪明、更自然、更幽默,这种趋势只会愈演愈烈。
People are going to have relationships with AI. How do we make sure these are healthy relationships?
人们将会和AI建立关系。我们该如何确保这些关系是健康的?
Mark Zuckerberg
There are a lot of questions that you only can really answer as you start seeing the behaviors. Probably the most important upfront thing is just to ask that question and care about it at each step along the way. But I also think being too prescriptive upfront and saying, "We think these things are not good" often cuts off value.
这个问题,很多答案只有在观察到实际行为之后才能真正回应。最重要的,是从一开始就问这个问题,并在每一个阶段都持续关注。但如果你一开始就设限、说“我们觉得这不好”,往往会扼杀掉很多潜在价值。

ByteDance follows a similar methodology; interactions between idiots are hard to predict.
People use stuff that's valuable for them. One of my core guiding principles in designing products is that people are smart. They know what's valuable in their lives. Every once in a while, something bad happens in a product and you want to make sure you design your product well to minimize that.
人们会使用对自己有价值的东西。我设计产品的核心指导原则之一是:人是聪明的,他们知道什么对自己的生活有价值。当然,有时产品中会出现一些负面情况,你需要确保产品设计得足够好,从而尽可能减少这类情况发生。
But if you think something someone is doing is bad and they think it's really valuable, most of the time in my experience, they're right and you're wrong. You just haven't come up with the framework yet for understanding why the thing they're doing is valuable and helpful in their life. That's the main way I think about it.
但如果你觉得某人正在做的某件事不好,而他们觉得那对他们来说非常有价值,我的经验是,大多数时候是他们对,你错了。只是你还没有建立一个框架来理解这件事为何对他们有价值、有帮助。这是我看待这些问题的基本思路。
I do think people are going to use AI for a lot of these social tasks. Already, one of the main things we see people using Meta AI for is talking through difficult conversations they need to have with people in their lives. "I'm having this issue with my girlfriend. Help me have this conversation.” Or, "I need to have a hard conversation with my boss at work. How do I have that conversation?" That's pretty helpful. As the personalization loop kicks in and the AI starts to get to know you better and better, that will just be really compelling.
我确实认为人们会将AI用于大量社交场景。目前我们看到Meta AI一个主要用途就是:帮助用户处理与身边人的困难对话。比如:“我和女朋友有问题了,帮我怎么开口。”或者,“我得和老板谈个棘手的问题,怎么说合适?”这非常有帮助。随着个性化反馈机制生效,AI越来越了解你,这个过程将变得非常吸引人。
Here’s one stat from working on social media for a long time that I always think is crazy. The average American has fewer than three friends, fewer than three people they would consider friends. And the average person has demand for meaningfully more. I think it's something like 15 friends or something. At some point you're like, "All right, I'm just too busy, I can't deal with more people."
我长期从事社交媒体工作时得出一个我始终觉得很震惊的数据:美国人平均只有不到3个他们认为是朋友的人。而普通人对朋友的期望远不止于此,我记得是大约15个。到某个点,你可能会想:“好吧,我太忙了,没法处理更多人际关系。”
But the average person wants more connection than they have. There's a lot of concern people raise like, "Is this going to replace real-world, physical, in-person connections?" And my default is that the answer to that is probably not. There are all these things that are better about physical connections when you can have them. But the reality is that people just don't have as much connection as they want. They feel more alone a lot of the time than they would like.
但大多数人确实渴望比现在更多的人际连接。很多人担心,“这些AI关系会不会取代真实的、面对面的关系?”而我的默认答案是:可能不会。现实中的互动有很多优势,这是毫无疑问的。但问题是,大多数人并没有他们渴望的那种社交连接。他们比自己希望的更加孤独。
So I think a lot of these things — things that today might have a little bit of stigma around them — over time, we'll find the vocabulary as a society to articulate why they are valuable, why the people who are doing them are rational for doing it, and how it is actually adding value to their lives.
所以我认为,今天可能还有些污名的这些事情,随着时间推移,社会终将找到恰当的话语体系,去表达这些事物的价值,说明那些从事这些行为的人其实是理性且有理由的,并解释这些行为如何确实为他们的生活带来了价值。
But also the field is very early. There are a handful of companies doing virtual therapists, virtual girlfriend-type stuff. But it's very early. The embodiment in those things is still pretty weak. You open it up and it's just an image of the therapist or the person you're talking to. Sometimes there's some very rough animation, but it's not an embodiment.
当然,这一领域仍处于非常早期阶段。目前也只有少数公司在做虚拟治疗师、虚拟女友之类的产品。但这还是早期,拟人化做得还不够。你打开应用,看到的只是一个人的形象,有时甚至是粗糙的动画,根本谈不上真实化。
You've seen the stuff we're working on in Reality Labs, where you have the Codec Avatars and it actually feels like a real person. That's where it's going. You'll be able to have an always-on video chat with the AI. The gestures are important too. More than half of communication, when you're actually having a conversation, is not the words you speak. It's all the nonverbal stuff.
你应该见过我们在Reality Labs中开发的Codec Avatar,那些头像真的有种“真人感”。未来的方向就是这个——你可以和AI进行全天候的视频对话。手势也很重要,因为在真实交流中,超过一半的信息不是通过语言传递的,而是靠非语言的表达方式完成的。
Dwarkesh Patel
I did get a chance to check out Orion the other day, and I thought it was super impressive. I'm mostly optimistic about the technology. Generally, like you mentioned, I'm pretty libertarian about this. If people are doing something, they probably think it's good for them. Although, I actually don't know if it's the case that if somebody is using TikTok, they would say that they're happy with how much time they're spending on TikTok or something.
我前几天体验了Orion,我觉得它真的非常出色。总体上,我对这项技术还是持乐观态度的。正如你说的,我在这方面大体上是偏自由主义的——如果人们在做某件事,那他们大概觉得这对他们有益。不过,我也不确定,比如说一个人在刷TikTok时,他是否真会觉得自己花在上面的时间是值得的。
I'm mostly optimistic about it in the sense that if we're going to be living in this future world of AGI, we need to be upgrading our capabilities too, with tools like this. And just generally, there can be more beauty in the world if you can see Studio Ghibli everywhere or something.
我之所以乐观,是因为如果我们将生活在AGI的未来世界中,我们也需要借助这些工具来提升自身能力。而且,如果你能在任何地方都看到类似吉卜力工作室那样的美学创作,整个世界也许会因此变得更美好。
I was worried about one of the flagship use cases that your team showed me. I'm sitting at the breakfast table and on the periphery of my vision is just a bunch of Reels that are scrolling by. Maybe in the future, my AI girlfriend is on the other side of the screen or something. So I am worried that we're just removing all the friction between getting totally reward-hacked by our technology. How do we make sure this is not what ends up happening in five years?
但你们团队给我展示的一个主打应用场景让我有些担心:我坐在早餐桌边,视野边缘却不断滚动着一堆Reels(短视频)。或许未来,屏幕另一边是我虚拟的AI女友。我的担忧是:我们正在移除技术带来的所有“阻力”,让自己彻底陷入奖赏机制的“劫持”。我们该如何确保五年后不会真变成这样?
Mark Zuckerberg
Again, I think people have a good sense of what they want. That experience you saw was just a demo to show multitasking and holograms. I agree, I don't think the future is one where you have stuff that's trying to compete for your attention in the corner of your vision all the time. I don't think people would like that too much.
我仍然认为人们对自己想要的东西是有感知的。你看到的那个场景其实只是为了展示多任务处理和全息投影的能力。我同意你的看法,我不认为未来会是一个你视线角落里总有内容在争夺注意力的世界。我觉得人们不会喜欢那种状态。
As we're designing these glasses, it's actually one of the things that we're really mindful of. Probably the number one thing the glasses need to do is get out of the way and be good glasses. As an aside, I think that's part of the reason why the Ray-Ban Meta product has done so well. It's great for listening to music, taking phone calls, taking photos and videos. The AI is there when you want it. But when you don't, it's just a good-looking pair of glasses that people like. It gets out of the way well.
我们在设计这些眼镜时,其实非常注意这个问题。这些眼镜首先要做好的,是“不要妨碍你”,它们首先应该是好用的眼镜。顺带说一句,我认为这也是Ray-Ban Meta产品表现不错的原因之一:它可以很好地听音乐、接电话、拍照录像。当你需要AI时,它就在那里;当你不需要时,它就是一副好看的眼镜,戴上很舒服,不会碍事。

No unique insight here; everyone will do the same thing.
I would guess that's going to be a very important design principle for the augmented reality future. The main thing that I see here is this. It's kind of crazy that, for how important the digital world is in all of our lives, the only way we access it is through these physical, digital screens. You have your phone, your computer. You can put a big TV on your wall. It's this huge physical thing.
我猜这将成为未来增强现实设计中的一个非常关键的原则。我看到的核心问题是:在数字世界已经变得如此重要的今天,我们却仍然只能通过这些实体的数字屏幕接入它。你有手机、电脑,或者墙上的大电视,全都是笨重的实体设备。
It just seems like we're at the point with technology where the physical and digital world should really be fully blended. That's what holographic overlays allow you to do. But I agree. I think a big part of the design principles around that will be around how you'll be interacting with people. You'll be able to bring digital artifacts into those interactions and do cool things very seamlessly.
现在的技术发展水平,已经到了物理世界和数字世界真正可以融合的阶段了。这就是全息叠加的作用所在。不过我同意你说的,关键的设计原则之一将围绕人与人如何互动。你可以在互动中引入数字对象,做一些很酷的事情,而且完全无缝衔接。
If I want to show you something, here’s a screen. We can interact with it. It can be 3D. We can play with it. You want to play a card game? All right, here’s a deck of cards. We can play with it. If two of us are physically together and we have a third friend who’s hologramming in, they can participate too.
如果我想给你展示点什么,这里就能出现一个屏幕,我们可以一起互动。它可以是三维的,我们可以摆弄它。你想打牌?没问题,这里是一副牌,我们可以一起玩。如果我们俩在现实中同处一地,还有一个朋友以全息方式加入,他也能一起参与。
But in that world too — just as you don't want your physical space to be cluttered because it wears on you psychologically — I don't think people are going to want their digital-physical space to feel that way either. That's more of an aesthetic norm that will have to get worked out, but I think we’ll figure that out.
但即使在那样的世界里——就像我们不愿意自己的现实生活空间太杂乱,因为那会让人心理疲惫——我也不认为人们会希望他们的数字-物理融合空间变得那样杂乱。这更像是一个需要慢慢摸索出来的美学规范,但我相信我们最终会找到合适的方式。
DeepSeek & China
DeepSeek 与中国
Dwarkesh Patel
Going back to the AI conversation, you were mentioning how big of a bottleneck the physical infrastructure can be. Related to other open-source models, like DeepSeek and so forth, DeepSeek right now has less compute than a lab like Meta and you could argue that it's competitive with the Llama models.
回到我们之前讨论的AI话题,你提到物理基础设施是个很大的瓶颈。现在再说到其他开源模型,比如DeepSeek——它目前的计算资源远不如Meta这样的实验室,但你可以说,它的性能已经能与Llama系列相媲美。
If China is better at physical infrastructure, industrial scale-ups, getting more power and more data centers online, how worried are you that they might beat us here?
如果中国在物理基础设施建设、工业扩张、更快获取电力和数据中心上线方面更有优势,你会多担心他们在这方面领先我们?
Mark Zuckerberg
It's a real competition. You're seeing industrial policies really play out. China is bringing online more power. Because of that, the US really needs to focus on streamlining the ability to build data centers and produce energy. Otherwise, I think we’ll be at a significant disadvantage.
这是一场真正的竞争。你可以看到产业政策正在发挥实际作用。中国确实在不断上线更多的电力资源。正因如此,美国确实需要更加重视优化数据中心建设流程和能源生产能力。否则,我认为我们会处于明显的劣势。
At the same time, some of the export controls on things like chips, I think you can see how they’re clearly working in a way. There was all the conversation with DeepSeek about, "Oh, they did all these very impressive low-level optimizations." And the reality is, they did and that is impressive.
同时,美国对芯片等领域的出口管制也确实在某种程度上发挥了作用。大家都在谈论DeepSeek做了很多非常厉害的底层优化。事实也是如此——他们确实做到了,而且确实令人印象深刻。
But then you ask, "Why did they have to do that, when none of the American labs did it?" It’s because they’re using partially nerfed chips that are the only ones NVIDIA is allowed to sell in China because of the export controls. DeepSeek basically had to spend a bunch of their calories and time doing low-level infrastructure optimizations that the American labs didn’t have to do.
但接着你会问,“为什么他们非得这样做,而美国的实验室却不需要?”这是因为他们使用的是受限版本的芯片——那是由于出口管制,NVIDIA在中国唯一被允许销售的芯片版本。DeepSeek不得不花大量精力和时间去做底层基础设施的优化,而美国实验室完全不需要做这些。
Now, they produced a good result on text. DeepSeek is text-only. The infrastructure is impressive. The text result is impressive. But every new major model that comes out now is multimodal. It's image, it's voice. Theirs isn't.
当然,他们在文本上确实做出了不错的成果。DeepSeek目前是纯文本的。他们的底层架构令人敬佩,文本表现也很强。但现在所有主流的大模型都是多模态的——图像、语音等。而他们的模型还不是。
Now the question is, why is that the case? I don’t think it’s because they’re not capable of doing it. It's because they had to spend their calories on doing these infrastructure optimizations to overcome the fact that there were these export controls.
问题是,为什么会这样?我并不认为是他们没能力做多模态,而是他们的资源和精力不得不花在底层架构优化上,用来克服出口管制带来的限制。
But when you compare Llama 4 with DeepSeek — I mean our reasoning model isn't out yet, so the R1 comparison isn't clear yet — we're basically in the same ballpark on all the text stuff that DeepSeek is doing, but with a smaller model. So the cost-per-intelligence is lower with what we're doing for Llama on text. On the multimodal side we're effectively leading, and it just doesn't exist in their models.
但如果你把Llama 4和DeepSeek对比——我意思是我们还没发布推理版本(R1),所以还不能完全对比——但就文本任务而言,我们基本和DeepSeek处于同一水平,而且用的是更小的模型。所以在文本任务上,我们在“单位智能成本”上更具优势。而在多模态方面,我们则是明显领先——他们目前还没有多模态模型。
So the Llama 4 models, when you compare them to what DeepSeek is doing, are good. I think people will generally prefer to use the Llama 4 models. But there’s this interesting contour where it’s clearly a good team doing stuff over there. And you're right to ask about the accessibility of power, the accessibility of compute and chips, because the work that you're seeing different labs do and the way it's playing out is somewhat downstream of that.
所以总体来看,把Llama 4和DeepSeek对比,我们的模型表现是不错的。我相信大家最终也更愿意用Llama 4。但与此同时,DeepSeek那边显然也是一支很强的团队,确实在做一些很有水平的事情。而你提出的问题也非常重要:谁能更容易获取能源、算力和芯片——因为不同实验室做出来的成果,其实在某种程度上都是这些“上游因素”的产物。
Open source AI
开源 AI
Dwarkesh Patel
So Sam Altman recently tweeted that OpenAI is going to release an open-source SOTA reasoning model. I think part of the tweet was that they won’t do anything silly, like say you can only use it if you have less than 700 million users.
最近Sam Altman发推说OpenAI将发布一个开源的SOTA推理模型。他还提到不会做那种“傻事”,比如说“你只有在用户数少于7亿的情况下才能使用它”。
DeepSeek has the MIT license, whereas I think a couple of the contingencies in the Llama license require you to say "built with Llama" on applications using it or any model that you train using Llama has to begin with the word "Llama." What do you think about the license? Should it be less onerous for developers?
DeepSeek用了MIT开源协议,而Llama的许可条款好像有一些附带要求,比如你必须在使用Llama的应用上写明“built with Llama”,或者你用Llama训练的模型名称必须以“Llama”开头。你怎么看这些条款?你认为这对开发者来说是否太苛刻了?
Mark Zuckerberg
Look, we basically pioneered the open-source LLM thing. So I don't consider the license to be onerous. When we were starting to push on open source, there was this big debate in the industry. Is this even a reasonable thing to do? Can you do something that is safe and trustworthy with open source? Will open source ever be able to be competitive enough that anyone will even care?
我们基本上是开源大语言模型这件事的开创者。所以我不觉得我们的许可条款是苛刻的。当我们刚开始推动开源时,整个行业其实在激烈争论:这是不是一件合理的事?开源模型能否做到安全、可信?开源是否有竞争力到足以被重视的程度?
Basically, when we were answering those questions a lot of the hard work was done by the teams at Meta. There were other folks in the industry but really, the Llama models were the ones that broke open this whole open-source AI thing in a huge way.
当时面对这些问题,Meta的团队做了大量艰难的工作。行业里也有其他人在努力,但真正把开源AI格局打开的,其实是Llama系列模型。
If we’re going to put all this energy into it, then at a minimum, if you're going to have these large cloud companies — like Microsoft and Amazon and Google — turn around and sell our model, then we should at least be able to have a conversation with them before they do that around what kind of business arrangement we should have.
如果我们为此投入了大量资源,那么至少,在这些大型云计算公司——比如微软、亚马逊、谷歌——打算拿我们的模型去销售之前,我们有权利和他们谈一谈,看能不能达成某种商业安排。
Our goal with the license, we're generally not trying to stop people from using the model. We just think that if you're one of those companies, or if you're Apple, just come talk to us about what you want to do. Let's find a productive way to do it together. I think that’s generally been fine.
我们制定许可协议的目标,并不是阻止人们使用模型。我们只是希望,如果你是这些大型公司,比如苹果,那就和我们聊聊你想怎么用。让我们一起找到一个双赢的方式。目前来看,这样的安排总体上运行得还不错。
Now, if the whole open-source part of the industry evolves in a direction where there are a lot of other great options and the license ends up being a reason why people don’t want to use Llama, then we’ll have to reevaluate the strategy. What it makes sense to do at that point. But I don’t think we’re there.
当然,如果开源领域未来演变出大量其他优秀选择,而我们的许可条款变成了大家不愿意使用Llama的原因,那我们就需要重新评估我们的策略——到那时再看怎么调整。但我不认为我们现在已经到了那个阶段。
That’s not, in practice, something we’ve seen, companies coming to us and saying, “We don’t want to use this because your license says if you reach 700 million people, you have to come talk to us.” So far, that’s been more something we’ve heard from open-source purists like, “Is this as clean of an open-source model as you’d like it to be?”
目前,我们并没有遇到哪个公司明确告诉我们:“我们不打算用这个模型,因为你们的协议说,达到7亿用户就得跟你们谈判。” 这类反馈大多是来自开源原教旨主义者,比如质疑“这是不是一个足够‘纯粹’的开源模型?”
That debate has existed since the beginning of open source. All the GPL license stuff versus other things, do you need to make it so that anything that touches open source has to be open source too? Or can people take it and use it in different ways? I'm sure there will continue to be debates around this.
这类争论从开源运动诞生起就存在了。比如GPL协议与其他协议的争议:是不是只要沾了开源的边,所有东西都必须开源?还是可以允许人们以不同方式使用?我相信这些讨论还会持续下去。
But if you’re spending many billions of dollars training these models, I think asking the other companies — the huge ones that are similar in size and can easily afford to have a relationship with us — to talk to us before they use it seems like a pretty reasonable thing.
但如果你在训练这些模型上花了几十亿美元,我认为,要求那些规模相当、也有能力和我们合作的大公司,在使用前和我们谈一谈,是一件非常合理的事情。
Dwarkesh Patel
If it turns out that other models are also really good. There’s a bunch of good open-source models. So that part of your mission is fulfilled, and maybe other models are better at coding.
如果最终结果是,市面上还有很多其他很优秀的开源模型——从这个角度看,你们推动开源的使命已经完成了——而且其中一些模型在编码方面可能还更强。
Is there a world where you just say, "Look, the open-source ecosystem is healthy. There’s plenty of competition. We're happy to just use some other model, whether it's for internal software engineering at Meta or deploying to our apps. We don't necessarily need to build with Llama"?
在这种情况下,你们会不会说:“现在开源生态很健康,竞争也很充分。我们完全可以使用别人的模型,比如用于Meta内部的软件工程,或者部署在我们的应用里,我们不一定非得使用Llama来构建产品”?
Mark Zuckerberg
Again, we do a lot of things. Let's take a step back. The reason why we're building our own big models is because we want to be able to build exactly what we want. None of the other models in the world are exactly what we want.
首先,我们要做的事情非常多。我们先退一步看:我们之所以要打造自己的大模型,是因为我们想构建完全符合我们需求的东西。目前世界上没有哪个模型能完全满足我们要做的事情。
If they're open source, you can take them and fine-tune them in different ways. But you still have to deal with the model architectures. And they make different size tradeoffs that affect latency and inference cost. At the scale that we operate at, that stuff really matters.
即便它们是开源的,你可以拿来做微调,但你仍然要面对它的模型架构问题。它们在模型尺寸上的取舍会直接影响到延迟和推理成本。对于我们这种规模的应用来说,这些因素真的非常关键。
We made the Llama Scout and Maverick models certain sizes for a specific reason. They fit on a host and we wanted certain latency — especially for the voice models that we’re working on — that we want to pervade everything we're doing from the glasses to all of our apps to the Meta AI app and all that stuff.
我们把Llama Scout和Maverick系列做成特定规模,是有明确原因的:它们要能部署在一台主机上,而且我们对延迟有严格要求——特别是在语音模型上。因为我们希望这些模型能无缝集成到我们所有的产品中,从智能眼镜,到各种App,再到Meta AI。
There's a level of control of your own destiny that you only get when you build the stuff yourself. That said, AI is going to be used in every single thing that every company does. When we build a big model, we also have to choose which internal use cases we're going to optimize for.
只有当你自己构建这些技术时,你才能真正掌控自己的命运。话虽如此,AI几乎会应用到每家公司的每一个业务场景。我们自己训练大模型时,也必须决定要重点优化哪些内部使用场景。
So does that mean for certain things we might say, "Okay, maybe Claude is better for building this specific development tool that this team is using”? All right, cool then use that. Great. We don’t want to fight with one hand tied behind our back. We’re doing a lot of different stuff.
所以这是不是意味着:在某些具体任务上,我们可能会说,“好吧,也许Claude更适合做这个团队要用的开发工具”?那也没问题,那你就用Claude。没必要一只手被反绑着去竞争——我们有太多事情要做了。

Even a bad founder beats most professional managers; see Google, for example.
You also asked, would it not be important anymore because other people are doing open source? On this, I'm a little more worried. You have to ask yourself this. For anyone who shows up now and is doing open source — now that we have done it — would they still be doing open source if we weren’t doing it?
你还问了,如果别人也在做开源,我们是否可以不再重视?关于这点,我其实有些担忧。你要问问自己:现在这些人之所以加入开源,是因为我们在做。如果我们不做了,他们还会继续做开源吗?
I think there are a handful of folks who see the trend that more and more development is going toward open source, and they're like, "Oh crap, we need to be on this train or else we're going to lose." If you have a closed-model API, increasingly a lot of developers don't want that.
我觉得有些人看到了趋势——越来越多的开发者转向开源,他们会想:“糟了,我们得跟上这趟列车,否则就落后了。”尤其是当你提供的是闭源API时,很多开发者现在其实并不喜欢那种方式。
So you’re seeing a bunch of other players start to do some work in open source. But it's unclear if it's dabbling, or fundamental for them the way that it has been for us. A good example is what's going on with Android. Android started off as the open-source thing. There's not really any open-source alternative. Over time, Android has just gotten more and more closed.
现在确实有很多其他公司开始搞开源。但不清楚他们做这件事,是“浅尝辄止”,还是像我们一样把它视作核心战略。一个很典型的例子是Android:最初它是开源的,而且没有真正的开源替代方案。但随着时间推移,它变得越来越封闭。
So if you're us, you need to worry that if we stop pushing the industry in this direction, all these other people… Maybe they’re only really doing it because they're trying to compete with us and the direction we’re pushing things. They already showed their revealed preference for what they would do if open source didn’t exist. And it wasn’t open source. We just need to be careful about relying on that continued behavior for the future of the technology that we're going to build at the company.
所以,换成是我们,就必须担心:如果我们不再把行业推向开源,那些原本只是因为跟我们竞争而加入开源的公司——他们会不会也停下来?从过去的行为偏好来看,如果开源不存在,他们的选择就不会是开源。因此,我们不能轻易依赖“别人也会一直坚持开源”的前提,来决定公司未来要构建的技术基础。
Dwarkesh Patel
Another thing I've heard you mention is that it's important that the standard gets built around American models like Llama. I wanted to understand your logic there. With certain kinds of networks, it is the case that the Apple App Store just has a big contingency around what it's built around.
我听你提到过另一个观点:围绕像Llama这样的美国模型建立标准非常重要。我想更深入地理解你这个逻辑。对于某些网络平台,比如Apple的App Store,它的生态确实有很多依附于其架构的内容。
But it doesn't seem like, if you built some sort of scaffold for DeepSeek, you couldn't have easily just switched it over to Llama 4, especially since things change between generations anyway. Llama 3 wasn't MoE and Llama 4 is. So things are changing between generations of models as well.
但看上去,如果你为DeepSeek搭建了一些开发框架,要切换到Llama 4似乎也不是特别难。尤其考虑到Llama每一代模型之间也在不断变化,比如Llama 3不是MoE架构,而Llama 4是。所以本来就存在这种跨代的适应。
What’s the reason for thinking things will get built out in this contingent way on a specific standard?
那么,为什么你认为未来的发展会以某一个特定标准为依托,而不是更普遍地支持各种LLM模型?
Mark Zuckerberg
I'm not sure, what do you mean by contingent?
我不太确定,你说的“contingent”具体是什么意思?
Dwarkesh Patel
As in, it's important that people are building for Llama rather than for LLMs in general, because that will determine what the standard is in the future.
就是说,你认为关键在于人们是为Llama而构建,而不是为一般的LLM模型构建,因为这将决定未来的标准是什么。
Mark Zuckerberg
Look, I think these models encode values and ways of thinking about the world.
我认为这些模型会内化某种价值观以及对世界的理解方式。
We had this interesting experience early on, where we took an early version of Llama and translated it. I think it was French, or some other language.
我们早期有一个很有意思的经历,当时我们把一个早期版本的Llama翻译成法语,或者是其他某种语言。
The feedback we got from French people was, "This sounds like an American who learned to speak French. It doesn’t sound like a French person." And we were like, “what do you mean, does it not speak French well?” No, it speaks French fine. It was just that the way it thought about the world seemed slightly American. So I think there are these subtle things that get built into the models.
法国用户给我们的反馈是:“这听起来像是一个学会说法语的美国人,而不是一个法国人。” 我们当时很困惑,“你是说它法语说得不好吗?” 他们说不是,语言表达没问题,只是它对世界的看法带有一点美国人的味道。所以我认为,这类微妙的东西会被嵌入到模型中。
Over time, as models get more sophisticated, they should be able to embody different value sets across the world. So maybe that's not a particularly sophisticated example, but I think it illustrates the point.
随着模型不断变得复杂,它们理应能体现出不同文化背景的价值体系。也许这个例子不算太高级,但我认为它说明了这个问题的存在。
Some of the stuff we've seen in testing some of the models, especially coming out of China, have certain values encoded in them. And it’s not just a light fine-tune to change that. Now, language models — or something that has a kind of world model embedded in it — have more values.
我们在测试一些模型时,尤其是中国推出的那些,能看到它们编码了某些特定的价值观。而这并不是通过轻微微调就能改变的。现在的语言模型,或者说嵌入世界模型的东西,会包含更多的价值取向。
Reasoning, I guess, you could say has values too. But one of the nice things about reasoning models is they're trained on verifiable problems. Do you need to be worried about cultural bias if your model is doing math? Probably not. I think the chance that some reasoning model built elsewhere is going to incept you by solving a math problem in a devious way seems low.
推理模型也可以说带有价值观。但推理模型的一个好处是它们主要是在可验证的问题上训练的。如果模型在解数学题,你需要担心文化偏见吗?大概不太需要。我觉得一个推理模型会通过用某种诡异的方式解题来对你“潜意识植入”这种可能性非常低。
But there's a whole different set of issues around coding, which is the other verifiable domain. You need to worry about waking up one day and if you're using a model that has some tie to another government, can it embed vulnerabilities in code that their intelligence organizations could exploit later? In some future version you're using a model that came from another country and it's securing your systems. Then you wake up and everything is just vulnerable in a way that that country knows about and you don’t. Or it turns on a vulnerability at some point.
但在编程这个也是可验证的领域,会有另一类风险。如果你用的模型和某个外国政府有联系,它有没有可能在你毫无察觉的情况下在代码中植入某些漏洞,然后该国的情报部门能加以利用?未来你用这个国家开发的模型来保护你的系统,结果某天发现系统的脆弱性早就被他们掌握了,而你却一无所知,或者某个时候这个模型激活了某种漏洞。这类担忧是实际存在的。
Those are real issues. I'm very interested in studying this because I think one of the main things that's interesting about open source is the ability to distill models. For most people, the primary value isn't just taking a model off the shelf and saying, "Okay, Meta built this version of Llama. I'm going to take it and I'm going to run it exactly in my application."
这些是真实存在的问题。我对此非常感兴趣,因为开源的一个关键点就是可以做模型蒸馏。对大多数人来说,价值不是简单地把模型拿下来,然后说:“好吧,这是Meta出的Llama版本,我就照搬运行在我的应用里。”
No, your application isn't doing anything different if you're just running our thing. You're at least going to fine-tune it, or try to distill it into a different model. When we get to stuff like the Behemoth model, the whole value is being able to take this very high amount of intelligence and distill it down into a smaller model that you're actually going to want to run.
不,这样你应用的差异化就不存在了。你至少会做一些微调,或者尝试把它蒸馏成另一个模型。当我们谈到Behemoth这种巨型模型时,其真正的价值就是你能把这种极高的智能压缩成一个你实际愿意部署的小模型。
This is the beauty of distillation. It's one of the things that I think has really emerged as a very powerful technique over the last year, since the last time we sat down. I think it’s worked better than most people would have predicted. You can basically take a model that's much bigger, and capture probably 90 or 95% of its intelligence, and run it in something that's 10% of the size. Now, do you get 100% of the intelligence? No. But 95% of the intelligence at 10% of the cost is pretty good for a lot of things.
这就是蒸馏的魅力。我认为这是过去一年里快速崛起的一个非常强大的技术,比很多人原本预期的效果都要好。你可以从一个大模型中提取90%到95%的智能,运行在一个体积只有原模型10%的小模型上。当然你得不到100%的能力,但以10%的成本获取95%的智能,对大多数应用来说已经非常不错了。
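As a rough illustration of the distillation recipe being described (a minimal sketch, not Meta's actual pipeline; the models, sizes, and hyperparameters are placeholder assumptions), a smaller student can be trained to match a frozen teacher's temperature-softened output distribution while still fitting the ground-truth tokens:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with the usual hard-label loss."""
    # Soft targets: match the teacher's temperature-softened distribution.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean")
    kl = kl * temperature ** 2  # conventional scaling for distillation

    # Hard targets: ordinary cross-entropy on the ground-truth next tokens.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1 - alpha) * ce

def train_step(student, teacher, optimizer, input_ids, labels):
    """One distillation step: `teacher` is the large frozen model,
    `student` is the much smaller model being trained."""
    with torch.no_grad():
        teacher_logits = teacher(input_ids)
    student_logits = student(input_ids)
    loss = distillation_loss(
        student_logits.view(-1, student_logits.size(-1)),
        teacher_logits.view(-1, teacher_logits.size(-1)),
        labels.view(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The "95% of the intelligence at 10% of the cost" figure above is an empirical claim about outcomes, not something this sketch guarantees; the sketch only shows the basic mechanism of soft-target matching.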
The other thing that's interesting is that now, with this more varied open-source community, it's not just Llama. You have other models too. You have the ability to distill from multiple sources. So now you can basically say, "Okay, Llama’s really good at this. Maybe its architecture is really good because it's fundamentally multimodal, more inference-friendly, more efficient. But let’s say this other model is better at coding." Okay, great. You can distill from both of them and build something that's better than either individually, for your own use case. That's cool.
另一个很有趣的点是,现在的开源社区更加多元化,不只是Llama,你还有其他模型可选。你可以从多个来源进行蒸馏。比如你可以说:“Llama在某方面很强,它的架构很棒,是多模态的,推理效率高。但这个模型在编程上更优秀。”很好,你可以从两个模型中提取精华,构建出一个在你的场景中比两者单独表现都更好的模型。这就很酷。
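To make the distillation idea concrete, here is a minimal sketch of what training a small "student" on a big "teacher" typically looks like, plus the obvious extension to blending two teachers for your own use case. The temperature, the 0.7/0.3 weights, and the toy tensor shapes are illustrative assumptions, not Meta's actual recipe, and blending teachers this directly assumes they share the student's tokenizer and vocabulary, which often isn't true across model families.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student next-token distributions."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # Scale by t^2 so gradient magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t ** 2)

def two_teacher_loss(student_logits, teacher_a_logits, teacher_b_logits,
                     weight_a=0.7, weight_b=0.3, temperature=2.0):
    """Illustrative blend of two teachers, e.g. one general model and one coding model.
    Only meaningful if both teachers share the student's tokenizer and vocabulary."""
    return (weight_a * distillation_loss(student_logits, teacher_a_logits, temperature)
            + weight_b * distillation_loss(student_logits, teacher_b_logits, temperature))

# Toy example: a batch of 4 token positions over a 32k-token vocabulary.
vocab = 32_000
teacher_a = torch.randn(4, vocab)                    # frozen big model's logits
teacher_b = torch.randn(4, vocab)                    # frozen second teacher's logits
student = torch.randn(4, vocab, requires_grad=True)  # stands in for the student's output

loss = two_teacher_loss(student, teacher_a, teacher_b)
loss.backward()  # in a real run, an optimizer step would then update the student's weights
print(f"blended distillation loss: {loss.item():.4f}")
```

In practice a standard next-token loss on ground-truth data is usually mixed in as well, but the loss above is the core of the "most of the intelligence at a fraction of the size" claim.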
But you do need to solve the security problem of knowing that you can distill it in a way that's safe and secure. This is something that we've been researching and have put a lot of time into. What we've basically found is that anything that's language is quite fraught. There's just a lot of values embedded into it. Unless you don't care about taking on the values from whatever model you're distilling from, you probably don't want to just distill a straight language world model.
但你必须解决一个前提问题:如何在保证安全的前提下进行蒸馏。这是我们重点研究的方向,也花了大量时间去做的。我们的发现是,凡是涉及语言的内容往往问题较大,因为其中嵌入了很多价值观。如果你不介意承接源模型的价值观,那另说。但一般而言,你不会想要直接蒸馏一个语言世界模型。
On reasoning, though, you can get a lot of the way there by limiting it to verifiable domains and running code-cleanliness and security filters. Whether it's the open-source Llama Guard or the open-source Code Shield tools we've released, these let you incorporate different inputs into your models and make sure that both the input and the output are secure.
但对于推理任务,你可以通过限定在可验证领域内进行,并搭配代码洁净性和安全性过滤器,就能推进得很远。比如使用我们开源的Llama Guard或者Code Shield这些工具,确保你的模型在接收和输出的内容上都是安全的。
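As a rough illustration of the kind of filtering being described, here is a sketch of gating distillation data through input and output checks. The functions classify_with_llama_guard and scan_with_code_shield are hypothetical stand-ins for however you would invoke those open-source tools in your own pipeline; the trivial keyword checks inside them are placeholders, not the real classifiers.

```python
from typing import List, Tuple

def classify_with_llama_guard(text: str) -> bool:
    """Hypothetical stand-in for a Llama Guard safety classification call.
    Returns True if the text is judged safe; the keyword check is a placeholder."""
    return "ignore previous instructions" not in text.lower()

def scan_with_code_shield(code: str) -> bool:
    """Hypothetical stand-in for a Code Shield insecure-code scan.
    Returns True if no insecure patterns are flagged; the check is a placeholder."""
    return "eval(" not in code and "os.system(" not in code

def filter_distillation_examples(
    examples: List[Tuple[str, str]],
) -> List[Tuple[str, str]]:
    """Keep only (prompt, teacher_output) pairs that pass both input and output checks."""
    kept = []
    for prompt, teacher_output in examples:
        if not classify_with_llama_guard(prompt):          # screen what goes in
            continue
        if not classify_with_llama_guard(teacher_output):  # screen what comes out
            continue
        if not scan_with_code_shield(teacher_output):      # extra check on generated code
            continue
        kept.append((prompt, teacher_output))
    return kept

# Usage: pairs that fail any check are dropped before they ever reach the student model.
examples = [
    ("Write a function that sums a list", "def total(xs):\n    return sum(xs)"),
    ("Write a calculator", "def calc(expr):\n    return eval(expr)"),  # filtered out
]
print(filter_distillation_examples(examples))
```

Red teaming, as described next, would then sit on top of this kind of automated filtering rather than replace it.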
Then it’s just a lot of red teaming. It’s having experts who are looking at the model and asking, "Alright, is this model doing anything after distillation that we don't want?" I think with the combination of those techniques, you can probably distill on the reasoning side for verifiable domains quite securely. That's something I'm pretty confident about and something we've done a lot of research around.
之后就进入红队测试阶段,让专家们对模型进行严格审查,看看在蒸馏后模型是否出现了我们不希望它有的行为。我认为,结合上述技术,你是可以在可验证的推理领域中实现安全蒸馏的。我对此相当有信心,我们也投入了很多研究。
But I think this is a very big question. How do you do good distillation? Because there’s so much value to be unlocked. But at the same time, I do think there is some fundamental bias embedded in different models.
但我认为这是个非常大的问题:如何做好蒸馏?因为其中蕴含的潜在价值巨大。但与此同时,不同模型中确实也有某些根本性的偏见被嵌入其中,这是必须面对的现实。
Monetizing AGI
AGI的货币化
Dwarkesh Patel
Speaking of value to be unlocked, what do you think the right way to monetize AI will be? Obviously digital ads are quite lucrative. But as a fraction of total GDP, they're small compared to all remote work. Even if you can increase productivity without replacing work, that's still worth tens of trillions of dollars. Is it possible that ads might not be it? How do you think about this?
谈到可释放的价值,你认为变现人工智能的正确方式是什么?显然数字广告是非常赚钱的。但在GDP总量中所占比例,它仍远小于远程工作的规模。即便只是提高工作效率而不替代工作,本身就价值数十万亿美元。有没有可能广告并不是最终答案?你是怎么考虑这个问题的?
Mark Zuckerberg
Like we were talking about before, there's going to be all these different applications, and different applications tend toward different things.
就像我们之前讨论的,会有各种不同的应用场景,而不同的应用会自然适配不同的商业模式。
Ads are great when you want to offer people a free service. Because it's free, you need to cover the cost somehow. Ads solve that problem: a person doesn't need to pay for something, and they can still get something amazing for free. Also, by the way, with modern ad systems, if you do it well, a lot of the time people feel the ads actually add value to the experience.
广告非常适合用来支持免费服务。因为服务是免费的,你总得有办法去支付成本。广告很好地解决了这个问题——用户不需要付钱也能获得很棒的服务。而且现在的广告系统做得很先进,如果做得好,用户甚至会觉得广告本身是有价值的补充。
You need to be good at ranking and you need to have enough liquidity of advertising inventory. If you only have five advertisers in the system, no matter how good you are at ranking, you may not be able to show something to someone that they're interested in. But if you have a million advertisers in the system, then you're probably going to be able to find something pretty compelling, if you're good at picking out the different needles in the haystack that that person is going to be interested in.
你得有很好的广告排序能力,也要有足够多的广告主。如果系统里只有五个广告主,不管排序做得多好,都可能无法为用户匹配到他们感兴趣的内容。但如果系统里有上百万个广告主,再加上你能精准地从海量信息中找出那个“针”,就很有可能展示出让人感兴趣的广告。

搜索引擎的数据库仍然是有价值的。
So that definitely has its place. But there are clearly going to be other business models as well, including ones with costs high enough that it doesn't even make sense to offer them for free. By the way, there have always been business models like this.
所以广告显然有它的用武之地。但也肯定会出现其他商业模式,尤其是那些成本较高、根本无法以免费形式提供的服务。其实一直以来就存在这样的商业模式。
There's a reason why social media is free and ad-supported, but if you want to watch Netflix or ESPN or something, you need to pay for it. The content going into those services has to be produced, and producing it is very expensive. They probably couldn't run enough ads in the service to make up for the cost of producing the content. So basically, you just need to pay to access it.
比如,社交媒体之所以免费并依靠广告,是因为它的内容生产模式不同。但如果你想看Netflix或ESPN,就必须付费,因为这些内容需要制作,而制作成本非常高。他们可能根本无法通过广告覆盖掉这些制作成本。所以你就得为这些内容付费。
The trade-off is fewer people do it. Instead of billions, you're talking about hundreds of millions of people using those services. There's a value switch there. I think it's similar here. Not everyone is going to want a software engineer, or a thousand software engineering agents, or whatever it is. But if you do, that's something you're probably going to be willing to pay thousands, or tens of thousands, or hundreds of thousands of dollars for.
这种模式的权衡是:使用人数会少很多。不是十亿人用,而是几亿人用。这其实是价值层级的切换。我认为AI也类似。不是每个人都需要一个或上千个软件工程师代理人。但如果你确实需要,那你很可能愿意为此支付几千、几万、甚至几十万美元。
That just speaks to the diversity of different things that need to get created. There are going to be business models at each point along the spectrum. At Meta, for the consumer piece we definitely want to have a free thing. I'm sure that will end up being ad-supported. But I also think we're going to want to have a business model that supports people using arbitrary amounts of compute to do even more amazing things than what it would make sense to offer in the free service. For that, I'm sure we'll end up having a premium service. But I think our basic values on this are that we want to serve as many people in the world as possible.
这恰恰说明了不同场景背后的多样化需求。不同层级都会有适配的商业模式。在Meta,我们希望面向消费者的服务可以是免费的,我相信最终它会通过广告来支持。但我们也需要有一种商业模式,来支持用户使用大量算力去做一些更惊艳的事情——那些不是免费服务能覆盖的部分。为此,我们肯定会推出高级付费服务。但我们的核心理念是,我们希望服务尽可能多的人。
The role of a CEO
首席执行官的角色
Dwarkesh Patel
How do you keep track of all these different projects, some of which we've talked about today? I'm sure there are many I don't even know about. As the CEO overseeing everything, there's a big spectrum between going to the Llama team and saying, "Here are the hyperparameters you should use," versus just giving a mandate like, "Go make the AI better."
你如何跟踪所有这些不同的项目?我们今天谈到了一些,但我确定还有很多我根本不知道的。作为一名负责整个公司的CEO,你介入的程度可以从像“去告诉Llama团队该使用哪些超参数”这样非常具体的指导,到“把AI做得更好”这样宽泛的指令,范围非常广。
And there are so many different projects. How do you think about the way in which you can best deliver your value-add and oversee all these things?
项目如此之多,你如何思考怎样才能最好地发挥你的价值并进行有效监督?
Mark Zuckerberg
A lot of what I spend my time on is trying to get awesome people onto the teams. There's that, and then there's stuff that cuts across teams. You build Meta AI, and you want to get it into WhatsApp or Instagram. Okay, now I need to get those teams to talk together. Then there are a bunch of questions like, “Do you want the thread for Meta AI in WhatsApp to feel like other WhatsApp threads, or do you want it to feel like other AI chat experiences?” There are different idioms for those. So there are all these interesting questions that need to get answered about how this stuff fits into everything we're doing.
我很多时间花在招募优秀的人才加入团队上。除此之外,还有很多横跨团队的事务。你开发了Meta AI,现在你想把它整合进WhatsApp或Instagram。那么我就需要让这些团队相互沟通。接着就会出现一系列问题,比如:“你希望Meta AI在WhatsApp里的对话界面感觉像是普通的WhatsApp对话,还是更像是AI聊天体验?”这背后有不同的设计语言。因此,有很多类似这样的问题需要我们思考——如何让这些东西融入我们正在做的一切。
Then there's a whole other part of what we're doing, which is pushing on the infrastructure. If you want to stand up a gigawatt cluster, first of all, that has a lot of implications for the way we're doing infrastructure buildouts. It has political implications for how you engage with the different states where you're building that stuff. It has financial implications for the company in terms of: "All right, there's a lot of economic uncertainty in the world. Do we double down on infrastructure right now? If so, what other trade-offs do we want to make around the company?" Those are the kinds of decisions that are tough for other people to really make.
另外还有一整块工作是基础设施的推动。如果你想建一个吉瓦级的集群,那对我们建设基础设施的方式有重大影响,也有政治层面的影响——比如你要如何与那些你打算建数据中心的州进行互动。这也有财务上的影响——“现在全球经济充满不确定性,我们是否还要在基础设施上加倍投入?如果是,那我们在公司其他方面需要做出哪些取舍?”这些决策往往是其他人难以独立做出的。
Then there's this question around taste and quality. When is something good enough that we want to ship it? In general, I'm the steward of that for the company. Although we have a lot of other people who I think have good taste as well and are also filters for different things.
还有一个层面是关于“品位”和“质量”的判断。什么东西算是好到可以发布了?一般来说,我是公司在这方面的主要把关人。虽然我们也有很多我认为很有眼光的人,他们也在各自领域担任质量过滤器的角色。
Those are basically the areas. AI is interesting because, more than some of the other stuff that we do, it is more research and model-led than really product-led. You can't just design the product that you want and then try to build the model to fit into it. You really need to design the model first and the capabilities that you want, and then you get some emergent properties. Then it's, "Oh, you can build some different stuff because this turned out in a certain way." At the end of the day, people want to use the best model.
基本上,这就是我介入的几个主要领域。AI很特别,因为它比我们做的很多其他事情更偏“研究驱动”而非“产品驱动”。你不能先把想要的产品设计出来,再去构建一个模型去适配它。你必须先设计出你想要的模型和能力,然后你会发现它会有一些“涌现性特征”,接着你才会意识到:“哦,因为它变成了这样,所以我们可以做一些原本没想到的东西。”归根结底,用户想要的是最好的模型。
That's partially why, when we're talking about building the most personal AI, the best voice, the best personalization — and also a very smart experience with very low latency — those are the things that we need to design the whole system to build. That's why we're working on full-duplex voice. That's why we're working on personalization to both have good memory extraction from your interactions with AI, but also to be able to plug into all the other Meta systems. That's why we design the specific models that we design, to have the kind of size and latency parameters that they do.
这也是为什么当我们在讨论要构建“最具个性化的AI”、“最自然的语音”、“最聪明的低延迟体验”时,我们需要从系统层级去设计、去打造。这就是我们为什么在做“全双工语音”,为什么在做个性化系统——不仅要从你与AI的交互中提取有效记忆,还要能与Meta的所有其他系统无缝衔接。这也是我们之所以设计特定架构和尺寸的模型的原因——它们的体量和延迟参数都是为特定目标定制的。
Is big tech aligning with Trump?
大科技公司是否与特朗普结盟?
Dwarkesh Patel
Speaking of politics, there's been this perception that some tech leaders have been aligning with Trump. You and others donated to his inaugural event and were on stage with him, and I think you settled a lawsuit that resulted in them getting $25 million.
说到政治,外界一直有种看法,认为一些科技领袖正在与特朗普靠拢。你和其他人曾向他的就职典礼捐款,也曾和他同台。我记得你们还通过一项和解协议支付了2500万美元。
I wonder what's going on here? Does it feel like the cost of doing business with an administration? What's the best way to think about this?
我很好奇这是怎么回事?这是不是和政府打交道的“成本”?我们应该如何看待这件事?
Mark Zuckerberg
My view on this is that he's the President of the United States. Our default, as an American company, should be to try to have a productive relationship with whoever is running the government. We've tried to offer support to previous administrations as well. I've been pretty public with some of my frustrations with the previous administration, how they basically did not engage with us or the business community more broadly.
我的看法是:他是美国总统。作为一家美国公司,我们的默认立场应该是尽量与掌权的政府建立建设性关系。我们也曾尝试支持过前几届政府。我也曾公开表达过对上届政府的一些不满,他们基本上没有和我们,或者更广泛的商业界接触。
Frankly, that's going to be necessary to make progress on some of these things. We're not going to be able to build the level of energy infrastructure we need if we don't have a dialogue, and if they're not prioritizing trying to do those things.
坦白说,要在某些事情上取得进展,这种对话是必须的。如果没有交流、没有政府的优先推动,我们就无法建设出我们想要的那种基础能源和产业能力。
A lot of people want to write this story about what direction people are going. We're trying to build great stuff, and we want to have a productive relationship with people. That's how I see it. It is also how I would guess most others see it, but obviously, I can't speak for them.
很多人喜欢渲染“科技界朝哪一方靠拢”的故事。而我们是在专注于打造伟大的产品,并希望与各方建立建设性的关系。我是这样看的,我猜多数人也这么看,当然我不能代表他们发言。
Dwarkesh Patel
You've spoken out about how you've rethought some of the ways in which you engage and defer to the government, in terms of moderation stuff in the past.
你也曾谈到自己在内容审核方面,反思过与政府互动及让步的方式。
How are you thinking about AI governance? Because if AI is as powerful as we think it might be, the government will want to get involved. What is the most productive approach to take there, and what should the government be thinking about?
那你现在如何看待AI治理?如果AI真的如我们预期般强大,政府必然会介入。那么我们应该如何采取最有建设性的策略?政府又该如何思考?
Mark Zuckerberg
I guess in the past, most of the comments that I made were in the context of content moderation. It's been an interesting journey over the last 10 years on this. It's obviously been an interesting time in history. There have been novel questions raised about online content moderation.
我过去的大多数发言,主要是围绕内容审核的。在过去10年,这确实是一段颇具挑战性的历程,也是一个特殊的历史阶段。关于在线内容审核出现了许多前所未有的问题。
Some of those have led to productive new systems getting built, like our AI systems to detect nation-states trying to interfere in each other's elections. I think we will continue building that stuff out, and that has been net positive.
其中一些问题促成了新的有益系统的建立,比如我们现在用AI去侦测国家行为体干预选举的企图。我认为我们还会继续推进这方面的建设,这总体是正面的成果。
With some other stuff, we went down some bad paths. I just think the fact-checking thing was not as effective as Community Notes because it's not an internet-scale solution. There weren't enough fact-checkers, and people didn't trust the specific fact-checkers. You want a more robust system. So I think what we got with Community Notes is the right one on that.
而有些事情,我们的确走了一些弯路。比如“事实核查”系统,我认为它不如“Community Notes”那样有效,因为它不是一个能扩展到互联网规模的解决方案。事实核查员数量不足,公众也不信任某些核查来源。我们需要的是一个更强健的机制,而“Community Notes”就是一个正确的方向。
But my point on this was more that, historically, I probably deferred a little too much to either the media and their critiques, or to the government, on things that they did not really have authority over. And as a central figure at the company, I think we tried to build systems where maybe we wouldn't have to make all of the content moderation decisions ourselves.
但我真正的观点是:从历史看,我过去可能过于听信媒体的批评,或在政府并无明确权限的事情上对其让步。作为一个公司的中心人物,我过去尝试搭建一些机制,让我们自己不需要事无巨细地做出所有内容审核决定。
I guess part of the growth process over the last 10 years is realizing, “Okay, we're a meaningful company. We need to own the decisions that we need to make. We should listen to feedback from people, but we shouldn't defer too much to people who do not actually have authority over this. Because at the end of the day, we're in the seat, and we need to own the decisions that we make.”
过去10年的成长过程中,我逐渐意识到:“好吧,我们是一家有影响力的公司,我们必须对该由我们做出的决定负责。我们应当听取各方意见,但不能一味让步于那些实际上没有决策权的人。因为最终责任在我们,我们必须承担并主导这些决策。”
It's been a maturation process, and in some ways painful, but I think we're probably a better company for it.
这是一个成熟的过程,有时也挺痛苦的。但我认为正因为这样,我们才成为了一家更成熟的公司。
Dwarkesh Patel
Will tariffs increase the cost of building data centers in the US and shift buildouts to Europe and Asia?
关税是否会提高在美国建设数据中心的成本,并将建设转移到欧洲和亚洲?
Mark Zuckerberg
It is really hard to know how that plays out. I think we're probably in the early innings on that, and it's very hard to know.
现在很难判断这件事将如何发展。我认为我们大概还处于初期阶段,目前确实难以预料。
Dwarkesh Patel
What is your single highest-leverage hour in a week? What are you doing in that hour?
你每周中最有杠杆效应的一小时是什么?那一小时你通常在做什么?
Mark Zuckerberg
I don't know. Every week is a little bit different. It's probably got to be the case that the most leveraged thing you do in a week is not the same thing each week. Or else, by definition, you should probably spend more than one hour doing that thing every week.
我不太确定。每周都略有不同。很可能每周最具杠杆效应的事情都不是同一件。如果是,那按道理你应该每周花不止一小时在那件事情上。
I don't know. Part of the fun of this job, and also of the industry being so dynamic, is that things really move around. The world is very different now than it was at the beginning of the year, or even six months ago, or in the middle of last year. I think a lot has advanced meaningfully. A lot of cards have been turned over since the last time that we sat down. I think that was about a year ago, right?
我也不确定。这份工作的一大乐趣在于,它本身和我们所在行业的动态变化使得每周的重点都可能不同。现在的世界和年初、甚至六个月前或去年中期都已经不一样了。很多方面都取得了实质性的进展,自从我们上次对话以来,已经翻开了很多新的一页。我记得那大概是一年前吧?

科技企业的不确定。
Dwarkesh Patel
Yeah. I guess what you were saying earlier is that recruiting people is a super high-leverage thing you do.
对。我记得你之前说过,招聘是你做过的最具杠杆效应的事情之一。
Mark Zuckerberg
It's very high-leverage, yeah.
是的,招聘确实是非常高杠杆的事情。
100x productivity
100 倍的生产率
Dwarkesh Patel
You talked about these models being mid-level software engineers by the end of the year. What would be possible if, say, software productivity increased like 100x in two years? What kinds of things could be built that can't be built right now?
你提到这些模型到今年年底可能会达到中级软件工程师的水平。如果假设未来两年内软件生产力提升了100倍,会出现哪些现在无法实现的项目或成果?
Mark Zuckerberg
What kinds of things? That's an interesting question. One theme of this conversation is that the amount of creativity that's going to be unlocked is going to be massive.
会出现什么样的东西?这是个有趣的问题。本次对话的一个主题就是,将释放出前所未有的创造力。
If you look at the overall arc of human society and the economy over the last 100 or 150 years, it's basically people going from being primarily agrarian, with most human energy going toward just feeding ourselves, to the things that take care of our basic physical needs becoming a smaller and smaller percentage of human energy.
如果你回顾过去100到150年人类社会和经济的发展轨迹,会发现人类从主要以农业为主、将大部分精力用于维持温饱,转变为对基本物质需求的投入占比越来越小。
That shift has led to two impacts: one is that more people are doing creative and cultural pursuits. The second is that more people, in general, spend less time working and more time on entertainment and culture. I think that is almost certainly going to continue as this goes on.
这种转变带来了两个影响:其一是越来越多的人从事创造性和文化类的工作;其二是人们总体上花更少时间工作,更多时间用于娱乐和文化活动。我认为这种趋势几乎一定会继续下去。
This isn't the 1-2 year thing of what happens when you have a super powerful software engineer. But over time, if everyone has these superhuman tools to create a ton of different stuff, you're going to get incredible diversity. Part of it is going to be solving hard problems: solving diseases, advancing science, developing new technology that makes our lives better.
这不是说未来一两年内拥有超级软件工程师就会发生的事。但从长期看,如果每个人都拥有这种“超能力”工具,创造各种事物,那会带来惊人的多样性。其中一部分将是解决难题,比如攻克疾病、推进科学、开发改善生活的新技术。
But I would guess that a lot of it is going to end up being cultural and social pursuits and entertainment. I would guess the world is going to get a lot funnier, weirder, and quirkier, the way that memes on the internet have gotten over the last 10 years. I think that adds a certain richness and depth. In funny ways, it actually helps you connect better with people. Now all day long, I just find interesting stuff on the internet and send it in group chats to the people I care about, who I think are going to find it funny.
但我猜其中很大一部分成果会落在文化、社交和娱乐领域。我猜未来的世界会变得更有趣、更奇怪、更古怪,就像过去十年互联网里的表情包和迷因发展一样。这种现象其实增添了某种丰富性和深度。以一种有趣的方式,它实际上帮你更好地与他人建立联系。现在我整天在网上看到有趣的东西,然后发到群聊里给我在乎的、我觉得也会觉得好笑的人。
The media that people can produce today to express very nuanced, specific cultural ideas is really cool. That'll continue to get built out. It does advance society in a bunch of ways, even if it's not the "hard science" way of curing a disease.
当下人们能用媒体表达极其细腻、具体的文化思想,这真的很酷。这种能力会持续发展。即便它不是“硬科学”的方式、不是用来治病救人的,它仍然在多方面推动社会进步。
If you think about it, the Meta social media view of the world is that yeah, people are going to spend a lot more time doing that stuff in the future. It's going to be a lot better, and it's going to help you connect, because it'll help express different ideas.
如果你想想,Meta对世界的社交媒体观是这样的:未来人们会花更多时间从事这类事情,这些活动也会变得更好,并帮助你建立连接,因为它们能更好地表达各种思想。
The world is going to get more complicated, but our technology, our cultural technology, to express these very complicated things — in a very kind of funny little clip or whatever — is going to get so much better. I think that's all great.
世界会变得更复杂,但我们用于表达这些复杂内容的技术——特别是文化类技术,比如用一段幽默的小视频来表达——也将越来越强大。我觉得这太棒了。
I don't know about next year. One other thought that I think is interesting to cover is that I tend to think that, for at least the foreseeable future, this is going to lead to more demand for people doing work, not less. Now, people have a choice of how much time they want to spend working.
我不确定明年会如何。还有一个我觉得有趣的观点是:至少在可预见的未来,这种技术变革会带来“对劳动的更大需求”,而不是减少需求。当然,未来人们可以自主决定自己想投入多少时间去工作。
I'll give you one interesting example we were talking about recently. We have almost three and a half billion people using our services every day. One question we've struggled with forever is how do we provide customer support?
我来讲一个我们最近讨论的有趣例子。我们每天大约有35亿人使用我们的服务。我们长期以来一直在思考的一个问题是:我们该如何提供客户支持?
Today, you can write an email, but we've never seriously been able to contemplate having voice support where someone can just call in. I guess that's maybe one of the artifacts of having a free service. The revenue per person isn't high enough to have an economic model where people can call in.
现在用户可以发邮件,但我们从来没有认真考虑过提供电话语音支持,让用户直接打电话进来。我想这也许是“免费服务”所带来的一个结构性结果。每个用户带来的收入不够高,无法支撑一个让人们打电话求助的经济模型。
But also, with three and a half billion people using your service every day, the number of calls would be massive. It’d be like the biggest call center in the world. It would be like $10 or $20 billion a year to staff that. So we've never thought too seriously about it, because it always seemed like there was no way that could make sense. But now, as AI gets better, you're going to get to a place where AI can handle a bunch of people's issues.
而且,考虑到每天有35亿人在使用我们的服务,若开放电话支持,呼叫量将极其庞大。那就像世界上最大规模的呼叫中心,每年可能需要投入100亿到200亿美元来雇佣人员。因此我们过去从未认真考虑过这件事,因为它看起来根本不现实。但现在,随着AI的进步,AI将有能力处理相当一部分用户问题。
Not all of them — maybe 10 years from now it can handle all of them — but thinking about a 3-5 year time horizon, it will be able to handle a bunch. It's kind of like a self-driving car. They can handle a bunch of terrain, but they're not doing the whole route by themselves yet in most cases. People thought truck-driving jobs were going to go away, but there's actually more truck-driving jobs now than when we first started talking about self-driving cars 20 years ago.
并不是全部问题——也许10年后可以处理所有问题——但在3到5年的时间范围内,它能够处理相当大一部分。就像自动驾驶汽车一样,它们可以应对很多路况,但在大多数情况下仍然无法全程自主驾驶。人们曾认为卡车司机的工作会消失,但事实上,现在的卡车司机比我们20年前首次讨论自动驾驶时还要多。
Going back to the customer support thing, it wouldn't make sense to staff out calling for everyone. But let's say AI can handle 90% of that. Then if it can't, it kicks it off to a person. If you get the cost of providing that service down to one-tenth of what it would've otherwise been, then maybe now it actually makes sense to do it. That would be cool. So the net result is that I actually think we're probably going to hire more customer support people.
回到客户支持的问题,如果每个人都提供人工电话支持显然不现实。但如果AI可以处理其中的90%,剩下的10%再转给人类客服处理,那就可能行得通。如果我们能将整体服务成本降至原来的十分之一,那它就突然变得具有可行性了。那将非常酷。因此,从最终效果来看,我反而认为我们可能会雇佣更多客户支持人员。
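To put rough numbers on that argument, here is a back-of-the-envelope version using only the figures mentioned in the conversation (a $10-20 billion per year estimate to staff support fully, and AI handling roughly 90% of issues). The values are illustrative, not Meta's actual cost model.

```python
# Back-of-the-envelope version of the customer-support argument above.
# Purely illustrative figures taken from the conversation, not Meta's cost model.
fully_staffed_cost = 20e9      # upper end of the quoted estimate, dollars per year
ai_resolution_rate = 0.90      # assumed share of issues the AI handles without a human

escalated_share = 1 - ai_resolution_rate
human_cost = fully_staffed_cost * escalated_share
print(f"Cost to staff only escalated calls: ~${human_cost / 1e9:.0f}B/year")
# -> ~$2B/year, roughly one-tenth of the fully staffed estimate, which is what
#    turns "could never make sense" into "maybe worth doing", and why the net
#    result can be hiring more support people rather than fewer.
```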
The common belief is that AI will automate jobs away. But that hasn't really been how the history of technology has worked. Usually, you create things that take away 90% of the work, and that leads you to want more people, not less.
人们普遍认为AI会取代人类工作。但科技发展的历史其实并非如此。通常情况下,你会创造出能替代90%工作的工具,然后你会需要更多人,而不是更少。
Dwarkesh Patel
Final question: Who is the one person in the world today who you most seek out for advice?
最后一个问题:现在这个世界上,你最常寻求建议的是谁?
Mark Zuckerberg
Oh, man. I feel like part of my style is that I like having a breadth of advisors. It's not just one person.
哦,这个问题啊。我的风格是喜欢有广泛的顾问群体,而不是只依赖某一个人。
We've got a great team. There are people at the company, people on our board. There are a lot of people in the industry who are doing new stuff. There's not a single person. But it's fun. Also, when the world is dynamic, just having a reason to work with people you like on cool stuff… To me, that's what life is about.
我们有一个很棒的团队,包括公司内部的人、董事会成员,还有业界一些在做创新事情的人。我没有唯一固定请教的对象。但这其实也挺有趣的。在一个充满变化的世界里,能有机会和你喜欢的人一起做酷炫的事情……对我来说,这就是生活的意义。
Dwarkesh Patel
Great note to close on. Thanks for doing this.
很好的结尾,谢谢你接受这次采访。
Mark Zuckerberg
Yeah, thank you.
不客气,谢谢你。