2024-05-23 Ben Thompson.Interview with Microsoft CTO Kevin Scott About Building Platforms on AI

AI Platform

AI 平台

Kevin Scott, welcome back to Stratechery.
Kevin Scott,欢迎回到 Stratechery。

KS: Thank you for having me.
KS:谢谢邀请我。

So let’s rewind 10 years or so and walk me through your thought process that led to the journey that Microsoft has been on with AI, from high performance compute or AI Compute, to the partnership with OpenAI. Was there a specific point that made you realize that this is the path you needed to go on?
我们把时间倒回大约 10 年,请你讲讲当时的思考过程:它是如何引导 Microsoft 走上如今这条 AI 之路的?包括高性能计算(High Performance Compute)或 AI Compute,以及与 OpenAI 的合作。有没有某个明确的节点让你意识到,这就是你们必须走的路径?

KS: Yeah, certainly. The interesting thing is I’ve only been at Microsoft for a little over seven and a half years now, but I do think 10 years ago, I was still at LinkedIn running the engineering and operations team there, and it was already just super obvious that we were on a very interesting new curve with AI. It wasn’t the flavor of generative AI that we have right now, but the things that people were doing with really complicated statistical machine learning and how much benefit we were getting already 10 years ago from the scale up of these systems was just faster than I expected.
KS:是的,当然。有趣的是,我在 Microsoft 才工作了七年半多一些,但我确实认为在 10 年前——那时我还在 LinkedIn 负责工程和运营团队——已经非常明显我们正处在 AI 的一个非常有趣的新发展曲线上。那时还不是如今这种生成式 AI 的风格,但人们用复杂的统计式机器学习在做的事情,以及我们在 10 年前就已经从这些系统的规模化中获得的收益,都比我预期来得更快。

I’ve been doing this for a relatively long time, so I built a bunch of machine learning systems at Google right around the time that Google IPO’d, including working on the big machine learning systems that ran the ads auction at the time. It was already obvious back then that scale mattered an awful lot. But the thing that was a relatively new update, call it six years ago, is that this scale-up was leading to AI models that started to behave like platforms.
我做这件事已经很久了。早在 Google 上市前后,我就构建了很多机器学习系统,包括当时驱动广告竞价的大型机器学习系统。那时候就已经很清楚,规模极其重要。但在大约六年前出现的一个相对新的变化是:这种规模化开始让 AI 模型变得像平台一样。

So instead of having a model that was purpose-built for one particular thing, and then you applied a lot of scale and that one particular thing, like CTR (Click-Through Rate) prediction for advertisements, got really good, we began to see the scale-up properties in these large language models lead to the large language models being reusable for lots and lots and lots of different things.
因此,不再是为某个特定任务定制一个模型,然后通过扩大规模把那个单一任务(比如广告的 CTR(Click-Through Rate,点击率)预测)做得非常好;我们开始看到,这些大型语言模型在规模化之后,会变得可复用,能够用于非常非常多的不同事情。

Well, that’s actually a question I want to get to because you and Satya, you keep talking about this platform shift, platform shift, and the word “platform” keeps coming up.
嗯,这正是我想追问的,因为你和 Satya 一直在谈所谓的“平台转移、平台转移”,而“platform”这个词不断出现。

KS: Yep.
KS:对。

I was going to ask you what you meant by that, but what I’m hearing from your answer is by platform, you mean the fact that it’s generalizable.
我本来想问你所指的平台是什么意思,但从你的回答里,我听到的是:所谓平台,指的是它具有可泛化性(generalizable)。

KS: Correct. And that it is a component that composes with software systems that you’re building in a very flexible, general way. So rather than having this world of AI where a company like Microsoft might have a hundred different teams who, top to bottom, had to be responsible for, “This is my data”, and, “This is my machine learning algorithm”, and, “This is how I’m going to train the machine learning system on this data and this is how I’m going to go deploy it”, and, “This is how I’m going to get feedback from the deployment process and real usage into the model to improve everything over time”, and you’ve got a hundred small flywheels turning, instead you’re able to invest in a central model training effort, and the thing that you get out of it is very widely useful across a huge range of applications that you already have and it opens up the possibility to build a bunch of new things that were impossible before.
KS:没错。并且它还是一个可以与正在构建的软件系统以非常灵活、通用方式组合的组件。也就是说,不再是这样一种 AI 世界:像 Microsoft 这样的公司可能会有上百个不同的团队,各自从上到下负责“这是我的数据”、“这是我的机器学习算法”、“我会这样在这些数据上训练机器学习系统并这样去部署”、“我会这样把部署过程和真实使用中的反馈引入模型以便持续改进”等等——于是你就有了一百个小飞轮在各自转动。相反,你可以把资源投入到一个中心化的模型训练上,而产出的这个东西能在你已经拥有的海量应用中被广泛使用,同时还打开了去构建一大批过去不可能的新事物的可能性。

So I want to push on this even just a little bit more, this platform idea. What I am hearing from you, and maybe I’m not hearing from you, I already thought this before I talked to you, but there are platforms like Windows being a platform and that’s what we think of a platform. You have APIs and there’s network effects, it’s a two-sided network, you have developers and users on the one side, but then there are platforms like x86 as a platform.
所以我还想再多追问一下这个“平台”的概念。从你说的、也可能是我自己原本就这么想的——我们通常说的那类平台,比如 Windows,就是我们理解的平台形态:你有 APIs,有网络效应,是一个双边网络,一边是开发者一边是用户;但同时也存在另一类平台,比如把 x86 看作一种平台。

KS: Yep.
KS:对。

Am I right to think of your use of the word platform as being closer to x86 as opposed to Windows?
我这样理解对吗:你所说的“platform”更接近 x86,而不是 Windows?

KS: Yeah, I think that’s probably right.
KS:是的,我认为大概就是这样。

Or maybe the better example is processing in general, because when you talk about going from specialized to general use, that sounds like the shift from dedicated processors that did one thing to generalized logic chips that were broadly programmable.
或者,更好的例子也许是“处理器”这个整体范畴:当你谈到从专用到通用时,这听起来就像是从只做一件事的专用处理器,转向广泛可编程的通用逻辑芯片的转变。

KS: Yeah, and look, I think x86 is probably a pretty apt comparison here, because the thing that made x86 interesting is it was a general purpose piece of infrastructure that allowed lots and lots and lots of software to be written, and the power of the system, the platform, just increased over time because it was getting cheaper every 18 months or so and more powerful simultaneously. And so you just had this rapid progression of capability flowing into the hands of lots of people that were building things on top of it.
KS:对,我认为 x86 在这里可能是一个相当贴切的类比。x86 的有趣之处在于,它是一种通用的基础设施,使得无数软件可以在其之上被编写;而且这个系统(这个平台)的能力会随时间不断提升——大约每 18 个月就变得更便宜,同时也更强大。于是,能力就快速地流向了很多在其之上进行构建的人手中。

There was a clear separation between the x86 and the operating system and the PC manufacturers and the people who are building applications on top of it. And sometimes, Microsoft built both applications and operating systems so there’s a little bit of the both, but there was a whole universe of possibility there for people to do things on top of the Wintel platform that had nothing to do with Microsoft predicting what all of the useful things were and people could trust that it was an interesting platform because you had this exponential called Moore’s Law that was just going to ultimately result in the thing being completely ubiquitous.
在 x86 与操作系统、PC 厂商,以及在其之上构建应用的开发者之间,存在明确的分层。有时 Microsoft 同时构建应用与操作系统,所以会有一些重叠,但在 Wintel 平台之上存在一个巨大的可能性宇宙,供人们去做各种事情,这并不取决于 Microsoft 事先预测哪些东西有用。人们之所以信任这是一个有趣的平台,是因为存在一个被称为 Moore’s Law(摩尔定律)的指数规律,它最终会让这一切无处不在。

Right. We’ll get to the Moore’s Law point, I know that’s one that both you and Satya have been hitting a lot and you want to get to, but you mentioned Wintel there. The way it turned out with x86 is in the fullness of time, you had Windows and you had Linux and you actually eventually had even Macs or whatever, so you did have layers, but by and large, from a developer perspective, their level of abstraction that they cared about was the operating system layer.
对。我们稍后会谈 Moore’s Law,我知道你和 Satya 一直在强调它,也确实想谈它。但你刚才提到了 Wintel。事实证明,随着时间推移,围绕 x86 出现了 Windows、Linux,甚至后来还有 Macs 等等,于是确实形成了分层。不过总体而言,从开发者的角度看,他们最在意的抽象层级主要是操作系统层。

With AI models, my question is where is the actual opportunity going to arrive? So let’s back up. I think an interesting thing with Nvidia right now is obviously, there’s lots of reasons to be bullish on Nvidia for secular reasons, but I think there’s a structural reason to be concerned, which is that the CUDA layer, that’s where we are all specialized and we’re doing frameworks for this and frameworks for that. The LLMs generalized that, and now actually, there’s stuff happening at a higher level where you don’t need to know CUDA to build an AI application. But is that the actual layer or will there be an operating system that sits above that?
对于 AI 模型,我的问题是:真正的机会将会出现在哪一层?我们先倒回去说。我认为 Nvidia 现在有个很有意思的现象:显然有很多长期因素让人看好 Nvidia,但我也认为存在一个结构性担忧——CUDA 这一层是大家专业化的地方,我们为此做了各种框架。LLM(大型语言模型)把那一层给“泛化”了;现在实际上在更高一层已经发生了很多事情,你并不需要了解 CUDA 就能构建一个 AI 应用。但那一层就是最终的分层吗?还是说在其之上还会出现一个类似“操作系统”的层?

KS: Hard to say. Probably not an operating system in the sense that—
KS:很难说。至少不会是严格意义上的操作系统——

Not a traditional operating system, but that sort of context.
不是传统意义上的操作系统,但大致是那个层次。

KS: Yeah. Look, I think this is the history of computers writ large: the level of abstraction that people use always increases.
KS:对。我认为这就是计算机发展史的宏观叙事——人们所使用的抽象层级总是在不断提高。

And we’re just resetting because of this new model.
而我们只是因为这种新模型而在重置。

KS: Yes, a hundred percent, I think that’s absolutely true. So I don’t know exactly what the level of abstraction is going to be, but it’s already very different now. We have prompt engineers now who are able to coax these systems into doing very complicated things simply by instructing it in natural language, like what you would like the system to do or not to do. We’re developing all sorts of tools for figuring out, you’ve got to stuff a bunch of stuff into a context window for a large language model in order for it to do what you want to do. We built things at Microsoft Research like this GraphRAG system that does a graph structured context composition, so that you’re very efficiently using the context and you’re not sending unnecessary tokens into the model, which is a good thing because the more tokens you send, the more it costs and the higher the latency is where you just need to get at the information that it needs to have in order to answer the question you need answered or to complete the task you need completed.
KS:是的,百分之百,我认为这完全正确。所以我并不确切知道未来的抽象层级会怎样,但现在已经非常不同了。我们有了“提示工程师(prompt engineers)”,他们只需用自然语言指示系统该做或不该做什么,就能让这些系统完成非常复杂的事情。我们正在开发各种工具来解决这样的问题:你必须把大量信息塞进大型语言模型的上下文窗口,才能让它按你的意图工作。我们在 Microsoft Research 构建了像 GraphRAG 这样的系统,做图结构的上下文组合,这样你就能非常高效地利用上下文,不把不必要的 tokens 发送进模型——这很好,因为发送的 tokens 越多,成本越高、时延越大;而你真正需要的,是让模型获得它回答你想要解答的问题或完成你需要完成的任务所必需的信息。
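To make the token-efficiency point concrete, here is a minimal sketch of the general idea (this is not GraphRAG itself; the chunk structure, the relevance scores, and the four-characters-per-token estimate are all illustrative assumptions): score candidate snippets for relevance and pack only the best ones into a fixed token budget before anything is sent to the model.

```python
# Hypothetical sketch: pack only the most relevant chunks into a fixed token
# budget so the prompt does not carry unnecessary tokens (cost and latency).
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    relevance: float  # assumed to come from some retriever or graph traversal

def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); a real system would use the
    # model's own tokenizer.
    return max(1, len(text) // 4)

def pack_context(chunks: list[Chunk], budget_tokens: int) -> str:
    """Greedy packing: highest-relevance chunks first, stop at the budget."""
    packed, used = [], 0
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost > budget_tokens:
            continue
        packed.append(chunk.text)
        used += cost
    return "\n\n".join(packed)

if __name__ == "__main__":
    chunks = [
        Chunk("Q3 revenue grew 12% year over year.", relevance=0.9),
        Chunk("The cafeteria menu changes on Mondays.", relevance=0.1),
        Chunk("Azure AI services drove most of the growth.", relevance=0.8),
    ]
    print(pack_context(chunks, budget_tokens=20))
```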

So I don’t know what the full set of abstractions are, so why we talk about this notion of a Copilot stack, it’s not like we’ve even figured out exactly what everything in that Copilot stack is. We figured a ton out over the past couple of years that you have to have in order to deploy a modern application, but even as the models become more powerful, the abstractions get higher. So going back to your Windows analogy, the first version of Windows didn’t have DirectX in it because the graphics weren’t powerful enough in those machines for you to even contemplate having shaders.
因此我并不知道完整的抽象集合究竟是什么——这也是我们谈“Copilot 堆栈(Copilot stack)”这个概念的原因;我们并没有把这个堆栈里的所有要素都完全搞清楚。过去几年里,我们找到了部署现代应用所必需的一大堆组件;但即便模型越来越强,抽象层级还会继续上移。回到你对 Windows 的类比:最初版本的 Windows 并没有包含 DirectX,因为当时那些机器的图形能力还不足以让你去设想使用“着色器(shaders)”。

It’s not that no one created it, it’s that no one had even thought about it yet.
不是没人去创造它,而是当时压根还没有人想到它。

KS: Right, so a bunch of that is still yet to come here. But I think what you will see from us at least is we’re going to have at least one opinion, and I don’t know whether it will be the ultimate opinion or the right opinion, but it will be an opinion about what all of those components are that you need to have in addition to a powerful frontier model in order to build these really rich, interesting, new applications.
KS:没错,所以很多东西还未到来。但我想你至少会看到,我们会给出至少一个“看法”。我不知道那是否会是最终的答案,或是否是正确的答案,但那会是一个关于:除了一个强大的前沿模型之外,要构建这些非常丰富、有趣、全新的应用,还需要哪些组件的整体“看法”。

Well, what is that opinion? Is it the Copilot stack articulation? In the long run, in this opinion, if you’re a developer thinking about building a computer application in 1975 versus 1985 versus 1995, your decision set is completely different. It’s very obvious in some respects what you’re going to do, so what is your opinion on how that evolution will happen?
那么,这个“看法”究竟是什么?是不是对 Copilot stack(Copilot 堆栈)的阐释?从长远看,在这一看法下,如果你是开发者——在 1975 年、1985 年、1995 年分别打算构建计算机应用时——你的决策集合完全不同。在某些方面你该做什么非常清楚。那么,你对这场演进将如何发生的观点是什么?

KS: Well, so I think the opinion that we have right now is as you do when all of these platforms emerge, you’ve got a layering of these abstractions. So you have at the bottom of the abstraction stack, you’ve got a large foundation model, then you have your data set and a retrieval mechanism that, in a very careful way, makes sure that the model has access to the information that it needs to have access to in order to complete the task. You’ve got a bunch of stuff that’s sitting on top of that that is doing orchestration, so it may be that you need to operate across multiple models in order to accomplish the thing you want to accomplish, and you may need to do that for cost reasons or for quality reasons or for data privacy reasons.
KS:嗯,我认为我们目前的看法是:就像任何平台出现时一样,你会得到一种抽象层的分层结构。抽象栈的最底层是一个大型的 foundation model(基础模型);然后是你的数据集与检索机制,要以非常谨慎的方式确保模型能获取它完成任务所必需的信息。在其之上还有一堆做 orchestration(编排)的东西——也就是说,你可能需要跨多个模型协同运行才能达成目标;这样做可能出于成本、质量或数据隐私等方面的考虑。
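As an illustration of the orchestration layer just described, here is a minimal sketch under invented names (the catalog, the prices, and the policy below are assumptions for illustration, not Microsoft's actual stack): a router that picks, per request, the cheapest model that satisfies the caller's quality, cost, and data-privacy constraints.

```python
# Hypothetical sketch of an orchestration layer: choose which model serves a
# request based on declared privacy, quality, and cost constraints.
from dataclasses import dataclass

@dataclass
class ModelSpec:
    name: str
    quality: int              # 1 (small/local) .. 3 (frontier), illustrative scale
    cost_per_1k_tokens: float
    keeps_data_in_tenant: bool

CATALOG = [  # invented catalog, purely for illustration
    ModelSpec("small-on-prem", quality=1, cost_per_1k_tokens=0.0001, keeps_data_in_tenant=True),
    ModelSpec("mid-tier-cloud", quality=2, cost_per_1k_tokens=0.001, keeps_data_in_tenant=False),
    ModelSpec("frontier-cloud", quality=3, cost_per_1k_tokens=0.01, keeps_data_in_tenant=False),
]

def route(min_quality: int, max_cost_per_1k: float, data_must_stay_in_tenant: bool) -> ModelSpec:
    """Pick the cheapest model that satisfies the quality, cost, and privacy constraints."""
    candidates = [
        m for m in CATALOG
        if m.quality >= min_quality
        and m.cost_per_1k_tokens <= max_cost_per_1k
        and (m.keeps_data_in_tenant or not data_must_stay_in_tenant)
    ]
    if not candidates:
        raise ValueError("no model satisfies the declared constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)

if __name__ == "__main__":
    print(route(min_quality=1, max_cost_per_1k=0.01, data_must_stay_in_tenant=True).name)
    print(route(min_quality=3, max_cost_per_1k=0.01, data_must_stay_in_tenant=False).name)
```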

One of the things that we really expect to see over the coming year is a decomposition of where inference happens. So some of it will absolutely be able to happen on a device, like on your PC or on your phone, and if you can do that, I think you want as much of that inference, as much of the AI happening on the device as humanly possible and only when you run out of capability or capacity on the device do you have to go call the more powerful, complicated thing that’s sitting inside of the cloud.
我们非常期待在未来一年里看到的一件事,是对“推理发生在哪里”的进一步分解。部分推理完全可以在设备端进行,比如在你的 PC 或手机上。如果可以做到这一点,我认为你应尽可能让更多的推理、更多的 AI 在设备端完成;只有当设备的能力或容量用尽时,才需要调用云端那个更强大、更复杂的东西。
Is the most important agent to be built this orchestration agent that figures out number one, “Do we go local?”, “Do we go to the cloud?” — number two, if we go to the cloud, “How do we rewrite this prompt or request?”? To your point you said before, so we minimize the number of tokens, it’s super efficient. Is there a bit where you’re talking about the Phi models on Windows, and actually the most important part of those is not that you can do a co-drawing thing on Windows, but you can actually keep your COGS down for Copilot in the cloud.
最重要要构建的 agent 是否就是这个 orchestration agent(编排代理)?它首先判断:“我们在本地跑,还是上云?”;其次,如果上云,“我们该如何重写这个 prompt 或请求?”——就像你之前所说的那样,为了把 tokens 数量降到最少、让效率更高。关于 Windows 上的 Phi models(Phi 模型),是否某种意义上它们最重要的并不是在 Windows 上能做协同绘图之类的事情,而是它们实际上能帮助你把云端 Copilot 的 COGS 压下来?

KS: I think it’s an important thing and there’s just an increasing number of abstractions. The important thing is, if you have a truly useful thing that you are making where your audience is very large, you want to be able to distribute it to as many of those people in that audience as humanly possible, and so COGS is definitely a factor in that and so if there are ways that you can get them a good high quality product where you’re doing something like cloud offload to a small model in order to deliver that application, that’s great. You should absolutely be doing that.
KS:我认为这是件重要的事,而且抽象层级只会越来越多。关键在于:如果你正在打造的东西确实有用、受众非常大,你就希望尽可能把它分发给受众中的尽可能多的人。因此 COGS 肯定是一个因素。如果有办法在交付应用时,通过类似“cloud offload(云端卸载)到小模型”这样的做法,给用户提供一个高质量的产品,那就太好了——你绝对应该这么做。
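A minimal sketch of that local-versus-cloud decision (every name here is a hypothetical stand-in; a real orchestrator would call an actual on-device model and a cloud endpoint): try the on-device model first, and only escalate to the cloud when the request exceeds what the device can handle, trimming the prompt before it goes over the wire so fewer tokens are paid for.

```python
# Hypothetical sketch: prefer on-device inference, fall back to the cloud only
# when the request exceeds what the local model can handle.

ON_DEVICE_CONTEXT_LIMIT = 4_000   # assumed token capacity of the local model
SIMPLE_TASKS = {"summarize", "rewrite", "classify"}  # illustrative task set

def rough_token_count(text: str) -> int:
    return max(1, len(text) // 4)  # crude estimate; real systems use a tokenizer

def run_on_device(prompt: str) -> str:
    return f"[local model] {prompt[:40]}..."           # stand-in for a local model call

def run_in_cloud(prompt: str) -> str:
    return f"[cloud frontier model] {prompt[:40]}..."  # stand-in for a cloud call

def trim_prompt(prompt: str, budget_tokens: int) -> str:
    """Keep only what fits in the budget before sending tokens to the cloud."""
    return prompt[: budget_tokens * 4]

def answer(task: str, prompt: str) -> str:
    fits_locally = rough_token_count(prompt) <= ON_DEVICE_CONTEXT_LIMIT
    if task in SIMPLE_TASKS and fits_locally:
        return run_on_device(prompt)
    # Escalate: trim the request so we pay for as few tokens as possible.
    return run_in_cloud(trim_prompt(prompt, budget_tokens=8_000))

if __name__ == "__main__":
    print(answer("summarize", "Short note about today's meeting."))
    print(answer("research", "Long multi-document analysis request " * 500))
```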

But I thought you told us we shouldn’t be worrying about COGS, because everything’s going to be very cheap soon.
可我记得你说过我们不需要担心 COGS,因为一切很快都会变得非常便宜。

KS: It will!
KS:会的!

AI Scaling

AI 规模化

Here’s your pitch. This is the segue for you to give me your scaling pitch.
现在请你来阐述一下你的观点。这是一个过渡,邀请你展开谈“规模化”的论点。

KS: Yeah, and I give this pitch all the time. So the interesting thing I think about this whole space right now is you do have exponentially improving capability in the frontier models and I don’t think we have approached the point of diminishing marginal return on the scale.
KS:是的,而且我经常这样陈述。关于当下整个领域,有一个有趣的现象:前沿模型的能力在指数式提升;而且我认为我们距离“规模扩张的边际回报递减点”还没有到来。

If we hit a scale wall, what is it going to be? Is it going to be data or what do you think?
如果我们真的撞上规模的天花板,会是什么导致的?会是数据吗?或者你怎么看?

KS: Well, I think data’s already hard. At the scale of some of the frontier models, I think, and everybody who sort of runs into this, it’s a challenge to have enough data to feed it. I mean, one of the things with the Phi models that was a big innovation is you actually use a frontier model to generate-
KS:我觉得数据已经很难了。在一些前沿模型的规模上,要有足够的数据“喂”进去,本身就是个挑战,大家都会遇到。比如,Phi models(Phi 模型)的一项重要创新就是:你实际上用一个前沿模型来生成——

Synthetic data.
合成数据。

KS: And we’ve been doing this for years, we used to do it for the non-generative models. So if you wanted to build a computer vision classifier model and you wanted to make sure that it was trained to not have biases reflected through from underlying training data sets, you could go use a generative model to generate a whole bunch of synthetic data to get a fair distribution of training data so you could get model performance that you wanted. So I think that’s an increasingly powerful way for people to build both small models and large models, is generating synthetic data. Particularly for reinforcement learning, I think it’s really valuable.
KS:我们已经做了很多年了,过去在非生成式模型上就这么干。如果你想做一个计算机视觉的分类器模型,并且希望它的训练不受底层训练数据集偏见的影响,你就可以用生成式模型去生成大量合成数据,用来获得一个更公平的训练数据分布,从而得到你想要的模型表现。所以,我认为“生成合成数据”正成为同时构建小模型与大模型的越来越有力的方法。尤其对强化学习来说,我觉得它非常有价值。
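A small sketch of the balancing idea described here, with generate_synthetic_example standing in for a call to a generative model (the function and the data layout are assumptions for illustration, not the actual pipeline being described): count examples per class and ask the generator to top up whichever classes are under-represented.

```python
# Hypothetical sketch: top up under-represented classes with synthetic examples
# so the training distribution is balanced before you train a classifier.
from collections import Counter
import random

def generate_synthetic_example(label: str) -> dict:
    # Stand-in for a call to a generative model that produces a labeled example.
    return {"image_id": f"synthetic-{label}-{random.randint(0, 10**6)}", "label": label}

def balance_dataset(examples: list[dict]) -> list[dict]:
    """Add synthetic examples until every class has as many as the largest class."""
    counts = Counter(ex["label"] for ex in examples)
    target = max(counts.values())
    balanced = list(examples)
    for label, count in counts.items():
        balanced.extend(generate_synthetic_example(label) for _ in range(target - count))
    return balanced

if __name__ == "__main__":
    data = [{"image_id": i, "label": "cat"} for i in range(90)] + \
           [{"image_id": i, "label": "dog"} for i in range(10)]
    balanced = balance_dataset(data)
    print(Counter(ex["label"] for ex in balanced))  # cat: 90, dog: 90
```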

To this scaling bit, what is driving it? You’ve mentioned the foundation models and you used your analogy with Sam Altman on stage of like we started with the, what was the smaller animal?
回到这个规模化的话题,是什么在驱动它?你提到了基础模型,还用过你和 Sam Altman 在台上的类比:我们一开始用的是——那个更小的动物是什么来着?

KS: A shark.
KS:鲨鱼。

Shark, and then the orca, and now the blue whale is training an as-yet-unnamed model, which apparently will be released at some point. So is the answer that a lot of the efficiencies and a lot of the scaling capabilities are all in smaller models, because these big models can generate all the synthetic data, can provide all the — and you can optimize, but that doesn’t answer the question for the foundation models. How confident are you in their scaling?
鲨鱼,然后是虎鲸,而现在“蓝鲸”正在训练一个尚未命名的模型,显然会在某个时间点发布。那么,答案是否是:大量效率与规模化能力其实都在小模型上,因为这些大模型可以生成所有的合成数据,提供所有这些——然后你再去优化;但这并没有回答基础模型本身的规模化问题。关于基础模型,你对它们的“可扩展性”到底有多大把握?

KS: It is sort of hard to completely answer that question without giving away a whole bunch of things I prefer not to give away.
KS:如果不披露一大堆我不愿公开的内容,要完全回答这个问题确实有点难。

That’ll be a sufficient answer I think.
我觉得这个回答已经足够了。

KS: But look, I do think that the synthetic data is useful for training the large foundation models as well, so just imagine if you want to train a foundation model to be really, really, really good at coding, there are plenty of ways to generate synthetic programs that have particular characteristics and because programs are these deterministic entities, you can sort of generate something synthetically and you can run it through something like a model checker, to prove, “Is it compilable?”, “Does it produce a set of outputs?”, “Is it a valid input that you can put into a training process?”. You can literally design a training curriculum or at least a partial training curriculum for any model, by generating synthetic data in these domains where it’s pretty straightforward to generate well-formed training inputs to get the model to be better at that particular curriculum you’re trying to train them on. Now it doesn’t mean that all the data can be synthetic.
KS:但我确实认为,合成数据对训练大型基础模型同样有用。设想一下,如果你想把一个基础模型训练得在编码方面“非常非常非常好”,有很多方法可以生成具有特定特征的合成程序;而且程序本身是确定性的实体,你可以合成一些东西,然后用类似 model checker(模型检查器)之类的工具跑一下,去验证“它能否编译?”、“它是否产生了一组输出?”、“它是不是一个可以放进训练流程的有效输入?”。在那些容易生成良构训练输入的领域,你真的可以为任何模型设计一套训练课程,或者至少设计一部分训练课程,通过生成合成数据来让模型在你要训练的那条“课程轨道”上变得更好。当然,这并不意味着所有数据都可以是合成的。
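To illustrate the generate-and-verify loop just described, here is a deliberately tiny sketch: a template stands in for the generating model, and Python's own compiler plus a quick execution check stand in for the model checker. Only candidates that compile and produce the expected output are admitted as training examples.

```python
# Hypothetical sketch of the generate-and-verify idea: synthesize candidate
# programs, keep only the ones that compile and behave as expected, and use
# those as training examples.
import random

def generate_candidate() -> tuple[str, int, int]:
    """'Generate' a tiny program plus an input/expected-output pair.
    (A template stands in for a frontier model here.)"""
    a, b = random.randint(-10, 10), random.randint(-10, 10)
    source = f"def f(x):\n    return x * {a} + {b}\n"
    test_input = random.randint(-5, 5)
    expected = test_input * a + b
    return source, test_input, expected

def is_valid(source: str, test_input: int, expected: int) -> bool:
    try:
        compile(source, "<candidate>", "exec")   # does it even compile?
        namespace: dict = {}
        exec(source, namespace)                  # run it in an isolated namespace
        return namespace["f"](test_input) == expected
    except Exception:
        return False

def build_training_set(n: int) -> list[str]:
    kept = []
    while len(kept) < n:
        source, test_input, expected = generate_candidate()
        if is_valid(source, test_input, expected):
            kept.append(source)
    return kept

if __name__ == "__main__":
    print(len(build_training_set(5)), "verified synthetic programs")
```

With a real generative model in place of the template, some candidates would fail the checks; the point is that the validity filter, not the generator, decides what enters the curriculum.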

You used an example before of models doing click-through rates prediction, ad targeting, you’re talking about coding. The benefit of all these is although you’re producing a probabilistic model, you’re using data that is deterministic, right?
你之前举过例子:模型做点击率预测、广告投放,现在又谈到编码。这些场景的好处在于:虽然产出的是一个概率模型,但你使用的数据是确定性的,对吗?

KS: Correct.
KS:没错。

So when you get to this generalizable function, what gives you confidence that the generalizability can extend to domains where it’s almost like on one extreme you have pure creativity where there is no wrong answer, works well there, there’s the other extreme where you’re operating in a domain with a validation function so that you can actually bring AI to bear in a parallel fashion and get the best answer, because you can grade it. But then there’s a whole middle area where I think people — I call it the lazy function — people want AI to do their jobs for them, but the reality is the problem is, there isn’t necessarily a grader in place. So can it generalize to that?
那么当我们谈“可泛化的能力”时,你对它的信心来自哪里?这能否扩展到那些领域?一端几乎是纯创意,没有标准答案——AI 在那里运作良好;另一端是带有验证函数的领域,你可以让 AI 以并行方式工作并得到最佳答案,因为你能对其评分。可问题在于,中间还有一大块领域——我称之为“懒惰函数”——人们希望 AI 替他们完成工作,但现实情况是:那里并不一定存在一个现成的“评分器”。那么它能泛化到那一块吗?

KS: Yeah, look, I think we’re going to be able to generalize a lot of things. So one of the things that we wrote about in the Phi paper, which I think is titled Textbooks Are All You Need, is that, and this is the thing that gives me confidence by the way, we have the ability to train expert humans on a pretty finite curriculum to be able to do very expert things.
KS:是的,我认为我们能把很多事情泛化开来。我们在 Phi 论文中提到的一点——我记得题为 Textbooks Are All You Need(“教材即一切”)——正是让我有信心的原因:我们有能力用相当有限的课程体系去训练人类专家,使其能够胜任非常专业的工作。

Like it’s how I was trained as a computer scientist, I read a whole bunch of computer science papers and computer science textbooks and did a whole bunch of problem sets and I practiced, practiced, practiced, and then after some number of years, I was competent enough to actually do something useful in the world. So I think that is the thing that gives me confidence that we will be able to figure out how to generate enough of a curriculum for these models and to find a learning function that will let us build things that are cognitively quite capable.
就像我作为一名计算机科学家的训练过程:我阅读了大量计算机科学论文和教材,做了大量习题,并且不断练习、练习、再练习。若干年之后,我才具备了足够的能力,能够在现实世界中做出一些有用的事情。所以,这正是让我有信心的地方:我们将能够为这些模型设计出足够的课程体系,并找到一种学习函数(learning function),使我们能够构建在认知上相当有能力的系统。

Now the thing that I don’t know, and this is going to be sort of an interesting thing we will figure out, I think soon, is — it’s a bet I’ve got with a bunch of people — so I would imagine that a computer will prove the Riemann hypothesis before a mathematician will. For your listeners, the Riemann hypothesis is one of these century-old problems in mathematics that I think [David] Hilbert proposed as one of his grand challenges at the end of the 19th or the early 20th century and people have been pounding away at this thing. The Riemann hypothesis is basically a statement about what the distribution of the prime numbers is and it’s a hard, hard problem. It’s one of those things that’s easy to state and there’ve been just crazy, brilliant people trying to come up with a proof of this for a very long time now, and so I actually believe that it’s one of those problems that is likely to be incredibly complex, where the proof is going to be just mind-boggling, and my prediction is a computer will be able to do it before a human being will be able to, and it will probably involve human assistance. It won’t be totally autonomous.
至于我尚不确定的一点——我认为我们很快就会搞明白,这也会很有趣——我和不少人打了个赌:我猜想计算机会先于数学家证明黎曼猜想(Riemann hypothesis)。给你的听众做个背景介绍,Riemann hypothesis 是数学中一个延续了一个世纪的问题,我记得是由 \[David] Hilbert 在 19 世纪末或 20 世纪初将其列为宏大挑战之一,许多人一直在攻克它。Riemann hypothesis 基本上是关于素数分布的一个命题,这是一个非常非常难的问题。它属于那种表述起来容易、但长期以来许多极其聪明的人都在试图给出证明却始终未果的问题。所以我实际上相信,它很可能是那类极其复杂的问题之一,其证明会让人叹为观止。我的预测是,计算机会先于人类完成证明,而且大概率会有人类辅助参与——它不会是完全自主完成的。
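For reference, the standard statement of the hypothesis (general mathematical background, not anything specific to this conversation): with the zeta function defined for Re(s) > 1 by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}},$$

the claim is that every non-trivial zero of its analytic continuation lies on the critical line Re(s) = 1/2. An equivalent form, which is why it pins down the distribution of the primes, is the error bound on the prime-counting function

$$\pi(x) = \operatorname{Li}(x) + O\!\left(\sqrt{x}\,\log x\right).$$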

AI as Tool

AI 作为工具

Well, on that human assistance sort of point. You said in your keynote today, you’ve loved tools your whole life.
嗯,关于人类辅助这件事。你在今天的主题演讲中说过,你一生都热爱工具。

KS: Yeah.
KS:是的。

And is AI going to remain a tool, it’s clearly a tool today.
那么 AI 会一直保持为一种工具吗?它如今显然是一种工具。

KS: Yes, I think so.
KS:是的,我这样认为。

Why is that? Why is it not going to be something that is sort of more autonomous?
为什么?为什么它不会演变成某种更加自主的东西?

KS: Yeah, I don’t really see…
KS:是的,我不太这么看……

None of us know.
我们谁也不知道。

KS: Yeah, well, so none of us know, but I do think we’ve got a lot of clues about what it is humans are going to want. So there hasn’t been a human being better than a computer at playing chess since 1997, when Deep Blue beat Garry Kasparov, and yet people could care less about two computers playing each other at chess; what people care about is people playing each other at chess, and chess has become a bigger pastime, like a sport even; we make movies about it. People know who Magnus Carlsen is.
KS:对,嗯,我们谁也不知道,但我确实认为我们已经有许多线索,能看出人类会想要什么。自从 1997 年 Deep Blue 在国际象棋上战胜 Gary Kasparov 之后,就再也没有人类能在下棋方面胜过计算机;然而,人们并不在乎两台计算机彼此下棋,人们在乎的是人与人之间的对弈,而且国际象棋反而成为更大的消遣,几乎像一项运动,我们甚至为它拍电影。人们都知道 Magnus Carlsen 是谁。

So is there a view of, maybe the AI will take over, but we won’t even care because we’ll just be caring about other humans?
那么是否可以这样看:也许 AI 会“接管”,但我们甚至不会在意,因为我们只会关心其他人类?

KS: I don’t think the AI is going to take over anything, I think it is going to continue to be a tool that we will use to make things for one another, to serve one another, to do valuable things for one another and I think we will be extremely disinterested in things where there aren’t humans in the loop.
KS:我不认为 AI 会接管任何事物。我认为它会继续作为一种工具,被我们用来为彼此创造、彼此服务、彼此做有价值的事情。而且我认为,只要缺少“人类在环(humans in the loop)”的事物,我们都会极度缺乏兴趣。

I think what we all seek is meaning and connection and we want to do things for each other and I think we have an enormous opportunity here with these tools to do more of all of those things in slightly different ways. But I’m not worried that we somehow lose our sense of place or purpose.
我认为我们追求的是意义与连接,我们希望为彼此做事。我认为借助这些工具,我们有巨大的机会以稍微不同的方式去做更多这些事情。但我并不担心我们会因此失去自身的定位或目标感。

How is AI going to make life better in Virginia where you grew up?
AI 将如何让你成长的弗吉尼亚州的生活更美好?

KS: The story I told on stage this morning about my mom, I think she had a pretty rough go of it health wise last fall and you look at the demographics of the world right now, we have a rapidly aging population and—
KS:就像我今天上午在台上讲到我母亲的那个故事。我觉得她去年秋天在健康方面经历得相当艰难。放眼当下的世界人口结构,我们正面临快速老龄化的人口,而且——

A shrinking population.
人口在收缩。

KS: A shrinking population in many places. So shrinking in China, shrinking in Italy, shrinking in Japan, and I think Germany has tipped over to shrinking. You can go look at when we hit peak population in a bunch of places; I think France will hit it sometime in the early 2030s, and ex-immigration, a country like the United States would already have a shrinking population. So none of us have lived in a world where we’ve had population decline in our lifetime, and the thing that must happen in order for us to maintain our standard of living, and for us, God forbid, to have a better standard of living over time when you have fewer people to do the work of the world, is you have to have big productivity boosts. There has to be some way to, with fewer human beings, do all of the things that need to get done.
KS:在许多地方人口都在收缩:中国在收缩,意大利在收缩,日本在收缩,我认为德国已经转为收缩,中国的人口正在收缩。你可以去看看很多地方何时达到人口峰值。我认为法国会在 2030 年代初的某个时候达到峰值;如果不计移民因素,美国也已经是人口收缩。因此,在我们的一生中,其实没人真正经历过一个人口下降的世界。要想在劳动力更少的情况下维持我们的生活水平,甚至(但愿如此)随着时间提高生活水平,就必须有巨大的生产率提升。必须找到一种办法,让更少的人也能把需要完成的事都做完。

There are places in rural America where there are canaries in the coal mine for this problem that we’re all going to face at some point where you don’t have doctors lining up to go move to Gladys, Virginia to take care of the rapidly aging population there. So I think the way that AI shows up in those places is it lets people have equitable access to all of the things that you need access to have dignity and live a good life, and you don’t have to rely on how to get the humans to the right places in order to do it when you don’t have enough humans to go around. I know all of that sounds super abstract, some far distant problem.
在美国农村的一些地方,已经出现了这个问题的“煤矿金丝雀”式预警:你不会有医生排着队搬去弗吉尼亚州的 Gladys 去照料那里的快速老龄化人群。因此,我认为 AI 在那些地方的作用,是让人们能够公平地获得维持尊严、过上好生活所需的一切服务,而当人手不足时,你不必依赖“如何把人力送到对的地方”这种方式来实现这一点。我知道这些听起来很抽象,像是一个很遥远的问题。

I’m from a small town in Wisconsin, I know exactly what you’re talking about.
我来自威斯康星州的一个小镇,我完全明白你在说什么。

KS: Yeah, it’s a real thing, and I think about this healthcare crisis that my mom got into, and I think everybody in the system there was trying to do their absolute best and the absolute best still wasn’t good enough. I think if I hadn’t intervened, she would’ve had a really different outcome and I think about all of the old ladies who don’t have a son who can intervene, and if AI can help play some role in that intervention to let people have more agency over their healthcare, more agency over their education, more agency over their entrepreneurial opportunities, I think it’s nothing but goodness. That doesn’t mean it’s unalloyed good and we get to not think at all about the risks and the downsides.
KS:是的,这是真实存在的问题。我想到我母亲遭遇的那场医疗危机,我觉得系统里每个人都在尽其所能,但即使尽了最大努力,仍然不够。如果当时我没有介入,她的结果可能会完全不同。我也会想到那些没有儿子能出面干预的老人们——如果 AI 能在这种干预中发挥某种作用,让人们对自己的医疗有更多自主权、对教育有更多自主权、对创业机会有更多自主权,我认为那只会是好事。这并不意味着它是没有杂质的全然之善,我们就可以完全不去思考其中的风险和负面影响。

I think the risk in general, from my perception, particularly amongst our circles as it were, is that, unlike a lot of other technical revolutions, there is insufficient thought being given to the upsides. There’s this thing where you felt you were talking about all these good things and you’re like, “Oh, I better include the safety/security bit”. You go to anyone outside of this area and they’re like, “Oh, we know about the upsides” — that’s the part that gets waved away. It’s like, “No, wait, can we stop on that for a little bit? Can we actually talk through what those are?”, so I enjoy your articulation of that there.
我认为总体上的风险——在我看来,尤其是在我们这一圈人当中——与许多其他技术革命不同,大家对“正面收益”的思考是不足的。常常是这样:你觉得自己在谈一堆好处,然后心想,“哦,我最好再加上安全/安保那一块。”可你去找这个领域之外的人,他们会说,“哦,我们知道那些好处”——这部分反而被轻描淡写带过了。就像,“不,等等,我们能不能在这上面多停留一会儿?我们能不能真正把那些好处逐一谈清楚?”所以我很欣赏你刚才对这些好处的具体表述。

KS: I do think there is another, at least one, technological revolution that had this sort of property and it’s the Print Revolution where you had the printing press.
KS:我确实认为还有另一次、至少一次具有类似特征的技术革命,那就是 Print Revolution,当时出现了印刷机。

The Church — it took 10, 15 years, but they caught up in the grand scheme of time pretty quickly.
The Church——花了 10 到 15 年,但从更长的时间尺度看,他们其实追赶得相当快。

KS: Yeah, actually, it took longer than that, you had about a century of turmoil and upheaval and what you netted out to in the end was a thing that we all just absolutely take for granted. You just can’t imagine a world without the written word, without books and free flow of information.
KS:对,不过事实上比那更久——随后经历了大约一个世纪的动荡与剧变,而最终我们得到的,是一种如今被我们视为理所当然的状态。你几乎无法想象一个没有文字、没有书籍、没有信息自由流动的世界。

We also ended up with a completely re-organized Westphalian system. We had the years of war, we had lots of stuff. We had the breakup of the Church, the entire Reformation; a lot happened.
我们最终还迎来了一个被彻底重组的 Westphalian system。我们经历了多年战争,发生了很多事情。整个 Reformation 的分裂也出现了,发生的事太多了,对吧?

KS: Yeah, my wife is an early modern European historian by training, she’s a philanthropist now, but that is her period, the Print Revolution, and so this is part of our household conversation.
KS:是的。我的妻子受训是研究早期近代欧洲史的历史学家,现在是位慈善家,但她的研究阶段正是 Print Revolution,所以这也是我们家常常讨论的话题。

I’m going to give you this mic when you go over, if you can just record a couple episodes, we’ll post them for you.
等你过去的时候我把这个麦克风交给你,如果你能顺便录几期节目,我们就帮你发布。

The OpenAI Partnership

OpenAI 合作关系

Was it because you were an outsider, you had been at Google, you were at LinkedIn, that you could come to Microsoft and say, “I’m not sure you realize how far behind you are from Google and you need to do something pretty radical here”?
是否因为你是一个外来者,你之前在 Google、在 LinkedIn,所以你来到 Microsoft 时可以直言:“我不确定你们是否意识到自己落后于 Google 多远,你们需要在这里做一些相当激进的事情”?

KS: Maybe. I think there was actually a recognition that we were far behind.
KS:也许吧。我认为当时确实已经有一种共识,认为我们落后很多。

Broadly speaking, you didn’t need to convince anyone?
总体来说,你并不需要说服任何人?

KS: Yeah. The question was, “What do you go do about it?” and I think the interesting thing there was I’ve always been attracted to problems.
KS:对。问题在于,“你打算如何应对?” 而有趣的是,我一直被问题本身所吸引。

You need a place to use your tools.
你需要一个施展你这些工具的地方。

KS: Yeah, I do. It’s funny, I’ve been an engineer for a really long time, and the thing that you will notice is you have all sorts of different types of problem solvers. So you have people who are good starters and people who are good finishers and it is very rare to have someone who’s a good starter and a good finisher. The choices that I made in things that I went to go work on were largely about — it was almost like Nanny McPhee, I don’t know whether you ever saw that movie.
KS:对,我需要。很有意思,我当工程师已经很久了,你会发现解决问题的人有各种类型:有的人擅长开局,有的人擅长收尾,而兼具开局与收尾能力的人非常罕见。我选择去做哪些事情,很大程度上是——这几乎就像 Nanny McPhee,我不知道你有没有看过那部电影。

I did not. Sorry, it went over my head.
我没看过。抱歉,我没领会到你的意思。

KS: Nanny McPhee was this fictional nanny character who was, I’m not saying I’m magical, but her shtick was when the kids needed her but didn’t want her, she had to stay, and when they no longer needed her but wanted her to stay, she had to go. That’s the thing that attracts me to things. It’s like, “Okay, this is a really sticky situation. I think I can help solve this particular problem”, that’s what I want to go work on.
KS:Nanny McPhee 是一个虚构的保姆角色——我不是在说我有魔法——她的“规矩”是:当孩子们需要她但不想要她时,她必须留下;当孩子们不再需要她却想让她留下时,她必须离开。这正是吸引我去做事的点。就像:“好吧,这是一个相当棘手的局面。我认为我能帮助解决这个特定问题”,这就是我想投入去做的。

And so Microsoft being behind, you were rubbing your hands together?
所以当时 Microsoft 落后,你就开始摩拳擦掌了?

KS: Look, it wasn’t that the whole company — Microsoft was fine, they were roaring through growth and cloud. This was about, “okay, we’re behind in AI”, and AI wasn’t as obviously important in 2017.
KS:看,并不是整个公司都不行——Microsoft 状况不错,在增长与云业务上高歌猛进。问题在于:“好吧,我们在 AI 上落后了”,而且在 2017 年,AI 的重要性还没有像今天这样显而易见。

Is this more a function of their product mix? For example, maybe if Bing had been bigger and they had a larger advertising business, they would’ve been more — or was it an oversight? What drove them?
这更多是他们产品组合的结果吗?例如,如果 Bing 规模更大、广告业务更庞大,他们会不会更——还是说这是一个疏忽?是什么驱动了他们?

KS: Hard to say. I think one of the things that is just super clear about investing in AI right now is you have to be disciplined about how you’re investing. It’s not one of those areas where you want to let a thousand flowers bloom and you spread your resources across a bunch of different bets and all of that’s going to add up to something great in the end.
KS:很难说。我认为当下投资 AI 有一件事非常清楚:你必须对投资方式保持克制与纪律。这不是一个你愿意让“百花齐放”、把资源摊在一堆不同赌注上、指望最后能凑成一件伟大事情的领域。

So I think Microsoft had been spending quite a substantial amount of money and had a huge number of people working on AI, but it was really diffused across a bunch of different things, and it’s just too expensive and too complicated in enterprise to let it be diffused across a bunch of different things. I think that’s a thing that still people struggle with.
所以我认为 Microsoft 确实投入了相当可观的资金,也有大量人力在做 AI,但这些投入过于分散在许多不同的方向上。而在企业环境中,让投入分散在许多不同事情上,成本太高、复杂度太大。我觉得这仍是很多人正在挣扎的问题。

So how did you convince Microsoft to say, “Look, you were spending all this money, we get it. You have Microsoft Research, XYZ. Actually, what you just need to do, your core capability, Microsoft, is the ability to spend money and there is this organization in OpenAI that doesn’t have money, but has the capability to build what needs to be done and we have to work together”?
那么你是如何说服 Microsoft 的?像这样说:“看,你们花了这么多钱,我们懂。你们有 Microsoft Research、XYZ。事实上,你们现在需要做的、你们的核心能力,Microsoft,是花钱的能力;而 OpenAI 有一个组织,它没有钱,但有能力构建需要做的东西,我们必须合作”?

KS: Well, I would challenge that characterization. I don’t think our core capability is spending money. I think our core capability, if you just look at the DNA of the company, is building platforms to try to identify opportunities to build things that lots of other people are going to build their businesses and their products on top of.
KS:嗯,我会质疑这种描述。我不认为我们的核心能力是花钱。我认为我们的核心能力——如果你看看公司的 DNA——是构建平台,去识别机会,去构建那些能让许多其他人把他们的业务与产品建立其上的东西。

Fair enough. Which throws off a lot of money that you’re able to spend.
说得通。而这也带来了大量你们可以支配的资金。

KS: Yes, I think in success that is true. So the argument was basically almost exactly the one that we’ve been having so far. It’s like, “Hey, we now are seeing a trend in the technology where it’s behaving like a platform where the platform itself is going to really benefit from focus and having a point of view about what’s the thing that you want to put your dollars into”. Not just your dollars, but you’ve got all of this opportunity cost that you’re spending on the development of this new platform.
KS:是的,我认为在成功的情况下确实如此。基本上的论点与我们刚才一直在讨论的几乎完全一样。也就是说,“嘿,我们现在看到技术正呈现出平台化的趋势;而这个平台本身会因聚焦与明确观点(你要把资金投在哪件事上)而受益良多”。不仅是你的资金,还有你为开发这个新平台所付出的全部机会成本。

Microsoft has always defaulted though towards “Build, not buy”. In this case the question isn’t buy, it’s, “Build vs. partner”, which is an even more precarious position. What was the evidence or was there a moment that convinced the board to say, “We don’t have time to catch up”?
不过,Microsoft 一直默认倾向于“Build, not buy”(自建而非收购)。在这个案例里,问题不是“买不买”,而是“自建还是与人合作”,这甚至更为微妙。有什么证据,或出现过什么时刻,让董事会被说服去下结论:“我们已经没时间自己追赶了”?

KS: I think it’s right around the time that we did the first deal with OpenAI in 2019 so we had a pretty clear sense what the scaling laws were going to look like by then, and we knew that we had to just move immediately, and there were two or three options, and this one in my judgment and then in Satya’s judgment was going to be the fastest way to get ourselves bootstrapped and into position.
KS:我认为大概是在我们 2019 年与 OpenAI 达成第一笔交易的前后。当时我们已经相当清楚“规模律(scaling laws)”会是什么样子了,也知道必须立刻行动。我们摆在面前有两三种选项,而在我、以及 Satya 的判断里,这个选项将是让我们最快“起步(bootstrap)并占据有利位置”的方式。

The risk though is you’re putting so much in the hands of an entity you don’t control. As a major proponent of this, how stressful was November 2023?
但风险在于,你把那么多东西交到一个你无法控制的实体手里。作为这一策略的重要推动者,2023 年 11 月对你来说压力有多大?

KS: It was stressful, but look, again, the thing that I would say in general is I think Microsoft as a platform provider has actually been pretty good over the years at building really complicated things with partners. It’s not like the PC Revolution was all Microsoft, it was Microsoft plus Intel plus Nvidia with graphics cards plus an entire OEM ecosystem, so it’s rarely just the thing that we are building alone. Even Azure, Azure is only successful because we’ve got a bunch of other infrastructure like Databricks and Snowflake, and a bunch of stuff that runs on top of our cloud and that also runs on top of a bunch of other people’s clouds. So I think you really, in the modern era, if you’re really talking about these super, super large-scale platforms, you have to be reasonably good at partnering. You can’t have this thought that I’m going to do everything myself, it’s just too hard.
KS:确实有压力。但我仍要强调,从总体上看,作为一个平台提供方,Microsoft 这些年实际一直很擅长与伙伴共建非常复杂的东西。PC Revolution 并不只是 Microsoft 一家的事,而是 Microsoft 加上 Intel、带有显卡的 Nvidia,再加上一整个 OEM 生态;因此很少有事情是我们单打独斗完成的。就连 Azure 也是如此——Azure 之所以成功,是因为我们拥有 Databricks、Snowflake 等一整套其他基础设施,以及大量运行在我们云上、也运行在许多其他厂商云上的东西。所以我认为,在现代,如果你真正在谈这种超级大规模的平台,你必须在“合作”方面足够擅长。你不能指望“我什么都自己干”,那实在太难了。
To go back to the abstraction question, and in this context, do you feel confident, broadly speaking, it makes you sleep at night, then beyond the fact that who owns compute runs the world, that models are ultimately going to be commoditized? And if push comes to shove, sure, you’ll have to do some work, but the Office applications could run any model, it doesn’t have to be the OpenAI model.
回到抽象层的问题,在这个语境下,你总体上是否有信心(能让你夜里睡得着):除了“谁拥有算力谁就主导世界”这一事实之外,模型最终会被商品化?如果到了关键时刻,当然你们需要做些工作,但 Office 应用可以运行任何模型,不一定非得是 OpenAI 的模型。

KS: Well, look, I think it’s less about commoditization and more about what this two-step dance is that we’re doing right now, which is like you have a frontier that is advancing pretty rapidly and I think it’s just table stakes that if you’re going to be a modern AI cloud or you’re going to build modern AI applications, you better have access to a frontier model. OpenAI is doing a brilliant job, I think, building these frontier models and making very, very good use of compute resources and then as the frontier pushes forward, you have an entire ecosystem of really, really super clever people who are figuring out how to optimize all of the bits and pieces of it.
KS:我认为这与其说是“商品化”,不如说是我们当下正在做的一种“两步走”的节奏:一方面,前沿(frontier)在快速推进;如果你要成为现代的 AI 云,或要构建现代 AI 应用,能接入一个前沿模型只是“基本门槛(table stakes)”。我认为 OpenAI 在构建这些前沿模型、并高效利用算力资源方面做得非常出色。另一方面,随着前沿继续推进,你会看到整个生态中有大量极其聪明的人在想办法优化其中方方面面的环节。

Scaling Laws

规模律

Did you feel like Phi was a real validation of your strategy, that you went from not being able to do anything to building the best small model basically in a number of years?
你是否觉得 Phi 真实验证了你的策略——从一开始几乎无能为力,到在短短几年内打造出最好的小模型?

KS: Yeah, and I think the interesting thing about Phi is not that it’s replacing anything, it’s that it composes well with what we already have, because you can do so much with a frontier model, and I don’t want anybody getting confused. I think, again, half of my message at Build today was like, “You really need to be thinking about how fast this frontier is advancing and a real category error here is getting too caught up in this sort of linearity of all of the optimizations that everybody’s doing”.
KS:是的。而我认为 Phi 有趣之处不在于它取代了什么,而在于它能与我们已有的东西很好地组合,因为借助一个前沿模型你可以做非常多的事——我不希望大家混淆。今天在 Build 上,我的信息有一半都在强调:“你必须认真思考前沿前进的速度;真正的范畴错误,是过度沉迷于把大家正在做的各种优化看成线性的那种思路。”

Is there a bit where tech has forgotten what it was like to build on top of Moore’s Law? You go back to the 80s or 90s and it took a while to get the — you needed to build an inefficient application because you wanted to optimize for the front end of the user experience, and you just trusted that Intel would solve all your problems.
是否在某种程度上,科技行业已经忘了“在 Moore’s Law 之上构建”是什么感觉?回到 80、90 年代,你常常需要先做一个并不高效的应用,因为你要优先优化用户体验的前台部分,然后你就相信 Intel 会把底层的所有问题都解决掉。

KS: Correct.
KS:没错。

Is there a bit where that’s been lost?
这种做法在某种程度上被遗忘了吗?

KS: I think so.
KS:我认为是的。

Because everyone complains about bloat, but there’s a bit where no, actually you want bloat because it will get taken care of.
因为大家都抱怨“臃肿”,但在某种意义上,其实你可以接受“臃肿”,因为后面会被处理掉。

KS: Yeah, you can go sort bloat out in arrears.
KS:对,你可以事后再把臃肿清理掉。

That’s right.
没错。

KS: You don’t want pointless bloat, but you also don’t want-
KS:你不想要毫无意义的臃肿,但你也不想——

You don’t want to over-optimize to get rid of it.
你也不想为此过度优化、把它完全抹掉。

KS: If I think about earlier in my career, one of the things that people used to pride themselves on is you write these programs and you just sort of go to the inner loop of your critical path function and write a bunch of —
KS:回想我职业早期,大家常以此为傲:写程序时直捣关键路径函数的内层循环,写一堆——

It’s like, good, you went from 0.0002 millisecond to a 0.0001 or whatever.
就像是:很好,你把 0.0002 毫秒优化到了 0.0001 毫秒,诸如此类。

KS: There was a point in time where that mattered, where that was the difference between having a useful thing and having a piece of junk, but because you had this Moore’s Law, this exponentially improving process, if you didn’t recognize that and all you did was spend your time writing a bunch of interloop assembly language for something, you were just missing all the opportunity to write fundamentally more powerful software.
KS:确实有一段时间这很重要,决定了一个东西到底有用还是废物。但在 Moore’s Law 这种指数级改进的进程下,如果你没意识到这一点,只是一味把时间花在为某段内层循环写一堆汇编代码,那你就错过了去编写在根本层面更强大软件的全部机会。

I mean, I was a compiler optimization guy when I was in grad school and I had this friend, Todd Proebsting, who was a professor at the University of Arizona and was at Microsoft Research for a while and he had this thing called Proebsting’s Law, which was a goof on Moore’s Law, and Proebsting’s Law said that the work of compiler optimization researchers doubles the performance of computer programs once every 18 years.
我的意思是,我读研时是做编译器优化的。我有位朋友叫 Todd Proebsting,他曾是 University of Arizona 的教授,也在 Microsoft Research 待过一阵子。他提出了一个叫 Proebsting’s Law 的东西,是对 Moore’s Law 的一个戏谑——Proebsting’s Law 的意思是:编译器优化研究者的工作,每 18 年才会让计算机程序的性能翻一番。

(laughing) Puts you in your place.
(笑)让人认清自己。

KS: Yeah, and he was kind of right. I wrote a paper about this when I was in grad school and it was a little bit worse than that, and so it is one of the reasons why I decided not to be a compiler optimization person anymore, because you could go crank away on something that was very, very complicated for six months of your time and move a benchmark by 4%, and in that same period of time the material scientists and the architects in tow were going to make this thing twice as fast. So what are you doing? You’d be way better off trying to figure out how to harness all of that new fast that was coming rather than trying to optimize away the old slow.
KS:是的,而且他有几分道理。我在读研时写过一篇相关论文,发现情况甚至比那还糟一些,这也是我后来不再做编译器优化研究员的原因之一:你可能花六个月时间在极其复杂的问题上埋头苦干,只把一个基准提升了 4%,而在同一时期,材料科学家与架构师们已把这玩意儿的速度提升到两倍。那么你在做什么呢?与其试图把旧的“慢”优化掉,不如想办法利用即将到来的那些新的“快”。
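The back-of-envelope comparison implied here, using the doubling periods quoted in this exchange (roughly 18 months for Moore's Law, 18 years for Proebsting's Law), over an 18-year span:

$$2^{\,18\ \text{years} \,/\, 1.5\ \text{years}} = 2^{12} \approx 4096\times \ \text{(hardware)} \qquad \text{versus} \qquad 2\times \ \text{(compiler optimization)}.$$

Moving a benchmark 4% after six months of hand-tuning while, as he says, the material scientists and architects make the thing twice as fast in the same stretch is the same asymmetry on a smaller scale.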

What is driving this? You were talking about a new model coming, but you also mentioned in the GPT-4 context, “GPT-4o, a 12x decrease in costs, six times increase in speed”. Now, I think if you really dig in, GPT-4o was not as good as GPT-4 at some things, it’s good at some other things it’s been optimized for. Is it just an optimization of the model? Is this a solution in training and inference, where you figure out new ways to approach it? What are some of the drivers of this?
是什么在推动这一切?你提到有一个新模型要来,但你也在 GPT-4 的语境里说过,“GPT-4o,成本下降 12 倍、速度提升 6 倍”。当然,我觉得如果深挖的话,GPT-4o 在某些方面不如 GPT-4,但在它被优化过的其他方面更好。这只是模型层面的优化吗?还是在训练与推理上找到了新方法?背后的驱动因素有哪些?

KS: Yeah, so I think you’ve got two fundamental things. So one is the hardware is actually getting better, so God bless them. Like Nvidia is doing tremendous work, AMD is doing good work now, we’ve got at Microsoft first-party silicon efforts underway, like a whole bunch of other people in their contexts are building their own silicon and I think we’re at this point where even though you don’t quite have a functioning Moore’s Law anymore, where smaller transistors are getting cheaper and give you more power for general-purpose compute, we are at least at the moment, still innovating enough on how to put those transistors to work for this embarrassingly parallel application that we have in AI.
KS:我认为有两个根本因素。其一是硬件确实在变好,值得为他们点赞。比如 Nvidia 做得非常出色,AMD 现在也很不错;我们在 Microsoft 也有自研芯片(first-party silicon)的项目在进行中,很多其他公司也在各自场景里做自家的芯片。我觉得当下虽然不再有严格意义上“可用”的 Moore’s Law——即更小的晶体管更便宜、同时为通用计算提供更多算力——但至少目前,我们仍在不断创新,去把这些晶体管更好地用于 AI 这种“天然高度并行(embarrassingly parallel)”的应用。

Well, there’s always more room for innovation. You innovate on networking; even if you don’t get it just from the transistor side, you can still get it.
当然,总还有更多创新空间。比如在网络上做创新,即便晶体管这条路给不了那么多红利,你也能从别处获得。

KS: Yeah.
KS:对。

“We’ll recreate it in the aggregate”, there’s a Moneyball reference.
“我们会在整体上把它重建出来”,这是一个来自 Moneyball 的梗。

KS: You’re getting a ton of price performance advantage from the hardware, but even more significantly than that, there’s just a ton of innovation that we’ve all been doing. Everything from how you optimize the whole system software stack to how you make use of new data types. A ton of what’s happening right now is using these faster data parallel data types, FP8 instead of doing all 32 bit arithmetic for these models, and that lets you make better use of memory to have more operations.
KS:硬件带来了大量“性能/价格”的优势,但更重要的是,大家在做的创新本身也非常多:从如何优化整条系统软件栈,到如何利用新的数据类型。现在大量实践都在采用更快的“数据并行”数值类型,比如用 FP8 取代对这些模型一切都用 32 位运算,这能让你更高效地使用内存,从而完成更多运算。
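A quick way to see why the narrower data types matter (illustrative arithmetic only; the parameter count below is a made-up example, not any particular model): FP32 stores each value in 4 bytes and FP8 in 1, so the same memory holds roughly four times as many weights or activations at FP8.

```python
# Illustrative arithmetic: memory footprint of model weights at different precisions.
BYTES_PER_VALUE = {"fp32": 4, "fp16": 2, "fp8": 1}

def weight_memory_gb(num_parameters: int, dtype: str) -> float:
    return num_parameters * BYTES_PER_VALUE[dtype] / 1e9

if __name__ == "__main__":
    n = 70_000_000_000  # hypothetical 70B-parameter model, purely for illustration
    for dtype in ("fp32", "fp16", "fp8"):
        print(f"{dtype}: {weight_memory_gb(n, dtype):,.0f} GB")
    # fp32: 280 GB, fp16: 140 GB, fp8: 70 GB; same hardware, ~4x more capacity at fp8
```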

So kind of counter-intuitively, there’s a bit where you use less precision, it’s dumber to a certain extent and that actually turns out to be the better answer because price and speed matter more?
所以有点反直觉——你用更低的精度,在某种程度上“更笨”,但结果反而更好,因为价格与速度更重要?

KS: So far less precision is not making anything dumber, it’s just you look at all the activations in a neural net and they’re just super, super sparse. The networks are big, but there isn’t a ton of signal in each one of those activations.
KS:到目前为止,降低精度并不会让任何东西“更笨”。你去看神经网络的激活值,它们极其稀疏。网络很大,但每个激活里并没有太多“有效信号”。

It strikes me that the big thing in general is this parallel approach; that is the level of abstraction everywhere. It strikes me the most compelling applications are the ones that can bring parallelism to bear, and this is sort of the thing here: don’t get hung up on the precision of any one calculation if the cost is reduced parallelism, because parallelism is what you just need more and more of.
让我注意到的一点是:总体而言,这种并行化思路,几乎在所有地方都构成了新的抽象层。我认为最具吸引力的应用,往往是那些能够真正利用并行性的应用;而在这里,你不要执着于某一次计算的精度。如果代价是并行度降低,那么你就需要更多、更多、再更多的并行性。

KS: Correct. So the hardware is getting better and then we’re just getting a lot better, like even faster than the hardware is getting better the techniques for training and the techniques for building inference engines are just getting tremendously better.
KS:对。硬件在变好,同时我们自身也在变得更好——甚至比硬件提速更快的是:训练技术与构建推理引擎的技术正以极快的速度进步。

Microsoft’s Spend

Microsoft 的支出

How do you feel confident about the spend that’s going into this? That’s probably the question people have. Like, oh, you’ll say, “Well, we have visibility into our revenue”, but what’s that visibility? Is that Office Copilot revenue? Is that API use? What gives you confidence in knowing whether it’s better to over-invest, or to not have enough compute on the inference side, or to have too much and risk going — or is it just good for everybody if you have too much compute?
对于投入到这件事上的开支,你如何建立信心?这大概是大家心里的疑问。比如,你可能会说,“我们对自己的收入有可见性(visibility)”,那这种可见性指什么?是 Office Copilot 的收入吗?是 API 的使用量吗?是什么让你有信心判断:是应该多投一些,还是宁可算力(以及在推理侧)不够用?又或者算力太多、冒着——的风险?抑或说,只要算力太多,对所有人其实都是好事?

KS: What we’re seeing right now is there’s relatively little downside in having an excess of compute, and that’s theoretical because the reality is we do not have excess compute. The demand for all of these AI products and services is so high right now that we are just doing crazy things to try to make sure that we’ve got enough compute and enough optimization of the entire system so that we can fulfill the demand that we’re seeing.
KS:我们现在看到的是,拥有过剩算力的负面影响相对较小——这是从理论上说,因为现实是我们并没有过剩的算力。眼下这些 AI 产品和服务的需求太高了,以至于我们不得不做各种疯狂的事来确保我们有足够的算力,并对整个系统做足够的优化,以满足我们所看到的需求。

If you look forward, there are just huge amounts of economic opportunity. I think the API business, I mean, it went from nothing to a very, very large business quicker than anything that we’ve ever seen. The Copilot business has a huge amount of traction. The user engagement on Copilot is the highest level of engagement we’ve seen in any new Microsoft 365 or Office product, maybe in history. So a lot of times you will go sell a somewhat new enterprise product and then it takes a fairly long time for the product to diffuse out into the organization.
展望未来,存在巨大的经济机会。我认为 API 业务——我的意思是——从无到有,成长为一个非常非常大的业务,其速度之快前所未见。Copilot 业务的牵引力巨大。就用户参与度而言,Copilot 在任何新推出的 Microsoft 365/任何 Office 产品中的表现都是最高的,可能是有史以来最高。很多时候,你把一款相对新的企业产品卖进客户后,需要相当长时间才能在组织内部扩散开来。

It sounds like if you could spend more on CapEx you would, are you limited just by supply?
听起来如果你能在 CapEx 上花更多钱你就会这么做——你们受到的限制只是供应吗?

KS: Absolutely. We’re limited in a whole bunch of different ways, but yeah, I mean, if I could spend more-
KS:当然。我们在很多方面都受限,但是,是的,如果我能花更多——

Data center energy.
数据中心的能源。

KS: Yeah, if I could spend more on CapEx.
KS:对,如果我能在 CapEx 上花更多钱。

You have no demand concerns?
你们对需求没有担忧吗?

KS: No, not now.
KS:没有,至少现在没有。

Kevin Scott, thank you very much.
Kevin Scott,非常感谢。

KS: Thank you for having me.
KS:感谢邀请。
