I.C.S.407.Capital.TECH.NVDA.Close

NVDA's economic prospects are deeply tied to tokens (see: 《What are tokens and how to count them?》). The greater the demand for processing tokens, the greater the demand for GPUs, but the relationship between the two is unstable: both compute (hardware) and algorithms (software) can move it substantially. Whether NVDA can keep the compute advantage to itself is also hard to say; it has many competitors.
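As a back-of-the-envelope way to see the token-GPU link and its instability, the sketch below uses the common rule of thumb that one inference forward pass costs roughly 2 × parameters FLOPs per token; every number in it is an illustrative assumption, not an NVDA or market figure:

```python
# Rough sketch linking token demand to GPU demand.
# All inputs below are hypothetical, for illustration only.
def gpus_needed(tokens_per_day: float, params: float,
                gpu_flops: float, utilization: float) -> float:
    """Estimate GPUs needed to serve a model.

    Uses the rule of thumb that a forward pass costs
    roughly 2 * params FLOPs per token.
    """
    flops_per_token = 2 * params
    daily_flops = tokens_per_day * flops_per_token
    gpu_flops_per_day = gpu_flops * utilization * 86_400  # seconds per day
    return daily_flops / gpu_flops_per_day

# Example: 100B tokens/day on a 70B-parameter model, GPUs
# sustaining 1e15 FLOP/s at 40% utilization -> roughly 405 GPUs.
print(round(gpus_needed(100e9, 70e9, 1e15, 0.4)))
```

Both levers shift constantly: algorithms change `params` (and the 2 × params rule itself), hardware changes `gpu_flops`, which is exactly why the token-to-GPU relationship is unstable.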

Beyond that, compute itself may go through technological transitions, which adds another layer of uncertainty. Consider Buffett's comment on Intel: the reason he was not optimistic is captured in Only the Paranoid Survive, the book that describes strategic inflection points, and yet, as he put it: "they had an Andy Grove there who made that transformation, along with some other people. But that doesn't happen every time." A company or industry that only a paranoid can run is not a good choice.

The last negative factor is what the business model does to one's state of mind: working in To B businesses for a long time also erodes judgment (in many settings one cannot state one's real views).

1、I.C.S. (Intelligent hypotheses, Correct facts and Sound reasoning)

AI's impact on the advertising business: recommendation engines have already produced clear gains in ad efficiency, with many companies seeing improvements approaching 20%, Meta and Tencent among them. Meta's annual ad revenue is about $130 billion. Assume the whole relevant market is $500 billion: a 10% lift is $50 billion of incremental revenue, and a 20% lift is $100 billion, much of which flows to NVIDIA. So far this does not look too far removed from the facts.

Global advertising revenue totals $927 billion, of which digital advertising is $655 billion. Advertising driven mainly by recommendation engines includes social apps ($212 billion), short video ($78 billion), and large retailers' ad businesses ($126 billion), $416 billion in all. Retailers' ad businesses are part search engine, part recommendation engine, so an optimistic estimate is not too far from $500 billion. Reference: 《MAGNA Ups Advertising Growth Outlook Following Strong First Half of the Year》
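The sizing arithmetic in the two paragraphs above checks out in a few lines (USD billions; the $500B market size and the 10-20% uplift rates are the author's assumptions):

```python
# Worked version of the advertising arithmetic above (USD billions).
# The $500B addressable market and the uplift rates are assumptions.
global_ads, digital_ads = 927, 655
rec_driven = {"social": 212, "short_video": 78, "retail_media": 126}
addressable = sum(rec_driven.values())        # 416, close to the ~500 assumed
uplift_low, uplift_high = 500 * 0.10, 500 * 0.20
print(addressable, uplift_low, uplift_high)   # 416 50.0 100.0
```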

AI's impact on cloud computing: the three largest cloud providers, Amazon, Microsoft, and Google, have annual cloud revenues of $110 billion, $96.4 billion, and $45.6 billion, $252 billion combined, about 64% of the whole market (which implies a total of roughly $394 billion; an optimistic estimate puts it at $500 billion). How much incremental revenue does AI add?

Amazon's, Microsoft's, and Google's quarterly cloud operating income was $10.4 billion, $10.5 billion, and $1.9 billion, $22.8 billion combined; call the industry-wide figure $50 billion, and a 20% contribution would be $10 billion. Cloud computing differs from digital advertising: digital ads have near-zero marginal cost, so incremental revenue is value contributed, whereas cloud carries fixed costs. Reference: 《微软、AWS和谷歌云2024年第三季度盈利对比》
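The cloud-side arithmetic above, made explicit (USD billions; revenue figures are annual, operating-income figures are quarterly per the cited Q3 2024 comparison, and the $50B industry pool is the author's assumption):

```python
# Cloud arithmetic from the notes above (USD billions).
cloud_revenue = {"AWS": 110, "Azure": 96.4, "Google Cloud": 45.6}
top3 = sum(cloud_revenue.values())
implied_market = top3 / 0.64          # if the top three hold ~64% of the market
op_income_q3 = 10.4 + 10.5 + 1.9      # AWS, Microsoft, Google Cloud (quarterly)
ai_contribution = 50 * 0.20           # 20% of an assumed $50B profit pool
print(round(top3), round(implied_market), round(op_income_q3, 1), ai_contribution)
```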

2、《1983-06-15 Steve Jobs.Let the world innovate》

Making machines intuitive 让机器变得直观

STEVE JOBS: Well, the major thing is that we’ve got to make things intuitively obvious. And it turns out that people know how to do a lot of things already. In other words, if you walk into a typical office, there’s all these…there’s stacks of paper on the desk, and the one on the top is the most important. And people know how to switch priority. Right? And people know how to deal with concurrent things going on at once. They’re constantly switching between things every few minutes. And they know how to deal with interruptions. The phone rings, they get an urgent message.
史蒂夫·乔布斯:嗯,主要的是我们必须让事情变得直观明了。事实证明,人们已经知道如何做很多事情。换句话说,如果你走进一个典型的办公室,桌子上有一堆堆的文件,最上面的那份是最重要的。人们知道如何切换优先级,对吧?人们知道如何处理同时发生的事情。他们每隔几分钟就会在不同的事情之间切换。他们知道如何处理中断。电话响了,他们收到一条紧急信息。

And so what we’ve got to do is leverage off of what people already know how to do. And part of the reason we model our computers on these metaphors that already exist out there, like the desktop, is because we can leverage all this experience that people already have, and they intuitively just take to it like water.
所以我们要做的是利用人们已经知道如何做的事情。我们将计算机建模为已经存在的比喻(如桌面)的部分原因是,我们可以利用人们已经拥有的所有经验,他们会直观地接受它。

The second thing we do is, right now, when you buy an application, each one works differently. In other words, not only do you have the specific knowledge about the application to learn, but it interacts with you through the computer differently than the last one. The word processor, you move the cursor around this way. VisiCalc, you move it around another way.
我们现在做的第二件事是,当你购买一个应用程序时,每个应用程序的工作方式都不同。换句话说,你不仅需要学习关于该应用程序的特定知识,而且它通过计算机与您的交互方式也与上一个不同。文字处理器,你这样移动光标。VisiCalc,你用另一种方式移动它。

And what we’ve got to do is make it so that when you learn how to use one application, all the rest of them work in pretty much the same way. And to come up with a general…we spent a lot of time coming up with a general mechanism that was so powerful that there was not one type of program where it wouldn’t be perfect for it. We think we did that. We think we absolutely did that. And so in trying to make some consistency throughout the system, we can leverage the learning.
我们要做的是,当你学会如何使用一个应用程序时,其他所有应用程序的工作方式基本相同。我们花了很多时间想出一个通用机制,这个机制非常强大,没有一种程序类型不适合它。我们认为我们做到了。我们认为我们绝对做到了。因此,在尝试使整个系统保持一致性时,我们可以利用这种学习。

Let the world innovate 让世界创新

STEVE JOBS: The neatest thing about it is that, again, when you make tools and then…see, you get a bunch of smart people that can design into something. And then they can give it to a bunch of people that maybe can’t design it, but they can build a lot of them. And then you can give it to a bunch of people that can’t do that, but they can help get them out in the world through stores. And then get it to even a giant group of people that can use them.
史蒂夫·乔布斯:这最妙的地方是,当你创造工具,然后…你会得到一群聪明的人,他们能够设计出一些东西。然后他们可以把它交给一群可能无法设计它的人,但他们可以生产大量的这些东西。接着你可以把它交给一些无法做到这些的人,但他们可以通过商店帮助把这些东西推向市场。然后让它进入一个庞大的群体,让他们使用这些东西。

And when you have a million people using something, then that’s when creativity really starts to happen on a very rapid scale, because the marketplace…there are literally like 500 companies making software for the Apple II, and they’re all watching each other. And the minute one of them comes up with a good idea, it ain’t six months before they all got it. And so it’s constantly raising the level of competence and the level of innovation that’s required to sell stuff to those million Apple owners out there. And that’s phenomenal.
当有一百万人在使用某个东西时,这时创造力就开始以极快的速度发生,因为市场上……有大约 500 家公司在为 Apple II 制作软件,他们都在互相观察。当其中一个公司想出一个好主意时,不到六个月他们全都有了。因此,它不断提高了能力水平和创新水平,以便向那一百万苹果用户销售产品。这是惊人的。
Idea
As of August 2024, ChatGPT had more than 200 million weekly active users, double the 100 million of last November.
And so the fastest way to get innovation is, we need some revolutions like Lisa, but we also then need to get millions of units out there and let the world innovate, because the world is pretty good at innovating, we found. 
因此,获得创新的最快方式是,我们需要一些像 Lisa 这样的革命,但我们也需要将数百万台设备推向市场,让世界进行创新,因为我们发现世界在创新方面相当出色。

3、《1997-05-05 Berkshire Hathaway Annual Meeting》

17. We don’t know how to value Intel and Microsoft
我们不知道如何评估英特尔和微软

WARREN BUFFETT: Zone 11, please.
沃伦·巴菲特: 请第十一区提问。

AUDIENCE MEMBER: Yes, Mr. Buffett, I would like to thank you again for issuing the Class B shares.
观众: 是的,巴菲特先生,我再次感谢您发行了B类股票。

WARREN BUFFETT: (Laughs) Well, I’m glad we did, and I hope you own them.
沃伦·巴菲特: (笑)我很高兴我们这么做了,我希望您拥有这些股票。

AUDIENCE MEMBER: I am a Class B shareholder.
观众: 我是一名B类股股东。

I need your comment on some analysis that we did. If someone uses your investment philosophy of building a highly concentrated portfolio of six to eight stocks, and adopts your buy-and-holding principle so that the max of compounding and no tax works for you, but however, with one major modification: invest in high-octane companies like Intel and Microsoft that are growing at 30 percent, instead of typical 15 percent growth company in your portfolio.
我们进行了一些分析。如果有人采用您的投资哲学,建立一个高度集中的投资组合,包含6到8只股票,并采用您的长期持有原则,这样可以最大化复利效应且免于缴税,但有一个主要修改:投资于像英特尔和微软这样的高增长公司,它们的增长率是30%,而不是您投资组合中典型的15%的增长公司。

My question is, will this investment philosophy translate into twice the shareholder return as you have historically provided to your shareholders?
我的问题是,这种投资哲学是否会转化为您历史上为股东提供回报的两倍?

WARREN BUFFETT: Yeah. Well, it will certainly work out to twice the return if Intel and Microsoft do twice as well as Coke and Gillette. I mean, it’s a question of being able to identify businesses that you understand and feel very certain about.
沃伦·巴菲特: 是的。如果英特尔和微软的表现是可口可乐和吉列的两倍,那么回报肯定也会是两倍。这主要取决于是否能够识别出您理解并非常确信的业务。

And if you understand those businesses, and many people do, but Charlie and I don’t, you have the opportunity to evaluate them. And if you decide they’re fairly priced and they have marvelous prospects, you’re going to do very well.
如果您理解这些业务,很多人确实理解,但查理和我不理解,那么您就有机会去评估它们。如果您认为它们的价格合理且前景光明,那么您会表现得非常出色。

But there’s a whole group of companies, a very large group of companies, that Charlie and I just don’t know how to value. And that doesn’t bother us. I mean, you know, we don’t know what — we don’t know how to figure out what cocoa beans are going to do, or the Russian ruble, or I mean, there’s all kinds of financial instruments that we just don’t feel we have the knowledge to evaluate.
但是,有一大类公司,确实是很大的一类公司,查理和我不知道该如何评估。而这并不会困扰我们。我的意思是,比如我们不知道可可豆的价格走势,也不知道俄罗斯卢布会如何表现,还有各种各样的金融工具,我们觉得自己没有足够的知识去评估它们。

And really, you know, it might be a little too much to expect that somebody would understand every business in the world.
确实,期望有人能够理解全世界的每一个行业可能有点过分。

And we find some that are much harder for us to understand. And when I say understand, my idea of understanding a business is that you’ve got a pretty good idea where it’s going to be in ten years. And I just can’t get that conviction with a lot of businesses, whereas I can get it with relatively few. But I only need a few. As you’ve pointed out, you only need a few, six or eight or something like that.
我们发现有些行业对我们来说非常难以理解。而我所说的“理解”是指你对一个企业十年后的状况有一个相当清晰的预判。而对于很多企业,我无法得到这种信心,而对于少数企业,我可以。但我只需要少数几个企业。正如你提到的,你只需要少数几个,比如六到八个这样的企业。

It would be better for you — it certainly would have been better for you — if we had the insights about what we regard as the somewhat more complicated businesses you describe, because there was and may still be a chance to make a whole lot more money if those growth rates that you describe are maintained.
如果我们对你所描述的那些稍微复杂一些的企业有更多的洞察力,这对你来说会更好——肯定会更好。因为如果这些增长率能够保持,那么确实有机会赚到更多的钱,现在可能仍然有这样的机会。

But I don’t think they’re — I don’t think you’ll find better managers than Andy Grove at Intel and Bill Gates at Microsoft. And they certainly seem to have fantastic positions in the businesses they’re in.
但我不认为——我不认为你能找到比英特尔的安迪·格罗夫(Andy Grove)和微软的比尔·盖茨(Bill Gates)更优秀的管理者。而且他们似乎确实在各自的领域中占据了极好的位置。

But I don’t know enough about those businesses to be as sure that those positions are fantastic as I am about being sure that Gillette and Coca-Cola’s businesses are fantastic.
但对于这些企业,我并没有足够的了解,无法像确信吉列和可口可乐的业务一样,确信它们的地位是如此的出色。

You may understand those businesses better than you understand Coke and Gillette because of your background or just the way your mind is wired. But I don’t, and therefore I have to stick with what I really think I can understand. And if there’s more money to be made elsewhere, I think the people that make it are entitled to it.
由于你的背景或思维方式,你可能比了解可口可乐和吉列更了解那些企业。但我不是,因此我必须坚持投资我真正认为可以理解的业务。如果其他地方可以赚更多的钱,我认为那些赚到钱的人是应得的。

Charlie?
查理?

CHARLIE MUNGER: Well, if you take a business like Intel, there are limitations under the laws of physics which eventually stop your putting more transistors on a single chip. And the 30 percent per annum, or something like that, you — I don’t think — those limitations are still a good distance away, but they’re not any infinite distance away.
查理·芒格: 以英特尔这样的公司为例,物理定律设定了一些限制,这些限制最终会阻止你在单个芯片上放置更多的晶体管。至于30%的年增长率之类的情况,我认为——这些限制还比较遥远,但并不是无限遥远的。

That means that Intel has to leverage its current leadership into new activities, just as IBM leveraged the Hollerith machine into the computer. Predicting whether somebody’s going to be able to do that in advance is just — it’s too tough for us.
这意味着英特尔必须将其当前的领先地位扩展到新的活动中,就像IBM将霍勒瑞斯打孔机(Hollerith machine)扩展到计算机领域一样。提前预测某人是否能够做到这一点——对我们来说,这太难了。

WARREN BUFFETT: Bob Noyce —
沃伦·巴菲特: 鲍勃·诺伊斯——

CHARLIE MUNGER: We could (inaudible) to you.
查理·芒格: 我们可以(听不清)。

WARREN BUFFETT: Bob Noyce, one of the two founders of — two primary founders — of Intel, grew up in Grinnell, Iowa. I think he’s the son of a minister in Grinnell, and went through Grinnell College and was chairman of the board of trustees of Grinnell when I went on the board of Grinnell back in the late ’60s.
沃伦·巴菲特: 鲍勃·诺伊斯是英特尔的两位主要创始人之一,他在艾奥瓦州的格林内尔长大。我记得他是格林内尔一位牧师的儿子,就读于格林内尔学院,并在我于60年代末加入格林内尔董事会时,担任学院董事会主席。

And when he left Fairchild to form Intel with Gordon Moore, Grinnell bought 10 percent of the private placement that funded — was the initial funding for Intel.
当他离开仙童公司(Fairchild)与戈登·摩尔(Gordon Moore)共同创立英特尔时,格林内尔购买了英特尔初始私募资金的10%。

And Bob was a terrific guy. He was very easy to talk to, just as Bill Gates is. I mean, these fellows explained the businesses to me, and they’re great teachers but I’m a lousy student. And they — I mean, they really do. They’re very good at explaining their businesses.
鲍勃是一个了不起的人。他非常平易近人,就像比尔·盖茨一样。我是说,这些人向我解释他们的业务,他们是很棒的老师,但我是个糟糕的学生。他们确实是非常擅长解释他们的业务。

Bob was a very down to earth Iowa boy who could tell you the risks and tell you the upside, and enormously likeable, a hundred percent honest, every way.
鲍勃是一个非常脚踏实地的艾奥瓦州男孩,他会告诉你风险和潜力。他非常讨人喜欢,完全诚实,方方面面都是如此。

So we did buy 10 percent of the original issue. The genius that ran the investment committee and managed to sell those a few years later, I won’t give you his name. (Laughter)
因此,我们确实购买了最初发行的10%。至于那个领导投资委员会并在几年后卖掉这些股份的天才,我就不说他的名字了。(笑声)

And there’s no prize for anybody that calculates the value of those shares now.
现在谁要是去计算这些股份如今的价值,也不会得到任何奖品。

Incidentally, one of the things Bob was very keen on originally, in fact he was probably the keenest on it, was he had some watch that Intel was making. And it was a fabulous watch, according to Bob.
顺便说一下,鲍勃最初非常热衷的一件事,实际上他可能最为热衷的,是英特尔当时制作的一款手表。据鲍勃说,这是一款了不起的手表。

It just had one problem. We sent a guy out from Grinnell who was going out to the West Coast to where Intel was. And Bob gave him one of these watches. And when he got back to Grinnell he wrote up a report about this little investment we had, and he said, “These watches are marvelous.” He said, “Without touching anything, they managed to adapt to the time zones as they change as we went along.” In other words, they were running very fast, as it turned out. (Laughter)
不过它有一个问题。我们从格林内尔派了一个人到西海岸英特尔的所在地。鲍勃给了他一块这样的手表。当他回到格林内尔时,他写了一份关于我们这笔小投资的报告。他说:“这些手表太棒了。”他说:“不用动任何东西,它们就能自动适应我们经过的时区变化。”换句话说,事实证明,它们走得非常快。(笑声)

And they worked with that watch for about five or six years, and they fell on their face.
他们花了五到六年时间研究那款手表,结果以失败告终。

And as you know, you know, they had a total transformation in the mid-’80s when the product on which they relied also ran out of gas. So, it’s not —
大家知道,在80年代中期,他们依赖的产品也失去了动力,于是他们进行了彻底的转型。所以,这并不是——

And Andy Grove has written a terrific book, incidentally, Only the Paranoid Survive, which describes strategic inflection points. I recommend that every one of you read that book, because it is a terrific book.
顺便说一下,安迪·格罗夫写了一本非常棒的书——《只有偏执狂才能生存》,书中描述了战略拐点。我建议你们每个人都去读这本书,因为它确实很好。

But they had an Andy Grove there who made that transformation, along with some other people. But that doesn’t happen every time. Companies get left behind.
但他们有安迪·格罗夫和其他一些人一起完成了转型。不过,这种事情并不是每次都会发生。公司会被落下。
Idea
Which shows exactly how hard a business this is; Microsoft has been through the same thing.
We don’t want to be in businesses where companies — where we feel companies can be left behind. And that means that, you know — and Intel could have, and almost did, go off the tracks. IBM owned a big piece of Intel, as you know, and they sold it in the mid-’80s.
我们不希望进入那些我们认为公司可能掉队的行业。这意味着——你们知道,英特尔当年可能,而且几乎确实,偏离了轨道。正如你们所知,IBM曾持有英特尔的一大块股份,但他们在80年代中期卖掉了。

So, you know, here are a bunch of people that should know a lot about that business but they couldn’t see the future either.
所以,你看,这些人应该对那个行业非常了解,但他们也看不到未来。

I think it’s very tough to make money that way, but I think some people can make a lot of money understanding those kinds of businesses. I mean, there are people with the insights.
我认为靠那种方式赚钱非常困难,但我也认为有些人可以通过理解那类业务赚很多钱。我的意思是,有些人确实有这种洞察力。

Walter Scott, one of our directors, has done terrifically with a business that started, you know, just a gleam in the eye maybe ten or 12 years ago here in Omaha, and it turned into a huge business.
我们的董事之一沃尔特·斯科特,在一家业务上取得了巨大的成功。这家业务大约在十到十二年前从奥马哈萌芽,后来发展成了一个巨大的企业。

And you know, Walter explained that to me on the way down to football games, but bad student again, so — (Laughs)
沃尔特曾在我们去看橄榄球比赛的路上向我解释过这个业务,但我还是一个差学生,所以——(笑声)

Walter — if Walter could have connected, and you know, I’d cheer from the stands. But that doesn’t bother me at all. I mean, what would bother me is if I think I understand a business and I don’t. That would bother me.
沃尔特——如果沃尔特当时能让我开窍,你知道,我会在看台上为他欢呼。但这一点也不困扰我。我的意思是,真正会困扰我的是,如果我以为自己理解一个业务,但实际上并没有。那才会困扰我。

Charlie?
查理?

CHARLIE MUNGER: Well, having flunked when we were young and strong at understanding some complex businesses, we’re not looking to master what we earlier failed at — (laughs) — in our latter years. (Laughter)
查理·芒格: 我们年轻力壮的时候,就已经在理解一些复杂业务上失败了,现在年纪大了,我们可不打算重新掌握那些我们之前没搞懂的东西。(笑声)

WARREN BUFFETT: Zone 12? This may turn out like a revival meeting where we all confess our sins and come forward (inaudible). (Laughter)
沃伦·巴菲特: 第十二区?这可能会像一个忏悔大会,大家都来承认自己的过错并站出来(听不清)。(笑声)

4、《2023-02-09 Ted Chiang.ChatGPT Is a Blurry JPEG of the Web》

In 2013, workers at a German construction company noticed something odd about their Xerox photocopier: when they made a copy of a floor plan of a house, the copy differed from the original in a subtle but significant way. In the original floor plan, each of three rooms was accompanied by a rectangle specifying its area: 14.13, 21.11, and 17.42 square metres. In the photocopy, however, all three rooms were labelled 14.13 square metres. The company contacted the computer scientist David Kriesel to investigate this seemingly inconceivable result. They needed a computer scientist because modern Xerox copiers don't use the physical xerographic process popularized in the 1960s; instead, they scan the document digitally and then print the resulting image file. Combine that with the fact that virtually every digital image file is compressed to save space, and a solution to the mystery begins to suggest itself.

Compressing a file requires two steps: first, encoding, in which the file is converted into a more compact format, and then decoding, whereby the process is reversed. If the restored file is identical to the original, the compression is described as lossless: no information has been lost. If, by contrast, the restored file is only an approximation of the original, the compression is described as lossy: some information has been discarded and is now unrecoverable. Lossless compression is typically used for text files and computer programs, where even a single incorrect character can be disastrous. Lossy compression is often used for photos, audio, and video, where absolute accuracy isn't essential; most of the time we don't notice if a picture, song, or movie isn't perfectly reproduced. We notice the so-called compression artifacts only when files are squeezed very tightly: the fuzziness of the smallest JPEG and MPEG images, or the tinny sound of low-bit-rate MP3s.
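The lossless/lossy contrast above can be demonstrated with the standard library alone; zlib gives the lossless case, and a crude quantizer stands in for JPEG/MP3-style lossy coding:

```python
# Lossless vs lossy compression in miniature.
import zlib

text = b"floor plan: 14.13 / 21.11 / 17.42 sq m"
packed = zlib.compress(text)
assert zlib.decompress(packed) == text    # lossless: exact round-trip

samples = [14.13, 21.11, 17.42]
quantized = [round(x) for x in samples]   # lossy: precision is discarded
print(quantized)                          # [14, 21, 17]
assert quantized != samples               # the originals are unrecoverable
```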

Xerox photocopiers use a lossy compression format called JBIG2, designed for black-and-white images. To save space, the copier identifies similar-looking regions in the image and stores a single copy for all of them; when the file is decompressed, it uses that copy repeatedly to reconstruct the image. It turned out that the copier judged the labels specifying the rooms' areas to be similar enough that it stored only one of them, "14.13", and reused it for all three rooms when printing the floor plan.
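The failure mode described above can be reproduced in miniature. The toy `dedupe` below is a stand-in for JBIG2's symbol matcher (the character-difference metric is a hypothetical simplification, not the real algorithm): with a strict threshold all three labels survive; with a loose one they collapse into the first stored copy.

```python
# JBIG2-style symbol reuse: "similar" regions share one stored copy.
def dedupe(symbols, threshold):
    stored = []               # dictionary of representative symbols
    out = []
    for s in symbols:
        match = next((r for r in stored
                      if sum(a != b for a, b in zip(s, r)) <= threshold), None)
        if match is None:     # nothing similar enough: store a new symbol
            stored.append(s)
            match = s
        out.append(match)     # reconstruct the page from stored copies
    return out

labels = ["14.13", "21.11", "17.42"]
print(dedupe(labels, threshold=0))  # strict: ['14.13', '21.11', '17.42']
print(dedupe(labels, threshold=5))  # too loose: ['14.13', '14.13', '14.13']
```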

The fact that Xerox copiers use a lossy rather than a lossless compression format isn't, in itself, a problem. The problem is that the copiers degraded the image in a subtle way, in which the compression artifacts weren't immediately recognizable. If the copier simply produced blurry printouts, everyone would know they weren't accurate reproductions of the originals. What led to problems was that the copier produced numbers that were readable but incorrect; the copies seemed accurate when they weren't. (In 2014, Xerox released a patch to fix the issue.)

I think this incident with the Xerox copier is worth bearing in mind today, as we consider OpenAI's ChatGPT and other similar programs, which AI researchers call large language models. The resemblance between a photocopier and a large language model might not be immediately apparent, but consider the following scenario. Imagine that you're about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one per cent of the space needed; you can't use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you can throw virtually unlimited computational power at this task, your algorithm can identify extraordinarily subtle statistical regularities, which lets you achieve the desired compression ratio of a hundred to one.

Now, losing your Internet access isn't too terrible; you've got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can't look for information by searching for an exact quote; you'll never get an exact match, because the words aren't what's being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.

What I've described sounds a lot like ChatGPT, or most other large language models. Think of ChatGPT as a blurry JPEG of all the text on the Web. It retains much of the information on the Web, in the same way that a JPEG retains much of the information of a higher-resolution image, but, if you're looking for an exact sequence of bits, you won't find it; all you will ever get is an approximation. But, because the approximation is presented in the form of grammatical text, which ChatGPT excels at creating, it's usually acceptable. You're still looking at a blurry JPEG, but the blurriness occurs in a way that doesn't make the picture as a whole look less sharp.
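The "keep only statistics, answer with the gist" idea can be sketched with a toy bigram model. This is a deliberately crude stand-in for a large language model, and the corpus is made up; the point is that the exact word sequence is discarded, yet the output still looks grammatical:

```python
# Toy "lossy text store": keep only bigram statistics, then answer
# by regenerating plausible text rather than retrieving the original.
import random
from collections import defaultdict

corpus = ("the copier stored one label . "
          "the copier reused that label for every room .").split()

bigrams = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    bigrams[a].append(b)          # the only thing we store

random.seed(0)
word, out = "the", ["the"]
for _ in range(8):
    word = random.choice(bigrams[word])
    out.append(word)
print(" ".join(out))  # grammatical-looking, but not the original text
```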

This analogy to lossy compression is not just a way to understand ChatGPT's facility at repackaging information found on the Web in different words. It's also a way to make sense of the "hallucinations", or nonsensical answers to factual questions, to which large language models such as ChatGPT are prone. These hallucinations are compression artifacts, but, like the incorrect labels generated by the Xerox photocopier, they are plausible enough that identifying them requires comparing them against the originals, which in this case means either the Web or our own knowledge of the world. When we think about them this way, such hallucinations are anything but surprising; if a compression algorithm is designed to reconstruct text after ninety-nine per cent of the original has been discarded, we should expect that significant portions of what it generates will be entirely fabricated.

5、《2023-10-15 Jensen Huang.ACQUIRED Interview with NVIDIA CEO Jensen Huang》

Jensen: That’s right. But we observed that there was a segment of the market. At the time, the PC industry was still coming up and it wasn’t good enough. Everybody was clamoring for the next fastest thing. If your performance was 10 times higher this year than what was available, there’s a whole large market of enthusiasts who we believe would’ve gone after it. And we were absolutely right, that the PC industry had a substantially large enthusiast market that would buy the best of everything.
Jensen: 没错。但我们观察到市场中有一个细分群体。当时,PC行业还在发展,且性能还不够好。每个人都在争相追求下一代更快的产品。如果今年你的性能比现有产品高出10倍,我们相信有一大群发烧友会争相购买。而我们完全正确,PC行业确实拥有一个庞大的发烧友市场,他们愿意购买最好的产品。

To this day, it remains true. For certain segments of a market where the technology is never good enough like 3D graphics, and we chose the right technology, 3D graphics is never good enough. We call it back then 3D gives us sustainable technology opportunity because it’s never good enough, so your technology can keep getting better. We chose that.
直到今天,这依然成立。对于某些市场的细分领域,技术永远不够好,比如3D图形,我们选择了正确的技术,3D图形永远不够好。我们当时称之为“3D为我们提供了可持续的技术机会,因为它永远不够好,因此你的技术可以不断改进”。我们选择了这一点。
Idea
An intelligent hypothesis.


Jensen: Oftentimes, if you created the market, you ended up having what people describe as moats, because if you build your product right and it’s enabled an entire ecosystem around you to help serve that end market, you’ve essentially created a platform.
Jensen: 通常,如果你创建了市场,你最终会拥有所谓的护城河,因为如果你正确地构建了你的产品,并且它使周围的整个生态系统得以支持这个最终市场,那么你就实际上创造了一个平台。

Sometimes it’s a product-based platform. Sometimes it’s a service-based platform. Sometimes it’s a technology-based platform. But if you were early there and you were mindful about helping the ecosystem succeed with you, you ended up having this network of networks, and all these developers and customers who are built around you. That network is essentially your moat.
有时它是基于产品的平台,有时是基于服务的平台,也有时是基于技术的平台。但如果你早早进入并且注意到与生态系统一起成功,你最终会拥有一个网络中的网络,所有围绕你的开发者和客户。这个网络本质上就是你的护城河。

I don’t love thinking about it in the context of a moat. The reason for that is because you’re now focused on building stuff around your castle. I tend to like thinking about things in the context of building a network. That network is about enabling other people to enjoy the success of the final market. That you’re not the only company that enjoys it, but you’re enjoying it with a whole bunch of other people.
我不太喜欢在护城河的框架中思考这个问题。原因是你现在专注于围绕你的城堡建造东西。我更倾向于从建立网络的角度来看待事情。这个网络的核心是使其他人也能分享最终市场的成功。你不是唯一享受它的公司,而是和很多其他人一起享受它。
Idea
A defining trait of large technology companies.
David: I’m so glad you brought this up because I wanted to ask you. In my mind, at least, and it sounds like in yours, too, Nvidia is absolutely a platform company of which there are very few meaningful platform companies in the world.
David: 我很高兴你提到了这一点,因为我一直想问你。至少在我看来,听起来在你看来也是,Nvidia绝对是一家平台公司,而世界上有意义的平台公司是非常少的。

I think it’s also fair to say that when you started, for the first few years you were a technology company and not a platform company. Every example I can think of, of a company that tried to start as a platform company, fails. You got to start as a technology first.
我也认为可以公平地说,当你们刚开始时,最初几年你们是科技公司,而不是平台公司。我能想到的所有试图作为平台公司起步的公司,都失败了。你必须首先作为技术公司开始。

When did you think about making that transition to being a platform? Your first graphics cards were technology. There was no CUDA, there was no platform.
你什么时候开始考虑从技术公司转变为平台公司?你们的第一代显卡是技术产品,并没有CUDA,也没有平台。

Jensen: What you observed is not wrong. However, inside our company, we were always a platform company. The reason for that is because from the very first day of our company, we had this architecture called UDA. It’s the UDA of CUDA.
Jensen: 你观察到的并没有错。然而,在我们公司内部,我们一直都是平台公司。原因是从公司成立的第一天起,我们就有一个架构叫做UDA,它就是CUDA的UDA。

6、《2024-09-11 Jensen Huang.Goldman Sachs Communacopia + Technology Conference》

Dig down on this a little bit deeper, just talk about the differences between general purpose and accelerating computing.
深入探讨一下这个问题,谈谈通用计算和加速计算之间的区别。

Jensen Huang 黄仁勋

If you look at software, out of your body of software that you wrote, there's a lot of file IO, there's the setting up of the data structure, and there's a part of the software inside which has some of the magic kernels, the magic algorithms. And these algorithms are different depending on whether it's computer graphics or image processing or whatever it happens to be. It could be fluids, it could be particles, it could be inverse physics as I mentioned, it could be image domain type stuff. And so all these different algorithms are different.

And if you created a processor that is somehow really, really good at those algorithms and you complement the CPU where the CPU does whatever it's good at, then theoretically, you could take an application and speed it up tremendously. And the reason for that is because usually some 5%, 10% of the code represents 99.999% of the runtime. And so if you take that 5% of the code and you offloaded it on our accelerator, then technically, you should be able to speed up the application 100 times. And it's not abnormal that we do that. It's not unusual. And so we'll speed up image processing by 500 times. And now we do data processing. Data processing is one of my favorite applications because almost everything related to machine learning, which is a data-driven way of doing software, data processing has evolved. It could be SQL data processing, it could be Spark type of data processing, it could be a vector database type of processing, all kinds of different ways of processing either unstructured data or structured data, which is data frames, and we accelerate the living daylights out of that.

But in order to do that, you have to create that library, that fancy library on top. And in the case of computer graphics, we were fortunate to have Silicon Graphics' OpenGL and Microsoft DirectX. But outside of those, no libraries really existed. And so for example, one of our most famous libraries is a library kind of like SQL is a library. SQL is a library for in-storage computing. We created a library called cuDNN. cuDNN is the world's first neural network computing library. And so we have cuDNN, we have cuOpt for combinatorial optimization, we have cuQuantum for quantum simulation and emulation, all kinds of different libraries, cuDF for data frame processing, for example, SQL.

And so all these different libraries have to be invented that takes the algorithms that run in the application and refactor those algorithms in a way that our accelerators can run. And if you use those libraries, then you get 100x speed up.
如果你看软件,在你编写的软件中,有很多文件IO操作,还有数据结构的设置,软件中也有一些“魔法”内核,神奇的算法。这些算法根据不同的应用是不同的,比如计算机图形处理、图像处理,或者其他应用领域。可能是流体模拟,可能是粒子模拟,可能是逆物理学,就像我之前提到的,或者是图像领域的相关处理。因此,这些不同的算法各不相同。
如果你创造了一种处理器,这种处理器在这些算法上表现得非常好,同时与CPU互补,CPU可以做它擅长的事情,那么理论上,你可以大幅加速应用程序。原因在于,通常5%到10%的代码占据了99.999%的运行时间。因此,如果你能将那5%的代码卸载到我们的加速器上,那么从技术上讲,你应该能够将应用程序的速度提高100倍。这并不罕见,也不例外。我们可以将图像处理的速度提高500倍。现在我们还做数据处理。数据处理是我最喜欢的应用之一,因为几乎所有与机器学习相关的内容——即一种数据驱动的软件开发方式——都涉及到数据处理的演变。它可能是SQL数据处理,可能是Spark类型的数据处理,可能是矢量数据库类型的处理,无论是处理非结构化数据还是结构化数据(如数据框架),我们都可以大幅加速这一过程。
为了做到这一点,你必须在顶层创建那个高级的库。在计算机图形领域,我们很幸运有Silicon Graphics的OpenGL和微软的DirectX。但在这些之外,几乎没有任何库存在。例如,我们最著名的一个库之一类似于SQL。SQL是一个用于存储计算的库,而我们创建了一个叫cuDNN的库。cuDNN是世界上第一个神经网络计算库。我们还有cuDNN、用于组合优化的cuOpt、用于量子模拟和仿真的cuQuantum、以及用于数据框处理的cuDF,例如SQL。
因此,所有这些不同的库都必须被发明出来,以重新构造应用程序中的算法,使我们的加速器能够运行这些算法。如果你使用这些库,那么你就能获得100倍的加速。
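Huang's "5% of the code, 99.999% of the runtime" observation is Amdahl's law in action; a quick calculation with illustrative numbers shows why the fraction of runtime you can offload, not the accelerator alone, sets the ceiling on speedup:

```python
# Amdahl's law: overall speedup when a fraction p of the runtime is
# accelerated by a factor s. Numbers below are illustrative.
def amdahl(p: float, s: float) -> float:
    return 1.0 / ((1.0 - p) + p / s)

# Offload 99% of the runtime onto an effectively "infinitely" faster
# accelerator: the serial 1% caps the overall speedup near 100x.
print(round(amdahl(0.99, 1e6)))   # ~100
# With 99.999% of the runtime offloaded, a 500x accelerator delivers
# nearly its full 500x, which is why runtime hotspots matter so much.
print(round(amdahl(0.99999, 500)))
```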

7、《2024-11-20 NVIDIA Corporation (NVDA) Q3 2025 Earnings Call Transcript》

And every time you read a PDF, open a PDF, it generated a whole bunch of tokens. One of my favorite applications is NotebookLM, this Google application that came out. I use the living daylights out of it just because it's fun. And I put every PDF, every archive paper into it just to listen to it as well as scanning through it. And so I think -- that's the goal is to train these models so that people use it. And there's now a whole new era of AI if you will, a whole new genre of AI called physical AI, just those large language models understand the human language and how we the thinking process, if you will. Physical AI understands the physical world and it understands the meaning of the structure and understands what's sensible and what's not and what could happen and what won't and not only does it understand but it can predict and roll out a short future. That capability is incredibly valuable for industrial AI and robotics.
每次你阅读一个 PDF,打开一个 PDF,它都会生成一大堆的标记。我最喜欢的应用之一是 NotebookLM,这是谷歌推出的一个应用。我非常频繁地使用它,因为它很有趣。我把每个 PDF,每篇存档论文都放进去,不仅是为了听它,还为了浏览它。因此,我认为——目标是训练这些模型以便人们使用它。现在有一个全新的 AI 时代,如果你愿意的话,一个全新的 AI 类型,叫做物理 AI,就是那些大型语言模型理解人类语言和我们的思维过程,如果你愿意的话。物理 AI 理解物理世界,它理解结构的意义,理解什么是合理的,什么不是,什么可能发生,什么不会发生,它不仅理解,还能预测并推出一个短期的未来。这种能力对于工业 AI 和机器人技术来说是非常有价值的。

8、《2024-12-03 Amazon Announces Supercomputer, New Server Powered by Homegrown AI Chips》

Company leaders, though, are realistic about how far AWS’s chip ambitions can go—at least at the moment.
不过,公司领导对 AWS 芯片雄心的实现程度持现实态度——至少目前如此。

“I actually think most will probably be Nvidia for a long time, because they’re 99% of the workloads today, and so that’s probably not going to change,” AWS CEO Garman said. “But, hopefully, Trainium can carve out a good niche where I actually think it’s going to be a great option for many workloads—not all workloads.”
“我实际上认为大多数情况下可能会长期使用英伟达,因为他们占据了今天 99%的工作负载,所以这可能不会改变,”AWS 首席执行官 Garman 说。“但希望 Trainium 能够开辟一个好的市场,我实际上认为它将成为许多工作负载的一个很好的选择——不是所有的工作负载。”

9、《2025-01-03 Intel’s Problems Are Even Worse Than You’ve Heard》

You may think you know how much Intel is struggling, but the reality is worse.
你可能认为你知道 Intel 有多么挣扎,但现实更糟。

The once-mighty American innovation powerhouse is losing market share in multiple areas that are critical to its profitability. Its many competitors include not just the AI juggernaut Nvidia but smaller rivals and even previously stalwart allies like Microsoft.
这家曾经强大的美国创新巨头正在多个对其盈利能力至关重要的领域失去市场份额。它的众多竞争对手不仅包括 AI 巨头 Nvidia,还有较小的竞争对手,甚至是像 Microsoft 这样曾经坚定的盟友。

One flashing warning sign: In the latest quarter reported by both companies, Intel’s perennial also-ran, AMD, actually eclipsed Intel’s revenue for chips that go into data centers. This is a stunning reversal: In 2022, Intel’s data-center revenue was three times that of AMD.
一个显而易见的警告信号:在两家公司最近报告的季度中,Intel 长期以来的追随者 AMD 实际上在数据中心芯片的收入上超过了 Intel。这是一个惊人的逆转:在 2022 年,Intel 的数据中心收入是 AMD 的三倍。

AMD and others are making huge inroads into Intel’s bread-and-butter business of making the world’s most cutting-edge and powerful general-purpose chips, known as CPUs, short for central processing units.
AMD 和其他公司正在大举进入 Intel 的核心业务,即制造世界上最尖端和强大的通用芯片,称为 CPU,即中央处理器。

Even worse, more and more of the chips that go into data centers are GPUs, short for graphics processing units, and Intel has minuscule market share of these high-end chips. GPUs are used for training and delivering AI.
更糟糕的是,越来越多用于数据中心的芯片是 GPU,即图形处理单元,而 Intel 在这些高端芯片中所占的市场份额微乎其微。GPU 用于训练和交付 AI。

By focusing on the all-important metric of performance per unit of energy pumped into their chips, AMD went from almost no market share in servers to its current ascendant position, says AMD Chief Technology Officer Mark Papermaster. As data centers become ever more rapacious for energy, this emphasis on efficiency has become a key advantage for AMD.
通过专注于每单位能耗性能这一重要指标,AMD 从几乎没有服务器市场份额发展到目前的上升地位,AMD 首席技术官 Mark Papermaster 表示。随着数据中心对能源的需求越来越大,这种对效率的重视已成为 AMD 的关键优势。

Notably, Intel still has about 75% of the market for CPUs that go into data centers. The disconnect between that figure and the company’s share of revenue from selling a wider array of chips for data centers only serves to illustrate the core problem driving its reversal of fortunes.
值得注意的是,Intel 仍然占据了大约 75%的数据中心 CPU 市场份额。这个数字与公司在销售更广泛的数据中心芯片阵列方面的收入份额之间的差距,只是说明了导致其命运逆转的核心问题。

This situation looks likely to get worse, and quickly. Many of the companies spending the most on building out new data centers are switching to chips that have nothing to do with Intel’s proprietary architecture, known as x86, and are instead using a combination of a competing architecture from ARM and their own custom chip designs.
这种情况看起来可能会迅速恶化。许多在建设新数据中心上花费最多的公司正在转向与 Intel 的专有架构 x86 无关的芯片,而是使用来自 ARM 的竞争架构和他们自己的定制芯片设计的组合。

A spokeswoman for Intel says the company is focused on simplifying and strengthening its product portfolio, and advancing its manufacturing and foundry capabilities while optimizing costs. Intel interim Co-Chief Executive Michelle Johnston Holthaus recently said that 2025 will be a “year of stabilization” for the company. Intel is currently seeking a permanent leader after its CEO Pat Gelsinger was pushed out last month.
英特尔的一位女发言人表示,公司专注于简化和加强其产品组合,并在优化成本的同时提升其制造和代工能力。英特尔临时联席首席执行官 Michelle Johnston Holthaus 最近表示,2025 年将是公司“稳定的一年”。在其首席执行官 Pat Gelsinger 上个月被迫离职后,英特尔目前正在寻找一位永久领导者。

The decades that developers spent writing software for Intel’s chips mean that Intel remains a giant, even as its market share has shrunk, and that legacy will limit how quickly Intel’s revenues can decline in the future. Analysts estimate Intel’s 2024 revenue was about $55 billion, just behind Nvidia’s approximately $60 billion. Intel still has the lion’s share of the market for desktop and notebook CPUs—around 76%, overall, according to Mercury Research.
开发人员为 Intel 芯片编写软件的几十年意味着,即使其市场份额缩小,Intel 仍然是一个巨头,这种遗产将限制 Intel 未来收入下降的速度。分析师估计,Intel 2024 年的收入约为 550 亿美元,仅次于 Nvidia 的约 600 亿美元。根据 Mercury Research 的数据,Intel 在台式机和笔记本电脑 CPU 市场中仍占据约 76%的份额。

AMD recently formed an alliance with Intel to collaborate on support and development of the x86 ecosystem that both companies make chips for. Papermaster says that his own company continues to invest in this ecosystem even as AMD also develops ARM-based chips for some applications, such as networking and embedded devices.
AMD 最近与 Intel 结成联盟,以合作支持和开发两家公司都为之制造芯片的 x86 生态系统。Papermaster 表示,尽管 AMD 也在为某些应用(如网络和嵌入式设备)开发基于 ARM 的芯片,但他自己的公司仍在继续投资于这一生态系统。

For a concrete example of Intel’s challenges, look at Amazon, the world’s biggest provider of cloud computing. More than half of the CPUs Amazon has installed in its data centers over the past two years were its own custom chips based on ARM’s architecture, Dave Brown, Amazon vice president of compute and networking services, said recently.
要了解 Intel 面临的挑战,可以看看全球最大的云计算提供商亚马逊。亚马逊副总裁 Dave Brown 最近表示,过去两年中,亚马逊在其数据中心安装的 CPU 中有一半以上是基于 ARM 架构的自定义芯片。

This displacement of Intel is being repeated all across the big providers and users of cloud computing services. Microsoft and Google have also built their own custom, ARM-based CPUs for their respective clouds. In every case, companies are moving in this direction because of the kind of customization, speed and efficiency that custom silicon allows.
这种对 Intel 的取代正在所有大型云计算服务提供商和用户中重复上演。Microsoft 和 Google 也为各自的云构建了自己的定制 ARM 架构 CPU。在每种情况下,公司都朝这个方向发展,因为定制芯片所允许的定制化、速度和效率。

All those companies are also making their own custom, ARM-based chips for AI workloads, an area where Intel has missed the boat almost entirely. Then there’s the 800-pound gorilla in AI, Nvidia. Many of Nvidia’s current-generation AI systems have Intel CPUs in them, but ARM-based chips are increasingly taking center stage in the company’s bleeding-edge hardware.
所有这些公司也在为 AI 工作负载制造自己的定制 ARM 架构芯片,而这是 Intel 几乎完全错失的领域。然后是 AI 领域的 800 磅大猩猩,Nvidia。Nvidia 的许多当前一代 AI 系统中都有 Intel 的 CPU,但 ARM 架构芯片正越来越多地在该公司的尖端硬件中占据中心位置。

Intel’s repeated flubs in entering markets for new kinds of computing and new applications for chips are a textbook example of a big, profitable incumbent becoming a victim of the innovator’s dilemma, says Doug O’Laughlin, an industry analyst at SemiAnalysis, which recently published a blistering report on Intel. The innovator’s dilemma holds that powerful companies that are unwilling to cannibalize their biggest sources of revenue can be overtaken by upstarts that build competing products that start out small, but which can ultimately take over the market which the incumbent dominates—like the mobile chips which ARM started off with.
英特尔在进入新型计算市场和芯片新应用方面的反复失误,是一个大而盈利的现有企业成为创新者困境受害者的教科书式例子,行业分析师 Doug O’Laughlin 在 SemiAnalysis 最近发布的一份严厉报告中表示。创新者困境认为,那些不愿意蚕食其最大收入来源的强大公司,可能会被那些开发竞争产品的后起之秀所超越,这些产品起初规模较小,但最终可以接管现有企业主导的市场——就像 ARM 最初推出的移动芯片一样。

In 1996, former Intel CEO Andy Grove published a book called Only the Paranoid Survive, which highlighted the ways that companies have to be vigilant about what’s coming next, and be willing to disrupt themselves and pursue new technologies. What he intended as a warning to all companies has since become a prophecy foretelling Intel’s current difficulties.
1996 年,前 Intel 首席执行官 Andy Grove 出版了一本名为《只有偏执狂才能生存》的书,强调了公司必须警惕未来的发展,并愿意自我颠覆和追求新技术。他本意是对所有公司的警告,如今却成为预言,预示了 Intel 当前的困难。
Warning
For value investing, this is the wrong kind of industry.
对价值投资来说是一个错误的行业。
“The book is literally about the importance of not missing strategic inflections, and then Intel proceeds to miss every single strategic inflection since,” says O’Laughlin.
“这本书实际上是关于不遗漏战略拐点的重要性,然后 Intel 却错过了自那以来的每一个战略拐点,”O’Laughlin 说。

Then there are laptops. After decades of trying to make it happen, 2024 was finally the year of credible, ARM-based laptops running Windows, thanks to efforts by Microsoft to make Windows on ARM work. The company convinced other companies to port their own software, and created tools that allow most existing programs to run on the new laptops, in emulation. Chips in these devices are made by Qualcomm, and benchmarks show that they can finally compete with Apple’s M-class mobile processors, which are also based on a combination of ARM technology and a great deal of custom chip design by Apple’s formidable in-house team.
然后是笔记本电脑。经过几十年的努力,2024 年终于成为了运行 Windows 的可信赖的 ARM 架构笔记本电脑之年,这要归功于 Microsoft 使 Windows 在 ARM 上运行的努力。该公司说服其他公司移植他们自己的软件,并创建了允许大多数现有程序在新笔记本电脑上以仿真方式运行的工具。这些设备中的芯片由 Qualcomm 制造,基准测试显示它们终于可以与 Apple 的 M 系列移动处理器竞争,这些处理器同样基于 ARM 技术和 Apple 强大的内部团队进行的大量定制芯片设计。

Another bastion of market share and profits for Intel, the PC gaming market, is also showing early signs of erosion. Portable gaming systems like Valve’s Steam Deck and the Lenovo Legion Go, which can run even very demanding games, use processors from AMD. Future devices that will be part of the company’s plan to license its custom OS to other manufacturers may also use ARM-based ones.
对于英特尔来说,另一个市场份额和利润的堡垒——PC 游戏市场,也显示出早期的侵蚀迹象。像 Valve 的 Steam Deck 和联想的 Legion Go 这样的便携式游戏系统,即使运行要求极高的游戏也能胜任,而它们使用的是 AMD 的处理器。按照 Valve 将其定制操作系统授权给其他制造商的计划,未来的设备也可能使用基于 ARM 的处理器。

Inherent in Intel’s woes is the way its vertically integrated structure, long an asset, now weighs on the company’s bottom line and ability to innovate. Unlike other companies that either design chips or manufacture them, Intel has stuck to a seemingly antiquated model of doing both.
英特尔困境的内在原因在于,其长期以来作为优势的垂直整合结构,如今却拖累了公司的利润和创新能力。与其他仅设计芯片或仅制造芯片的公司不同,英特尔坚持采用一种看似过时的模式:设计与制造兼营。

Intel reported a $16 billion loss in its most recent quarter as it spent big to transform into a contract manufacturer—that is, a company that also manufactures chips for other companies, even competitors—and catch up to rival TSMC, which now produces the world’s most cutting-edge chips.
Intel 在最近一个季度报告了 160 亿美元的亏损,因为它投入巨资转型为合同制造商——即一家也为其他公司甚至竞争对手制造芯片的公司——并赶上现在生产世界上最先进芯片的竞争对手 TSMC。

Analysts expect Intel to return to profitability in 2025, but it won’t be clear for years whether the company’s big manufacturing bets will ultimately pay off.
分析师预计 Intel 将在 2025 年恢复盈利,但公司大规模制造投资是否最终会取得成功,几年内都不会明朗。

One of the big bets of Intel’s recently departed CEO Gelsinger, was Intel’s attempt to leapfrog TSMC in terms of chip technology. What it calls its “18A” tech could in theory allow its own chips, and those it makes for outsiders, to once again be the most cutting-edge, and the fastest, on the planet. The company has said it could regain that title by 2026. Intel recently announced it had signed a deal with Amazon to make custom chips for the company, using its 18A technology.
英特尔最近离任的首席执行官盖尔辛格的一个重大赌注是英特尔试图在芯片技术方面超越台积电。它所谓的“18A”技术理论上可以使其自身芯片以及为外部制造的芯片再次成为全球最前沿和最快的。公司表示,到 2026 年可以重新获得这一称号。英特尔最近宣布已与亚马逊签署协议,使用其 18A 技术为该公司制造定制芯片。

Even if Intel can once again lead the industry with its technology, the best case scenario for Intel’s own products is that it regains dominance in a market that continues to shrink—the x86 CPU one, says O’Laughlin. The removal of Gelsinger, who was betting on an all-in strategy for Intel to regain dominance both in the market for its own chips and in serving outside companies, suggests that Intel’s board agrees that the company can’t continue to count on being the best in the world at everything.
即使 Intel 能够再次凭借其技术引领行业,Intel 自身产品的最佳情况是重新在一个持续萎缩的市场中占据主导地位——即 x86 CPU 市场,O’Laughlin 说。Gelsinger 的离职表明,Intel 董事会同意公司不能继续指望在所有领域都成为世界最佳,他曾押注于 Intel 通过全力以赴的策略重新在自家芯片市场和为外部公司服务中占据主导地位。

All of these challenges and conflicting priorities may push Intel to someday split in two, severing its product side from manufacturing. Intel INTC 0.26%increase; green up pointing triangle Co-CEO David Zinsner recently said that spinning off the company’s manufacturing side is an “open question.”
所有这些挑战和相互冲突的优先事项可能会促使 Intel 有一天分拆为两部分,将其产品部门与制造部门分离。Intel 的联合首席执行官 David Zinsner 最近表示,剥离公司的制造部门是一个“开放的问题”。

It’s also possible, in the worst case, that a fate even worse than being dismembered could be in store for Intel.
在最坏的情况下,Intel 可能面临比被拆分更糟糕的命运。

Rene Haas, CEO of ARM, recently observed that Intel has long been an innovation powerhouse, but that in chipmaking and design, there are countless companies that don’t innovate fast enough—and no longer exist.
ARM 的首席执行官 Rene Haas 最近指出,Intel 长期以来一直是创新的强大力量,但在芯片制造和设计方面,有无数公司创新速度不够快,已经不复存在。

10、《2025-01-13 U.S. Targets China With New AI Curbs, Overriding Nvidia’s Objections》

The caps on exports of AI chips apply in different ways to different countries and companies.
对 AI 芯片出口的限制在不同国家和公司中适用的方式不同。

The 18 close U.S. allies will face no restrictions on purchases of chips. And smaller orders from customers around the world—up to around 1,700 advanced AI chips—won’t require a license or count against caps on countries’ chip purchases, the Commerce Department said.
18 个美国亲密盟友在购买芯片时将不受限制。美国商务部表示,来自世界各地客户的小额订单——最多约 1,700 个先进 AI 芯片——不需要许可证,也不计入各国芯片购买的上限。

That leaves the question of whether companies based in the U.S. or its allies can build significant AI capacity in a country falling into a middle zone—neither trusted ally nor top adversary. The Commerce Department said yes, but with limits. Companies that meet high security standards can apply for a status that allows them to place up to 7% of their global AI computing capacity in any single such country. That could be as many as hundreds of thousands of chips, the department said. 
这就留下了一个问题,即总部位于美国或其盟国的公司是否可以在一个处于中间地带的国家建立显著的 AI 能力——既不是可信赖的盟友,也不是主要对手。商务部表示可以,但有限制。符合高安全标准的公司可以申请一种状态,允许他们在任何一个这样的国家中放置其全球 AI 计算能力的最多 7%。商务部表示,这可能多达数十万片芯片。

A further category of companies based in countries that aren’t U.S. adversaries can apply for a status allowing them to buy up to the equivalent of 320,000 of today’s advanced AI chips over the next two years. Those that don’t get this status can still buy up to the equivalent of 50,000 advanced AI chips.
位于非美国对手国家的公司可以申请一种状态,允许他们在未来两年内购买相当于 32 万片当今先进 AI 芯片的产品。未获得此状态的公司仍然可以购买相当于 5 万片先进 AI 芯片的产品。
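The tiered caps described in the last few paragraphs can be sketched as a small decision helper. The thresholds (1,700-chip small-order exemption, 320,000 / 50,000 two-year caps, 7% single-country capacity share) are taken from the text; the function names and the simplified either/or logic are illustrative only, as the actual Commerce Department rules carry many more conditions.

```python
# Illustrative sketch of the tiered export caps described above. Numeric
# thresholds come from the article; the decision logic is a simplification.
SMALL_ORDER_EXEMPTION = 1_700    # chips per order: no license, not counted
TRUSTED_STATUS_CAP = 320_000     # two-year cap with the approved status
DEFAULT_CAP = 50_000             # two-year cap without it
SINGLE_COUNTRY_SHARE = 0.07      # max share of global AI capacity per country

def license_needed(order_chips: int) -> bool:
    """Orders at or below the small-order threshold need no license."""
    return order_chips > SMALL_ORDER_EXEMPTION

def two_year_cap(has_trusted_status: bool) -> int:
    """Cumulative purchase cap for buyers in a middle-zone country."""
    return TRUSTED_STATUS_CAP if has_trusted_status else DEFAULT_CAP

def country_capacity_cap(global_capacity_chips: int) -> int:
    """Capacity a high-security company may place in one such country."""
    return int(global_capacity_chips * SINGLE_COUNTRY_SHARE)
```

For a company with, say, a million chips of global capacity, the 7% rule works out to 70,000 chips in any single middle-zone country, consistent with the article's "hundreds of thousands" only for the very largest players.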

The limits suggest many countries could be challenged in setting up AI computing facilities capable of competing with the largest and most advanced in the U.S. and its closely allied countries. Some of the biggest AI computing facilities in the U.S. contain huge numbers of Nvidia’s AI chips, including the Colossus supercomputer being built by Elon Musk’s xAI in Memphis, Tenn., which is being scaled up to include 200,000 of them.
这些限制表明,许多国家在建立能够与美国及其紧密盟国中最大和最先进的 AI 计算设施竞争的 AI 计算设施方面可能面临挑战。美国一些最大的 AI 计算设施包含大量 Nvidia 的 AI 芯片,包括由 Elon Musk 的 xAI 在田纳西州孟菲斯建造的 Colossus 超级计算机,该计算机正在扩展以包含其中的 20 万个芯片。

11、《2025-01-25 Jeffrey Emanuel.The Short Case for Nvidia Stock》

The Major Threats  主要威胁

At a very high level, you can think of things like this: Nvidia operated in a pretty niche area for a very long time; they had very limited competition, and the competition wasn't particularly profitable or growing fast enough to ever pose a real threat, since they didn't have the capital needed to really apply pressure to a market leader like Nvidia. The gaming market was large and growing, but didn't feature earth-shattering margins or particularly fabulous year-over-year growth rates.
从一个非常高的层面来看,你可以这样理解:Nvidia 在一个相当小众的领域运营了很长时间;他们的竞争非常有限,而且竞争对手并不特别盈利,也没有足够快的增长速度来真正构成威胁,因为他们没有足够的资本来真正向 Nvidia 这样的市场领导者施加压力。游戏市场规模庞大且在增长,但并没有惊人的利润率或特别出色的年度增长率。

A few big tech companies started ramping up hiring and spending on machine learning and AI efforts around 2016-2017, but it was never a truly significant line item for any of them on an aggregate basis— more of a "moonshot" R&D expenditure. But once the big AI race started in earnest with the release of ChatGPT in 2022— only a bit over 2 years ago, although it seems like a lifetime ago in terms of developments— that situation changed very dramatically.
几家大型科技公司在 2016-2017 年左右开始加大对机器学习和 AI 的招聘和投入,但从整体来看,这从未成为它们真正重要的支出项目,更像是一种“登月”式的研发投入。然而,自从 2022 年 ChatGPT 发布后——仅仅两年多以前,尽管从技术发展的角度来看似乎已经过去了很久——这一情况发生了巨大变化。

Suddenly, big companies were ready to spend many, many billions of dollars incredibly quickly. The number of researchers showing up at the big research conferences like NeurIPS and ICML went up very, very dramatically. All the smart students who might have previously studied financial derivatives were instead studying Transformers, and $1mm+ compensation packages for non-executive engineering roles (i.e., for independent contributors not managing a team) became the norm at the leading AI labs.
突然,大公司准备以极快的速度花费数百亿美元。出现在 NeurIPS 和 ICML 等大型研究会议上的研究人员数量急剧增加。所有那些以前可能研究金融衍生品的聪明学生,转而研究 Transformers,而在顶级 AI 实验室,非管理团队的独立贡献工程师职位的薪酬套餐超过 100 万美元已成为常态。

It takes a while to change the direction of a massive cruise ship; and even if you move really quickly and spend billions, it takes a year or more to build greenfield data centers and order all the equipment (with ballooning lead times) and get it all set up and working. It takes a long time to hire and onboard even smart coders before they can really hit their stride and familiarize themselves with the existing codebases and infrastructure.
要改变一艘巨型游轮的方向需要一段时间;即使行动非常迅速并投入数十亿美元,建设全新的数据中心、订购所有设备(交付周期不断延长)、完成安装和调试也需要一年或更长时间。即使是聪明的程序员,在真正进入状态并熟悉现有代码库和基础设施之前,招聘和培训也需要很长时间。

But now, you can imagine that absolutely biblical amounts of capital, brainpower, and effort are being expended in this area. And Nvidia has the biggest target of any player on their back, because they are the ones who are making the lion's share of the profits TODAY, not in some hypothetical future where the AI runs our whole lives.
但现在,你可以想象到,绝对庞大的资本、智慧和努力正被投入到这个领域。而 Nvidia 是所有参与者中最受瞩目的目标,因为他们才是今天获取最大份额利润的一方,而不是在某个假设的未来,AI 主导我们的整个生活。

So the very high level takeaway is basically that "markets find a way"; they find alternative, radically innovative new approaches to building hardware that leverage completely new ideas to sidestep barriers that help prop up Nvidia's moat.
所以,高层次的要点基本上是“市场总能找到出路”;它们会找到替代的、极具创新性的全新方法来构建硬件,利用全新的理念来规避那些支撑 Nvidia 护城河的障碍。

The Hardware Level Threat 硬件层级威胁

Consider, for example, the so-called "wafer scale" AI training chips from Cerebras, which dedicate an entire 300mm silicon wafer to an absolutely gargantuan chip that contains orders of magnitude more transistors and cores on a single die (see this recent blog post from them explaining how they were able to solve the "yield problem" that had been preventing this approach from being economically practical in the past).
例如,来自 Cerebras 的所谓“晶圆级”AI 训练芯片,它将整块 300mm 硅晶圆用于一个极其庞大的芯片,在单个芯片上包含数量级更多的晶体管和核心(参见他们最近的博客文章,解释了他们如何解决此前阻碍这种方法在经济上可行的“良率问题”)。

To put this into perspective, if you compare Cerebras' newest WSE-3 chip to Nvidia's flagship data-center GPU, the H100, the Cerebras chip has a total die area of 46,225 square millimeters compared to just 814 for the H100 (and the H100 is itself considered an enormous chip by industry standards); that's a multiple of ~57x! And instead of having 132 "streaming multiprocessor" cores enabled on the chip like the H100 has, the Cerebras chip has ~900,000 cores (granted, each of these cores is smaller and does a lot less, but it's still an almost unfathomably large number in comparison). In more concrete apples-to-apples terms, the Cerebras chip can do around ~32x the FLOPS in AI contexts as a single H100 chip. Since an H100 sells for close to $40k a pop, you can imagine that the WSE-3 chip isn't cheap.
从这个角度来看,如果将 Cerebras 最新的 WSE-3 芯片与 Nvidia 的旗舰数据中心 GPU——H100 进行比较,Cerebras 芯片的总晶圆面积为 46,225 平方毫米,而 H100 仅为 814 平方毫米(而 H100 本身在行业标准中已被认为是一个巨大的芯片);这相当于大约 57 倍!此外,与 H100 在芯片上启用了 132 个“流式多处理器”核心不同,Cerebras 芯片拥有约 900,000 个核心(当然,这些核心更小,功能也少得多,但相比之下,这仍然是一个几乎难以想象的巨大数字)。从更具体的对比来看,在 AI 计算环境中,Cerebras 芯片的 FLOPS 计算能力约为单个 H100 芯片的 32 倍。由于 H100 的售价接近 40,000 美元,可以想象 WSE-3 芯片的价格也不会便宜。
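The size comparisons above are easy to sanity-check; the figures below are the ones quoted in the text, and both ratios are approximate marketing-level numbers rather than apples-to-apples performance claims.

```python
# Sanity-checking the WSE-3 vs. H100 comparisons quoted above
# (figures as given in the text).
wse3_die_mm2, h100_die_mm2 = 46_225, 814
wse3_cores, h100_sm_cores = 900_000, 132

area_ratio = wse3_die_mm2 / h100_die_mm2   # ≈ 56.8, i.e. the "~57x"
core_ratio = wse3_cores / h100_sm_cores    # ≈ 6,818x (far smaller cores, though)

print(f"die area: {area_ratio:.1f}x, core count: {core_ratio:,.0f}x")
```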

So why does this all matter? Well, instead of trying to battle Nvidia head-on by using a similar approach and trying to match the Mellanox interconnect technology, Cerebras has used a radically innovative approach to do an end-run around the interconnect problem: inter-processor bandwidth becomes much less of an issue when everything is running on the same super-sized chip. You don't even need to have the same level of interconnect because one mega chip replaces tons of H100s.
那么,为什么这一切都很重要?Cerebras 并没有试图通过类似的方法与 Nvidia 正面竞争,也没有试图匹配 Mellanox 互连技术,而是采用了一种极具创新性的方式来绕过互连问题:当所有计算都在同一个超大芯片上运行时,处理器间带宽问题就变得不那么重要了。你甚至不需要相同级别的互连,因为一个巨型芯片可以替代大量 H100。

And the Cerebras chips also work extremely well for AI inference tasks. In fact, you can try it today for free here and use Meta's very respectable Llama-3.3-70B model. It responds basically instantaneously, at ~1,500 tokens per second. To put that into perspective, anything above 30 tokens per second feels relatively snappy to users based on comparisons to ChatGPT and Claude, and even 10 tokens per second is fast enough that you can basically read the response while it's being generated.
而且 Cerebras 芯片在 AI 推理任务中也表现极为出色。事实上,你今天就可以在这里免费试用,并使用 Meta 的非常优秀的 Llama-3.3-70B 模型。它的响应几乎是瞬时的,速度约为 1,500 个 token 每秒。为了让你有个直观的对比,基于与 ChatGPT 和 Claude 的比较,任何超过 30 个 token 每秒的速度对用户来说都感觉相当流畅,甚至 10 个 token 每秒的速度也足够快,以至于你基本上可以在生成的同时阅读响应。
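The readability thresholds above translate directly into wall-clock time. A quick calculation, assuming a typical 500-token answer (the response length is an assumption, not from the text):

```python
# Reading-speed arithmetic for the numbers above: wall-clock time for an
# assumed 500-token answer at each generation speed.
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    return tokens / tokens_per_second

for label, speed in [("Cerebras, ~1,500 tok/s", 1_500),
                     ("'snappy' threshold, 30 tok/s", 30),
                     ("read-along pace, 10 tok/s", 10)]:
    print(f"{label}: {seconds_for(500, speed):.1f} s")
```

At 1,500 tokens per second the full answer lands in about a third of a second, versus nearly a minute at read-along pace, which is why the response feels instantaneous.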

Cerebras is also not alone; there are other companies, like Groq (not to be confused with the Grok model family trained by Elon Musk's X AI). Groq has taken yet another innovative approach to solving the same fundamental problem. Instead of trying to compete with Nvidia's CUDA software stack directly, they've developed what they call an LPU ("language processing unit") that is specifically designed for the exact mathematical operations that deep learning models need to perform. Their chips are designed around a concept called "deterministic compute," which means that, unlike traditional GPUs where the exact timing of operations can vary, their chips execute operations in a completely predictable way every single time.
Cerebras 也并非孤军奋战;还有其他公司,比如 Groq(不要与 Elon Musk 的 X AI 训练的 Grok 模型家族混淆)。Groq 采用了另一种创新方法来解决同样的基本问题。他们没有直接试图与 Nvidia 的 CUDA 软件栈竞争,而是开发了一种名为“语言处理单元”(LPU)的芯片,专门用于执行深度学习模型所需的特定数学运算。他们的芯片围绕一个名为“确定性计算”的概念设计,这意味着,与操作时序可能变化的传统 GPU 不同,他们的芯片每次执行操作时都完全可预测。

This might sound like a minor technical detail, but it actually makes a massive difference for both chip design and software development. Because the timing is completely deterministic, Groq can optimize their chips in ways that would be impossible with traditional GPU architectures. As a result, they've been demonstrating for the past 6+ months inference speeds of over 500 tokens per second with the Llama series of models and other open source models, far exceeding what's possible with traditional GPU setups. Like Cerebras, this is available today and you can try it for free here.
这听起来可能像是一个微小的技术细节,但实际上它对芯片设计和软件开发都有巨大的影响。由于时序是完全确定的,Groq 可以以传统 GPU 架构无法实现的方式优化其芯片。因此,在过去 6 个月以上的时间里,他们一直在展示 Llama 系列模型和其他开源模型的推理速度超过每秒 500 个 token,远远超出传统 GPU 方案的可能性。与 Cerebras 类似,这项技术今天已经可用,你可以在这里免费试用。

Using a comparable Llama3 model with "speculative decoding," Groq is able to generate 1,320 tokens per second, on par with Cerebras and far in excess of what is possible using regular GPUs. Now, you might ask what the point is of achieving 1,000+ tokens per second when users seem pretty satisfied with ChatGPT, which is operating at less than 10% of that speed. And the thing is, it does matter. It makes it a lot faster to iterate and not lose focus as a human knowledge worker when you get instant feedback. And if you're using the model programmatically via the API, which is increasingly where much of the demand is coming from, then it can enable whole new classes of applications that require multi-stage inference (where the output of previous stages is used as input in successive stages of prompting/inference) or which require low-latency responses, such as content moderation, fraud detection, dynamic pricing, etc.
使用可比的 Llama3 模型和“speculative decoding”,Groq 能够生成每秒 1,320 个 token,与 Cerebras 相当,并且远远超过常规 GPU 的能力。现在,你可能会问,当用户似乎对 ChatGPT 的速度(不到这个速度的 10%)感到满意时,实现每秒 1,000 多个 token 有什么意义。而事实是,这确实很重要。当你能够即时获得反馈时,作为人类知识工作者,可以更快地迭代并保持专注。而且,如果你是通过 API 以编程方式使用该模型——这正是越来越多需求的来源——那么它可以支持全新的应用类别,例如需要多阶段推理(即前一阶段的输出作为后续阶段提示/推理的输入)或需要低延迟响应的应用,如内容审核、欺诈检测、动态定价等。
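To see why raw tokens-per-second matters so much for the multi-stage inference described above, here is a toy latency model of a sequential chain. `generate` is a hypothetical stand-in for any LLM API call, not a real client library, and the stage and token counts are assumptions.

```python
# Toy model of multi-stage inference: each stage's output is the next
# stage's input, so per-stage generation latency adds up. `generate` is a
# hypothetical stand-in for an LLM API call, not a real SDK.
def generate(prompt: str, tokens: int, tokens_per_second: float):
    latency_s = tokens / tokens_per_second
    return f"<{tokens}-token continuation of: {prompt[:20]}>", latency_s

def pipeline_latency(question: str, tokens_per_second: float,
                     stages: int = 3, tokens_per_stage: int = 400) -> float:
    """Total wall-clock generation time for a sequential N-stage chain."""
    text, total_s = question, 0.0
    for _ in range(stages):
        text, dt = generate(text, tokens_per_stage, tokens_per_second)
        total_s += dt
    return total_s

# Three chained stages at typical-GPU vs. Groq-like speeds:
print(f"{pipeline_latency('moderate this post', 50):.1f} s at 50 tok/s")
print(f"{pipeline_latency('moderate this post', 1_320):.1f} s at 1,320 tok/s")
```

The same three-stage chain drops from roughly 24 seconds to under a second, which is the difference between "batch job" and "interactive feature" for things like content moderation or dynamic pricing.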

But even more fundamentally, the faster you can serve requests, the faster you can cycle things, and the busier you can keep the hardware. Although Groq's hardware is extremely expensive, clocking in at $2mm to $3mm for a single server, it ends up costing far less per request fulfilled if you have enough demand to keep the hardware busy all the time.
但更根本的是,你处理请求的速度越快,循环的速度就越快,硬件的利用率就越高。尽管 Groq 的硬件极其昂贵,单台服务器的成本高达 200 万至 300 万美元,但如果有足够的需求让硬件始终保持忙碌,每个已完成请求的成本最终会低得多。
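A back-of-the-envelope amortization makes the utilization point concrete. Only the $2mm-$3mm server price comes from the text; the three-year lifetime, the sustained throughput figure, and the utilization levels are assumptions, and power and opex are ignored.

```python
# Back-of-the-envelope amortized cost per request. Only the $2mm–$3mm
# server price is from the text; lifetime, throughput, and utilization
# are assumptions, and power/opex are ignored.
SERVER_COST_USD = 2_500_000               # midpoint of the quoted range
LIFETIME_S = 3 * 365 * 24 * 3600          # assume 3-year amortization
PEAK_REQUESTS_PER_S = 20                  # assumed throughput when saturated

def cost_per_request(utilization: float) -> float:
    requests_served = LIFETIME_S * PEAK_REQUESTS_PER_S * utilization
    return SERVER_COST_USD / requests_served

for u in (0.05, 0.50, 0.95):
    print(f"{u:.0%} utilized: ${cost_per_request(u):.4f}/request")
```

Because the hardware cost is fixed, cost per request falls linearly with utilization: the same server is roughly 19x cheaper per request at 95% utilization than at 5%.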

And like Nvidia with CUDA, a huge part of Groq's advantage comes from their own proprietary software stack. They are able to take the same open source models that other companies like Meta, DeepSeek, and Mistral develop and release for free, and decompose them in special ways that allow them to run dramatically faster on their specific hardware.
就像 Nvidia 的 CUDA 一样,Groq 的巨大优势很大程度上来自他们自有的专有软件栈。他们能够利用 Meta、DeepSeek 和 Mistral 等公司开发并免费发布的相同开源模型,并以特殊方式对其进行分解,使其能够在他们特定的硬件上运行得更快。

Like Cerebras, they have taken different technical decisions to optimize certain particular aspects of the process, which allows them to do things in a fundamentally different way. In Groq's case, it's because they are entirely focused on inference level compute, not on training: all their special sauce hardware and software only give these huge speed and efficiency advantages when doing inference on an already trained model.
像 Cerebras 一样,他们在技术上做出了不同的决策,以优化流程中的某些特定方面,从而使他们能够以根本不同的方式执行任务。在 Groq 的情况下,这是因为他们完全专注于推理级计算,而不是训练:他们所有的专有硬件和软件只有在对已训练模型进行推理时,才能提供这些巨大的速度和效率优势。

But if the next big scaling law that people are excited about is for inference level compute— and if the biggest drawback of chain-of-thought (CoT) models is the high latency introduced by having to generate all those intermediate logic tokens before they can respond— then even a company that only does inference compute, but which does it dramatically faster and more efficiently than Nvidia can, could introduce a serious competitive threat in the coming years. At the very least, Cerebras and Groq can chip away at the lofty expectations for Nvidia's revenue growth over the next 2-3 years that are embedded in the current equity valuation.
但如果人们期待的下一个重要扩展法则是针对推理级计算——而如果思维链(CoT)模型最大的缺点是必须先生成所有中间逻辑 token 才能响应所导致的高延迟——那么即使是一家只专注于推理计算的公司,只要它的计算速度和效率远超 Nvidia,也可能在未来几年内带来严重的竞争威胁。至少,Cerebras 和 Groq 可以削弱当前股权估值中对 Nvidia 未来 2-3 年收入增长的高预期。

Besides these particularly innovative, if relatively unknown, startup competitors, there is some serious competition coming from some of Nvidia's biggest customers themselves who have been making custom silicon that specifically targets AI training and inference workloads. Perhaps the best known of these is Google, which has been developing its own proprietary TPUs since 2016. Interestingly, although it briefly sold TPUs to external customers, Google has been using all its TPUs internally for the past several years, and it is already on its 6th generation of TPU hardware.
除了这些特别创新但相对不知名的初创公司竞争对手外,Nvidia 一些最大的客户本身也带来了激烈的竞争,他们一直在制造专门针对 AI 训练和推理工作负载的定制芯片。或许最知名的就是 Google,自 2016 年以来一直在开发其专有的 TPU。有趣的是,尽管 Google 曾短暂向外部客户销售 TPU,但在过去几年里一直在内部使用所有 TPU,并且其 TPU 硬件已经发展到第六代。

Amazon has also been developing its own custom chips called Trainium2 and Inferentia2. And while Amazon is building out data centers featuring billions of dollars of Nvidia GPUs, they are also at the same time investing many billions in other data centers that use these internal chips. They have one cluster that they are bringing online for Anthropic that features over 400k chips.
Amazon 也在开发自己的定制芯片,名为 Trainium2 和 Inferentia2。尽管 Amazon 正在建设包含数十亿美元 Nvidia GPU 的数据中心,但与此同时,他们也在投资数十亿美元用于采用这些内部芯片的其他数据中心。他们正在为 Anthropic 启用一个集群,其中包含超过 40 万颗芯片。

Amazon gets a lot of flak for totally bungling their internal AI model development, squandering massive amounts of internal compute resources on models that ultimately are not competitive, but the custom silicon is another matter. Again, they don't necessarily need their chips to be better and faster than Nvidia's. What they need is for their chips to be good enough, and to build them at a breakeven gross margin instead of the ~90%+ gross margin that Nvidia earns on its H100 business.
Amazon 因完全搞砸了其内部 AI 模型开发而受到大量批评,浪费了大量内部计算资源在最终并不具备竞争力的模型上,但定制芯片则是另一回事。同样,他们的芯片不一定需要比 Nvidia 的更好更快。他们需要的是芯片足够优秀,但以盈亏平衡的毛利率生产,而不是 Nvidia 在其 H100 业务上获得的约 90%+毛利率。
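The margin arithmetic behind "good enough at breakeven" is worth making explicit. Nvidia's ~90% gross margin and the ~$40k H100 price appear in the text; the internal chip's cost and relative performance below are purely hypothetical assumptions to illustrate the point.

```python
# Margin arithmetic behind "good enough at breakeven." The ~90% margin
# and ~$40k H100 price are from the text; the internal chip's cost and
# relative performance are hypothetical assumptions.
h100_price, nvidia_gross_margin = 40_000, 0.90
h100_build_cost = h100_price * (1 - nvidia_gross_margin)   # ≈ $4,000

internal_cost, internal_relative_perf = 6_000, 0.5   # half an H100's speed
price_per_unit_perf_nvidia = h100_price / 1.0
cost_per_unit_perf_internal = internal_cost / internal_relative_perf

# Even a much slower in-house chip beats Nvidia's *price*, because the
# hyperscaler pays silicon cost, not silicon cost plus a 90% margin:
print(cost_per_unit_perf_internal < price_per_unit_perf_nvidia)
```

Under these assumptions the in-house chip delivers a unit of performance for $12,000 against Nvidia's $40,000 asking price, even while being half as fast and 50% more expensive to fabricate.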

OpenAI has also announced their plans to build custom chips, and they (together with Microsoft) are obviously the single largest user of Nvidia's data center hardware. As if that weren't enough, Microsoft have themselves announced their own custom chips!
OpenAI 也已宣布计划打造定制芯片,而他们(与 Microsoft 一起)显然是 Nvidia 数据中心硬件的最大单一用户。仿佛这还不够,Microsoft 也宣布了他们自己的定制芯片!

And Apple, the most valuable technology company in the world, has been blowing away expectations for years now with their highly innovative and disruptive custom silicon operation, which now completely trounces the CPUs from both Intel and AMD in terms of performance per watt, which is the most important factor in mobile (phone/tablet/laptop) applications. And they have been making their own internally designed GPUs and "Neural Processors" for years, even though they have yet to really demonstrate the utility of such chips outside of their own custom applications, like the advanced software based image processing used in the iPhone's camera.
而 Apple,这家全球最有价值的科技公司,多年来一直以高度创新和颠覆性的定制芯片业务超出市场预期,如今在每瓦性能方面完全击败了 Intel 和 AMD 的 CPU,而这正是移动(手机/平板/笔记本)应用中最重要的因素。此外,他们多年来一直在自主设计 GPU 和“神经处理器”,尽管这些芯片在自家定制应用(如 iPhone 相机的高级软件图像处理)之外的实用性尚未真正得到证明。

While Apple's focus seems somewhat orthogonal to these other players in terms of its mobile-first, consumer oriented, "edge compute" focus, if it ends up spending enough money on its new contract with OpenAI to provide AI services to iPhone users, you have to imagine that they have teams looking into making their own custom silicon for inference/training (although given their secrecy, you might never even know about it directly!).
虽然 Apple 的重点在其以移动为先、面向消费者的“边缘计算”方向上似乎与其他参与者有所不同,但如果它最终在与 OpenAI 的新合同上投入足够的资金,以向 iPhone 用户提供 AI 服务,你不得不想象他们会有团队在研究为推理/训练打造自定义芯片(尽管考虑到他们的保密性,你可能永远都不会直接知道!)。

Now, it's no secret that there is a strong power law distribution of Nvidia's hyper-scaler customer base, with the top handful of customers representing the lion's share of high-margin revenue. How should one think about the future of this business when literally every single one of these VIP customers is building their own custom chips specifically for AI training and inference?
现在,Nvidia 的超大规模客户群体呈现出明显的幂律分布,前几大客户占据了高利润收入的绝大部分。这项业务的未来该如何看待,当这些重要客户无一例外都在为 AI 训练和推理打造自己的定制芯片?

When thinking about all this, you should keep one incredibly important thing in mind: Nvidia is largely an IP based company. They don't make their own chips. The true special sauce for making these incredible devices arguably comes more from TSMC, the actual fab, and ASML, which makes the special EUV lithography machines used by TSMC to make these leading-edge process node chips. And that's critically important, because TSMC will sell their most advanced chips to anyone who comes to them with enough up-front investment and is willing to guarantee a certain amount of volume. They don't care if it's for Bitcoin mining ASICs, GPUs, TPUs, mobile phone SoCs, etc.
在考虑所有这些时,你应该牢记一件极其重要的事情:Nvidia 在很大程度上是一家基于 IP 的公司。他们不自己制造芯片。制造这些令人惊叹的设备的真正关键技术,或许更多来自于 TSMC(实际的晶圆厂)和 ASML(制造 TSMC 用于生产这些先进工艺节点芯片的特殊 EUV 光刻机)。这点至关重要,因为 TSMC 会将其最先进的芯片出售给任何愿意提供足够前期投资并保证一定产量的客户。他们并不在乎这些芯片是用于比特币挖矿 ASIC、GPU、TPU、手机 SoC 等。

As much as senior chip designers at Nvidia earn per year, surely some of the best of them could be lured away by these other tech behemoths for enough cash and stock. And once they have a team and resources, they can design innovative chips (again, perhaps not even 50% as advanced as an H100, but with that Nvidia gross margin, there is plenty of room to work with) in 2 to 3 years, and thanks to TSMC, they can turn those into actual silicon using the exact same process node technology as Nvidia.
尽管 Nvidia 的高级芯片设计师每年的收入不菲,但其中一些最优秀的人才肯定会被其他科技巨头用足够的现金和股票挖走。而一旦他们拥有团队和资源,他们可以在 2 到 3 年内设计出创新的芯片(或许甚至达不到 H100 的 50%先进程度,但凭借 Nvidia 的高毛利率,仍有足够的空间可供操作),并且多亏了 TSMC,他们可以使用与 Nvidia 完全相同的工艺节点技术将这些设计变成实际的硅芯片。

The Software Threat(s)  软件威胁

As if these looming hardware threats weren't bad enough, there are a few developments in the software world in the last couple of years that, while they started out slowly, are now picking up real steam and could pose a serious threat to the software dominance of Nvidia's CUDA. The first of these involves the notoriously bad Linux drivers for AMD GPUs. Remember we talked about how AMD has inexplicably allowed these drivers to suck for years despite leaving massive amounts of money on the table?
仿佛这些迫在眉睫的硬件威胁还不够糟糕,软件领域在过去几年里也出现了一些发展,虽然起初进展缓慢,但现在正在加速,并可能对 Nvidia CUDA 的软件主导地位构成严重威胁。其中第一个就与 AMD GPU 那臭名昭著的糟糕 Linux 驱动程序有关。还记得我们谈到 AMD 多年来莫名其妙地放任这些驱动程序表现糟糕,尽管这意味着放弃了大量潜在收入吗?

Well, amusingly enough, the infamous hacker George Hotz (famous for jailbreaking the original iPhone as a teenager, and currently the CEO of self-driving startup Comma.ai and AI computer company Tiny Corp, which also makes the open-source tinygrad AI software framework), recently announced that he was sick and tired of dealing with AMD's bad drivers, and desperately wanted to be able to leverage the lower cost AMD GPUs in their TinyBox AI computers (which come in multiple flavors, some of which use Nvidia GPUs, and some of which use AMD GPUs).
有趣的是,臭名昭著的黑客 George Hotz(因青少年时期破解原始 iPhone 而闻名,现为自动驾驶初创公司 Comma.ai 和 AI 计算公司 Tiny Corp 的 CEO,该公司还开发了开源 tinygrad AI 软件框架)最近宣布,他已经厌倦了处理 AMD 糟糕的驱动程序,并迫切希望能够利用成本更低的 AMD GPU 来运行他们的 TinyBox AI 计算机(这些计算机有多个版本,其中一些使用 Nvidia GPU,而另一些使用 AMD GPU)。

Well, he is making his own custom drivers and software stack for AMD GPUs without any help from AMD themselves; on Jan. 15th of 2025, he tweeted via his company's X account that "We are one piece away from a completely sovereign stack on AMD, the RDNA3 assembler. We have our own driver, runtime, libraries, and emulator. (all in ~12,000 lines!)" Given his track record and skills, it is likely that they will have this all working in the next couple months, and this would allow for a lot of exciting possibilities of using AMD GPUs for all sorts of applications where companies currently feel compelled to pay up for Nvidia GPUs.
嗯,他正在为 AMD GPU 制作自己的自定义驱动程序和软件栈,而没有得到 AMD 本身的任何帮助;在 2025 年 1 月 15 日,他通过其公司 X 账户发推称:“我们距离在 AMD 上实现完全自主的软件栈只差最后一块——RDNA3 汇编器。我们已经有了自己的驱动程序、运行时、库和模拟器。(总共约 12,000 行代码!)” 鉴于他的过往记录和技能,他们很可能会在接下来的几个月内让这一切正常运行,这将为使用 AMD GPU 进行各种应用带来许多令人兴奋的可能性,而目前公司往往不得不为 Nvidia GPU 付出高昂成本。

OK, well that's just a driver for AMD, and it's not even done yet. What else is there? Well, there are a few other areas on the software side that are a lot more impactful. For one, there is now a massive concerted effort across many large tech companies and the open source software community at large to make more generic AI software frameworks that have CUDA as just one of many "compilation targets".
好的,那只是 AMD 的一个驱动程序,而且还没有完成。还有什么?在软件方面,还有一些其他领域影响更大。首先,现在许多大型科技公司和整个开源软件社区正在大规模协同努力,开发更通用的 AI 软件框架,使 CUDA 只是众多“编译目标”之一。

That is, you write your software using higher-level abstractions, and the system itself can automatically turn those high-level constructs into super well-tuned low-level code that works extremely well on CUDA. But because it's done at this higher level of abstraction, it can just as easily get compiled into low-level code that works extremely well on lots of other GPUs and TPUs from a variety of providers, such as the massive number of custom chips in the pipeline from every big tech company.
也就是说,你使用更高级的抽象来编写软件,系统本身可以自动将这些高级结构转换为在 CUDA 上运行极其高效的低级代码。但由于这是在更高级的抽象层面完成的,它同样可以被编译成在许多其他 GPU 和 TPU 上运行极其高效的低级代码,这些 GPU 和 TPU 来自各种供应商,例如各大科技公司正在开发的大量定制芯片。

The most famous examples of these frameworks are MLX (sponsored primarily by Apple), Triton (sponsored primarily by OpenAI), and JAX (developed by Google). MLX is particularly interesting because it provides a PyTorch-like API that can run efficiently on Apple Silicon, showing how these abstraction layers can enable AI workloads to run on completely different architectures. Triton, meanwhile, has become increasingly popular as it allows developers to write high-performance code that can be compiled to run on various hardware targets without having to understand the low-level details of each platform.
这些框架最著名的例子是 MLX(主要由 Apple 赞助)、Triton(主要由 OpenAI 赞助)和 JAX(由 Google 开发)。MLX 特别有趣,因为它提供了类似 PyTorch 的 API,可以在 Apple Silicon 上高效运行,展示了这些抽象层如何使 AI 工作负载能够在完全不同的架构上运行。与此同时,Triton 变得越来越受欢迎,因为它允许开发者编写高性能代码,并能编译以在各种硬件目标上运行,而无需理解每个平台的底层细节。
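The "many compilation targets" idea behind MLX, Triton, and JAX can be sketched in miniature: user code calls one high-level op, and a backend table supplies the implementation. Real frameworks lower code through compilers rather than a dict lookup, and every name below is illustrative, but the separation of concerns is the same.

```python
# Toy illustration of retargetable high-level ops: user code calls
# matmul() once; the selected backend supplies the actual kernel.
def matmul_reference(a, b):
    # Portable fallback: pure Python, runs anywhere (slowly).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

BACKENDS = {"cpu": matmul_reference}
# A CUDA / TPU / wafer-scale backend would register its own tuned kernel
# here, e.g. BACKENDS["cuda"] = cuda_matmul_kernel  (hypothetical)

def matmul(a, b, target: str = "cpu"):
    """User-facing op: identical call regardless of compilation target."""
    return BACKENDS[target](a, b)

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The point is that none of the calling code changes when a new target is registered, which is exactly the property that makes CUDA "just one of many compilation targets" from the framework's perspective.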

These frameworks allow developers to write their code once using high powered abstractions and then target tons of platforms automatically— doesn't that sound like a better way to do things, which would give you a lot more flexibility in terms of how you actually run the code?
这些框架允许开发者使用高效的抽象方式编写代码一次,然后自动适配大量平台——这难道不是一种更好的做事方式,让你在实际运行代码时拥有更多灵活性吗?

In the 1980s, all the most popular, best selling software was written in hand-tuned assembly language. The PKZIP compression utility for example was hand crafted to maximize speed, to the point where a competently coded version written in the standard C programming language and compiled using the best available optimizing compilers at the time, would run at probably half the speed of the hand-tuned assembly code. The same is true for other popular software packages like WordStar, VisiCalc, and so on.
在 1980 年代,所有最流行、最畅销的软件都是用手工优化的汇编语言编写的。例如,PKZIP 压缩工具被精心打造以最大化速度,以至于即使使用标准 C 编程语言编写并由当时最好的优化编译器编译的版本,其运行速度可能也只有手工优化的汇编代码的一半。其他流行的软件包,如 WordStar、VisiCalc 等,也都是如此。

Over time, compilers kept getting better and better, and every time the CPU architectures changed (say, from Intel releasing the 486, then the Pentium, and so on), that hand-rolled assembler would often have to be thrown out and rewritten, something that only the smartest coders were capable of (sort of like how CUDA experts are on a different level in the job market versus a "regular" software developer). Eventually, things converged so that the speed benefits of hand-rolled assembly were outweighed dramatically by the flexibility of being able to write code in a high-level language like C or C++, where you rely on the compiler to make things run really optimally on the given CPU.
随着时间的推移,编译器变得越来越好,每当 CPU 架构发生变化(比如从 Intel 发布 486,到 Pentium,等等),那些手写的汇编代码往往不得不被丢弃并重写,而这只有最聪明的程序员才能做到(有点类似于 CUDA 专家在就业市场上的水平远高于“普通”软件开发者)。最终,情况逐渐趋同,以至于手写汇编的速度优势被高层语言(如 C 或 C++)的灵活性所大幅超越,在这些语言中,程序员依赖编译器来使代码在特定 CPU 上运行得尽可能高效。

Nowadays, very little new code is written in assembly. I believe a similar transformation will end up happening for AI training and inference code, for similar reasons: computers are good at optimization, and flexibility and speed of development is increasingly the more important factor— especially if it also allows you to save dramatically on your hardware bill because you don't need to keep paying the "CUDA tax" that gives Nvidia 90%+ margins.
如今,几乎没有新的代码是用汇编编写的。我认为类似的转变最终也会发生在 AI 训练和推理代码上,原因类似:计算机擅长优化,而灵活性和开发速度正变得越来越重要——尤其是如果这还能让你大幅节省硬件成本,因为你不需要继续支付“CUDA 税”,这让 Nvidia 的利润率超过 90%。

Yet another area where you might see things change dramatically is that CUDA might very well end up being more of a high level abstraction itself— a "specification language" similar to Verilog (used as the industry standard to describe chip layouts) that skilled developers can use to describe high-level algorithms that involve massive parallelism (since they are already familiar with it, it's very well constructed, it's the lingua franca, etc.), but then instead of having that code compiled for use on Nvidia GPUs like you would normally do, it can instead be fed as source code into an LLM which can port it into whatever low-level code is understood by the new Cerebras chip, or the new Amazon Trainium2, or the new Google TPUv6, etc. This isn't as far off as you might think; it's probably already well within reach using OpenAI's latest O3 model, and surely will be possible generally within a year or two.
另一个可能发生巨大变化的领域是,CUDA 本身可能最终会成为更高级的抽象——类似于 Verilog(作为行业标准用于描述芯片布局)的一种“规范语言”,熟练的开发者可以用它来描述涉及大规模并行计算的高级算法(因为他们已经熟悉它,它构造良好,它是通用语言等)。但不同的是,这段代码不再像通常那样被编译用于 Nvidia GPU,而是可以作为源代码输入到一个 LLM,然后将其转换为新型 Cerebras 芯片、新的 Amazon Trainium2 或新的 Google TPUv6 所能理解的低级代码等。这一变化可能比你想象的更近;使用 OpenAI 最新的 O3 模型,这或许已经可以实现,并且在一两年内很可能会普遍成为可能。

The Theoretical Threat  理论上的威胁

Perhaps the most shocking development, alluded to earlier, happened in the last couple of weeks: news that has totally rocked the AI world, and which has been dominating the discourse among knowledgeable people on Twitter despite its complete absence from any of the mainstream media outlets. A small Chinese startup called DeepSeek released two new models whose performance is basically on par with the best models from OpenAI and Anthropic (blowing past the Meta Llama3 models and other smaller open-source players such as Mistral). These models are called DeepSeek-V3 (basically their answer to GPT-4o and Claude 3.5 Sonnet) and DeepSeek-R1 (basically their answer to OpenAI's O1 model).
也许最令人震惊的发展正如前面提到的那样,发生在过去几周:这条新闻彻底震撼了 AI 界,尽管主流媒体完全没有报道,它却一直主导着 Twitter 上行家们的讨论。一家名为 DeepSeek 的中国初创公司发布了两个新模型,其性能基本上可与 OpenAI 和 Anthropic 的最佳模型相媲美(远超 Meta Llama3 模型以及 Mistral 等较小的开源模型玩家)。这些模型分别是 DeepSeek-V3(基本上是他们对 GPT-4o 和 Claude 3.5 Sonnet 的回应)和 DeepSeek-R1(基本上是他们对 OpenAI 的 O1 模型的回应)。

Why is this all so shocking? Well, first of all, DeepSeek is a tiny Chinese company that reportedly has under 200 employees. The story goes that they started out as a quant trading hedge fund similar to TwoSigma or RenTec, but after Xi Jinping cracked down on that space, they used their math and engineering chops to pivot into AI research. Who knows if any of that is really true or if they are merely some kind of front for the CCP or the Chinese military. But the fact remains that they have released two incredibly detailed technical reports, for DeepSeek-V3 and DeepSeek-R1.
为什么这一切如此令人震惊?首先,DeepSeek 是一家规模很小的中国公司,据称员工不到 200 人。据说他们最初是一家类似 TwoSigma 或 RenTec 的量化交易对冲基金,但在习近平对该领域进行打压后,他们利用自己的数学和工程能力转向了 AI 研究。谁知道这些说法是否属实,或者他们是否只是中共或中国军方的某种幌子。但事实是,他们已经发布了两份极其详细的技术报告,分别是 DeepSeek-V3 和 DeepSeek-R1。

These are heavy technical reports, and if you don't know a lot of linear algebra, you probably won't understand much. But what you should really try is to download the free DeepSeek app from the App Store, log in with a Google account, and give it a try (you can also install it on Android), or simply try it out in your desktop browser. Make sure to select the "DeepThink" option to enable chain-of-thought (the R1 model) and ask it to explain parts of the technical reports in simple terms.
这些是艰深的技术报告,如果你不太懂线性代数,可能看不懂多少。但你真正应该做的是从 App Store 下载免费的 DeepSeek 应用,使用 Google 账户登录并试用(你也可以在 Android 上安装),或者直接在桌面浏览器中试用。确保选择"DeepThink"选项,以启用链式思维(R1 模型),并让它用简单的语言解释技术报告的部分内容。

This will simultaneously show you a few important things:
这将同时向你展示一些重要的内容:

One, this model is absolutely legit. There is a lot of BS that goes on with AI benchmarks, which are routinely gamed so that models appear to perform great on the benchmarks but then suck in real world tests. Google is certainly the worst offender in this regard, constantly crowing about how amazing their LLMs are, when they are so awful in any real world test that they can't even reliably accomplish the simplest possible tasks, let alone challenging coding tasks. These DeepSeek models are not like that— the responses are coherent, compelling, and absolutely on the same level as those from OpenAI and Anthropic.
首先,这个模型绝对是货真价实的。AI 基准测试中经常充斥着大量的虚假信息,这些测试经常被操纵,使得模型在基准测试中表现出色,但在真实世界测试中却表现糟糕。Google 在这方面无疑是最严重的违规者,不断吹嘘他们的 LLMs 有多么惊人,但在任何真实世界测试中都表现得极其糟糕,甚至无法可靠地完成最简单的任务,更不用说具有挑战性的编码任务了。这些 DeepSeek 模型并非如此——它们的回答连贯、有说服力,并且绝对与 OpenAI 和 Anthropic 的模型处于同一水平。

Two, DeepSeek has made profound advancements not just in model quality, but more importantly in model training and inference efficiency. By being extremely close to the hardware and by layering together a handful of distinct, very clever optimizations, DeepSeek was able to train these incredible models using GPUs in a dramatically more efficient way: by some measurements, roughly 45x more efficiently than other leading-edge models. DeepSeek claims that the complete cost to train DeepSeek-V3 was just over $5mm. That is absolutely nothing by the standards of OpenAI, Anthropic, etc., which were well into the $100mm+ level for training costs for a single model as early as 2024.
其次,DeepSeek 不仅在模型质量上取得了深远的进步,更重要的是在模型训练和推理效率上也是如此。通过与硬件的极致贴合,并叠加一系列各不相同且非常巧妙的优化,DeepSeek 能够以极高的效率使用 GPU 训练这些令人惊叹的模型:根据某些测量,其效率比其他前沿模型高出约 45 倍。DeepSeek 声称训练 DeepSeek-V3 的总成本仅略高于 500 万美元。以 OpenAI、Anthropic 等公司的标准来看,这一成本几乎微不足道,因为早在 2024 年,它们训练单个模型的成本就已远超 1 亿美元。

How in the world could this be possible? How could this little Chinese company completely upstage all the smartest minds at our leading AI labs, which have 100 times more resources, headcount, payroll, capital, GPUs, etc? Wasn't China supposed to be crippled by Biden's restriction on GPU exports? Well, the details are fairly technical, but we can at least describe them at a high level. It might have just turned out that the relative GPU processing poverty of DeepSeek was the critical ingredient to make them more creative and clever, necessity being the mother of invention and all.
这怎么可能?这家小小的中国公司怎么能完全压倒我们顶尖 AI 实验室中所有最聪明的人才,而这些实验室拥有 100 倍的资源、员工、薪资、资本、GPU 等?中国不是应该因为拜登对 GPU 出口的限制而受挫吗?其实,细节相当技术性,但我们至少可以从高层次上进行描述。或许,DeepSeek 在 GPU 处理能力上的相对贫乏,恰恰成为促使他们更加富有创造力和聪明才智的关键因素,毕竟,需求是发明之母。

A major innovation is their sophisticated mixed-precision training framework that lets them use 8-bit floating point numbers (FP8) throughout the entire training process. Most Western AI labs train using "full precision" 32-bit numbers (this basically specifies the number of gradations possible in describing the output of an artificial neuron; 8 bits in FP8 lets you store a much wider range of numbers than you might expect— it's not just limited to 256 different equal-sized magnitudes like you'd get with regular integers, but instead uses clever math tricks to store both very small and very large numbers— though naturally with less precision than you'd get with 32 bits.) The main tradeoff is that while FP32 can store numbers with incredible precision across an enormous range, FP8 sacrifices some of that precision to save memory and boost performance, while still maintaining enough accuracy for many AI workloads.
一个重大创新是他们复杂的混合精度训练框架,使其能够在整个训练过程中使用 8 位浮点数(FP8)。大多数西方 AI 实验室使用“全精度”32 位数进行训练(这基本上指定了在描述人工神经元输出时可能的梯度数量;FP8 中的 8 位允许存储的数值范围比预期的要广——它不仅仅局限于 256 个等间距的数值,如同常规整数那样,而是利用巧妙的数学技巧来存储非常小和非常大的数值——尽管自然比 32 位的精度要低)。主要的权衡在于,虽然 FP32 可以在极大的范围内存储极高精度的数值,FP8 牺牲了一部分精度以节省内存并提升性能,同时仍能保持足够的准确性以满足许多 AI 任务的需求。
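
To make the range-versus-precision tradeoff concrete, here is a minimal sketch that enumerates a simplified E4M3-style FP8 grid and rounds values to it. Note the simplification: the special NaN encodings are ignored, so this toy grid tops out at 480 rather than real E4M3's 448.

```python
# Minimal sketch of an E4M3-style 8-bit float grid (1 sign, 4 exponent,
# 3 mantissa bits; the format's NaN/special encodings are ignored for
# simplicity, so this toy grid reaches 480 instead of the official 448).
def e4m3_grid():
    vals = {m / 8 * 2.0 ** -6 for m in range(8)}           # subnormals
    vals |= {(1 + m / 8) * 2.0 ** (e - 7)                  # normals
             for e in range(1, 16) for m in range(8)}
    return sorted(vals)

GRID = e4m3_grid()

def quantize(x):
    """Round x to the nearest representable magnitude, keeping the sign."""
    sign = -1.0 if x < 0 else 1.0
    return sign * min(GRID, key=lambda v: abs(v - abs(x)))

# Wide dynamic range (up to 480 in this simplified grid)...
print(max(GRID))       # 480.0
# ...but coarse steps: between 2 and 4 the spacing is 0.25.
print(quantize(3.3))   # 3.25
```

This is the whole FP8 bargain in miniature: the exponent bits buy you a huge dynamic range, and the tiny mantissa is what you give up in return.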

DeepSeek cracked this problem by developing a clever system that breaks numbers into small tiles for activations and blocks for weights, and strategically uses high-precision calculations at key points in the network. Unlike other labs that train in high precision and then compress later (losing some quality in the process), DeepSeek's native FP8 approach means they get the massive memory savings without compromising performance. When you're training across thousands of GPUs, this dramatic reduction in memory requirements per GPU translates into needing far fewer GPUs overall.
DeepSeek 通过开发一个巧妙的系统破解了这个问题,该系统将数字拆分为用于激活的小块和用于权重的块,并在网络的关键点战略性地使用高精度计算。与其他实验室先以高精度训练然后再压缩(在此过程中会损失一些质量)不同,DeepSeek 的原生 FP8 方法意味着他们在不影响性能的情况下实现了大规模的内存节省。当你在数千个 GPU 上进行训练时,每个 GPU 的内存需求大幅减少,这意味着整体所需的 GPU 数量大大降低。
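
The tile/block idea can be sketched with toy numbers (this illustrates block-wise scaling in general, not DeepSeek's actual kernels): each block of values gets its own scale factor, so one large outlier no longer flattens the resolution of every other block.

```python
# Toy sketch of block-wise scaled quantization: each block is quantized
# with its own scale, so an outlier in one block does not destroy the
# precision of the small-magnitude values in another block.
def quantize_block(block, levels=127):
    scale = max(abs(v) for v in block) / levels or 1.0  # avoid /0 on all-zero blocks
    q = [round(v / scale) for v in block]
    return q, scale

def dequantize_block(q, scale):
    return [v * scale for v in q]

data = [0.01, -0.02, 0.015, 0.03,   # small-magnitude block
        5.0, -4.0, 3.0, 2.0]        # block containing large values
blocks = [data[:4], data[4:]]

recon = []
for b in blocks:
    q, s = quantize_block(b)
    recon += dequantize_block(q, s)

# Per-block scaling keeps the error tiny even in the small-value block.
assert all(abs(r - v) <= 0.001 for r, v in zip(recon[:4], data[:4]))
```

With a single global scale, the 5.0 outlier would force a step size that wipes out values like 0.01 entirely; per-block scales sidestep that.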

Another major breakthrough is their multi-token prediction system. Most Transformer-based LLMs do inference by predicting the next token— one token at a time. DeepSeek figured out how to predict multiple tokens while maintaining the quality you'd get from single-token prediction. Their approach achieves about 85-90% accuracy on these additional token predictions, which effectively doubles inference speed without sacrificing much quality. The clever part is they maintain the complete causal chain of predictions, so the model isn't just guessing— it's making structured, contextual predictions.
另一个重大突破是他们的多标记预测系统。大多数基于 Transformer 的 LLM 模型通过逐个预测下一个标记来进行推理。DeepSeek 找到了在保持单标记预测质量的同时预测多个标记的方法。他们的方法在这些额外的标记预测上达到了约 85-90% 的准确率,从而有效地将推理速度提高了一倍,而质量几乎没有损失。巧妙之处在于他们保持了完整的因果预测链,因此模型不仅仅是在猜测,而是在进行结构化、具有上下文的预测。
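
The claimed near-2x speedup follows from simple acceptance arithmetic. In this hedged sketch (a toy model, not DeepSeek's actual implementation), each forward pass emits one guaranteed token plus k speculative tokens, where speculative token i only survives if all earlier ones were accepted:

```python
# Toy acceptance arithmetic for multi-token prediction: one guaranteed
# token per forward pass, plus k speculative tokens, where speculative
# token i is kept only if all earlier speculative tokens were accepted
# (acceptance probability p per token, assumed independent).
def expected_tokens_per_step(p, k=1):
    return 1 + sum(p ** i for i in range(1, k + 1))

# With ~85-90% acceptance of one extra token, throughput roughly doubles.
print(expected_tokens_per_step(0.85))
print(expected_tokens_per_step(0.90))
```

At 85-90% acceptance the expected yield is about 1.85-1.9 tokens per forward pass, which is where the "effectively doubles inference speed" claim comes from.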

One of their most innovative developments is what they call Multi-head Latent Attention (MLA). This is a breakthrough in how they handle what are called the Key-Value indices, which are basically how individual tokens are represented in the attention mechanism within the Transformer architecture. Although this is getting a bit too advanced in technical terms, suffice it to say that these KV indices are some of the major uses of VRAM during the training and inference process, and part of the reason why you need to use thousands of GPUs at the same time to train these models— each GPU has a maximum of 96 GB of VRAM, and these indices eat that memory up for breakfast.
他们最具创新性的开发之一是他们称之为 Multi-head Latent Attention (MLA) 的技术。这是在处理所谓的 Key-Value 索引方面的突破,这些索引基本上决定了在 Transformer 架构的注意力机制中,单个 token 是如何表示的。尽管这在技术上有些过于高级,但简单来说,这些 KV 索引是训练和推理过程中 VRAM 的主要用途之一,也是为什么需要同时使用成千上万块 GPU 来训练这些模型的部分原因——每块 GPU 最多只有 96GB 的 VRAM,而这些索引会迅速消耗掉这部分内存。

Their MLA system finds a way to store a compressed version of these indices that captures the essential information while using far less memory. The brilliant part is this compression is built directly into how the model learns— it's not some separate step they need to do, it's built directly into the end-to-end training pipeline. This means that the entire mechanism is "differentiable" and able to be trained directly using the standard optimizers. All this stuff works because these models are ultimately finding much lower-dimensional representations of the underlying data than the so-called "ambient dimensions". So it's wasteful to store the full KV indices, even though that is basically what everyone else does.
他们的 MLA 系统找到了一种方法来存储这些索引的压缩版本,在保留关键信息的同时占用更少的内存。巧妙之处在于,这种压缩直接融入了模型的学习过程——它不是一个额外的步骤,而是直接构建在端到端的训练流程中。这意味着整个机制是“可微的”,可以直接使用标准优化器进行训练。所有这些都能奏效,是因为这些模型最终找到的是底层数据的低维表示,而不是所谓的“环境维度”。因此,存储完整的 KV 索引是浪费的,尽管基本上所有其他人都是这么做的。
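
Some back-of-the-envelope arithmetic (with illustrative dimensions, not DeepSeek's exact configuration) shows why shrinking the KV cache matters so much:

```python
# Illustrative KV-cache arithmetic (toy configuration; the layer count,
# head sizes, and latent width of 576 here are hypothetical stand-ins,
# not DeepSeek's published dimensions).
def kv_cache_gb(layers, tokens, width, bytes_per_val=2):
    # width = values stored per token per layer (K and V combined)
    return layers * tokens * width * bytes_per_val / 1e9

layers, tokens = 60, 128_000
heads, head_dim = 128, 128
full = kv_cache_gb(layers, tokens, 2 * heads * head_dim)  # K + V, every head
latent = kv_cache_gb(layers, tokens, 576)                 # one shared latent vector

print(f"full KV cache:   {full:.1f} GB")
print(f"latent KV cache: {latent:.1f} GB")
print(f"compression:     {full / latent:.0f}x")  # 57x
```

With these toy numbers the full cache runs to hundreds of gigabytes per long-context sequence, while the latent version fits comfortably on a single accelerator: that is the difference between needing racks of GPUs and needing a handful.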

Storing the full KV indices wastes tons of space by keeping way more numbers than you need, inflating the training memory footprint, so compressing them again slashes the number of GPUs required to train a world-class model. Better yet, the compression can actually improve model quality, because it acts like a "regularizer," forcing the model to pay attention to the truly important stuff instead of using the wasted capacity to fit noise in the training data. So not only do you save a ton of memory, but the model might even perform better. At the very least, you don't take a massive hit to performance in exchange for the huge memory savings, which is generally the kind of tradeoff you are faced with in AI training.
存储完整的 KV 索引会因为保留远超所需的数字而浪费大量空间,推高训练的内存占用,因此压缩它们又一次减少了训练世界级模型所需的 GPU 数量。更妙的是,这种压缩实际上还能提升模型质量,因为它可以充当"正则化器",迫使模型关注真正重要的内容,而不是利用多余的容量去拟合训练数据中的噪声。因此,你不仅节省了大量内存,模型甚至可能表现得更好。至少,你不会为了巨大的内存节省而付出严重的性能代价,而这通常是 AI 训练中必须面对的那种权衡。

They also made major advances in GPU communication efficiency through their DualPipe algorithm and custom communication kernels. This system intelligently overlaps computation and communication, carefully balancing GPU resources between these tasks. They only need about 20 of their GPUs' streaming multiprocessors (SMs) for communication, leaving the rest free for computation. The result is much higher GPU utilization than typical training setups achieve.
他们还通过 DualPipe 算法和自定义通信内核在 GPU 通信效率方面取得了重大进展。该系统智能地重叠计算和通信,精确平衡 GPU 资源在这些任务之间的分配。他们只需要大约 20 个 GPU 的流式多处理器(SMs)用于通信,其余部分可用于计算。结果是 GPU 的利用率远高于典型的训练设置所能达到的水平。
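
A toy latency model (not the actual DualPipe algorithm) shows why overlapping helps: serialized phases add up, while overlapped phases are bounded by the slower of the two.

```python
# Toy model of computation/communication overlap (an illustration of the
# general principle, not the actual DualPipe scheduling algorithm).
def serialized_time(compute_ms, comm_ms, steps):
    # No overlap: every step pays for compute, then communication.
    return steps * (compute_ms + comm_ms)

def overlapped_time(compute_ms, comm_ms, steps):
    # Full overlap: steady state is bounded by the slower phase, plus
    # one exposed phase at the pipeline edges.
    return steps * max(compute_ms, comm_ms) + min(compute_ms, comm_ms)

print(serialized_time(10, 8, 100))  # 1800
print(overlapped_time(10, 8, 100))  # 1008
```

When communication hides almost entirely behind computation, the GPUs spend their time doing math instead of waiting on the network, which is exactly the utilization gain described above.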

Another very smart thing they did is to use what is known as a Mixture-of-Experts (MOE) Transformer architecture, but with key innovations around load balancing. As you might know, the size or capacity of an AI model is often measured in terms of the number of parameters the model contains. A parameter is just a number that stores some attribute of the model; either the "weight" or importance a particular artificial neuron has relative to another one, or the importance of a particular token depending on its context (in the "attention mechanism"), etc.
另一个非常聪明的做法是使用了所谓的 Mixture-of-Experts (MOE) Transformer 架构,并在负载均衡方面进行了关键创新。正如你可能知道的,AI 模型的规模或容量通常以模型包含的参数数量来衡量。参数只是存储模型某些属性的数值;它可以是某个人工神经元相对于另一个神经元的“权重”或重要性,或者是在“注意力机制”中某个特定标记在特定上下文中的重要性等。

Meta's latest Llama3 models come in a few sizes, for example: a 1 billion parameter version (the smallest), a 70B parameter model (the most commonly deployed one), and even a massive 405B parameter model. This largest model is of limited utility for most users because you would need to have tens of thousands of dollars worth of GPUs in your computer just to run at tolerable speeds for inference, at least if you deployed it in the naive full-precision version. Therefore most of the real-world usage and excitement surrounding these open source models is at the 8B parameter or highly quantized 70B parameter level, since that's what can fit in a consumer-grade Nvidia 4090 GPU, which you can buy now for under $1,000.
Meta 最新的 Llama3 模型有几种不同的规模,例如:一个 10 亿参数版本(最小的)、一个 70B 参数模型(最常部署的),甚至还有一个庞大的 405B 参数模型。这个最大模型对大多数用户的实用性有限,因为仅仅为了在推理时达到可接受的速度,你的计算机就需要价值数万美元的 GPU,至少在你以天真全精度版本部署它的情况下。因此,大多数实际应用和对这些开源模型的关注都集中在 8B 参数或高度量化的 70B 参数级别,因为这些可以适配于消费级 Nvidia 4090 GPU,而这款 GPU 现在的售价不到 1,000 美元。

So why does any of this matter? Well, in a sense, the parameter count and precision tells you something about how much raw information or data the model has stored internally. Note that I'm not talking about reasoning ability, or the model's "IQ" if you will: it turns out that models with even surprisingly modest parameter counts can show remarkable cognitive performance when it comes to solving complex logic problems, proving theorems in plane geometry, SAT math problems, etc.
那么,为什么这些重要呢?从某种意义上说,参数数量和精度可以告诉你模型在内部存储了多少原始信息或数据。请注意,我这里并不是在谈论推理能力,或者模型的“智商”,如果你愿意这么称呼的话:事实证明,即使是参数数量相对较少的模型,在解决复杂的逻辑问题、证明平面几何定理、SAT 数学问题等方面,也能展现出惊人的认知能力。

But those small models aren't going to be able to necessarily tell you every aspect of every plot twist in every single novel by Stendhal, whereas the really big models can potentially do that. The "cost" of that extreme level of knowledge is that the models become very unwieldy both to train and to do inference on, because you always need to store every single one of those 405B parameters (or whatever the parameter count is) in the GPU's VRAM at the same time in order to do any inference with the model.
但那些小模型不一定能够告诉你司汤达每部小说中每个情节转折的所有细节,而真正的大模型可能可以做到。达到这种极端知识水平的“代价”是,这些模型在训练和推理时变得非常笨重,因为你始终需要在 GPU 的 VRAM 中同时存储所有 405B 个参数(或其他参数数量),才能对模型进行任何推理。

The beauty of the MOE model approach is that you can decompose the big model into a collection of smaller models that each know different, at least not fully overlapping, pieces of knowledge. DeepSeek's innovation here was developing what they call an "auxiliary-loss-free" load balancing strategy that maintains efficient expert utilization without the usual performance degradation that comes from load balancing. Then, depending on the nature of the inference request, you can intelligently route the inference to the "expert" models within that collection of smaller models that are most able to answer that question or solve that task.
MOE 模型方法的优雅之处在于,你可以将大模型分解为一组较小的模型,每个模型掌握不同的、至少不完全重叠的知识片段。DeepSeek 的创新在于开发了一种他们称之为"无辅助损失"的负载均衡策略,该策略在保持高效专家利用率的同时,避免了负载均衡通常带来的性能下降。然后,根据推理请求的性质,你可以智能地将推理路由到这组较小模型中最有能力回答该问题或解决该任务的"专家"模型。

You can loosely think of it as being a committee of experts who have their own specialized knowledge domains: one might be a legal expert, the other a computer science expert, the other a business strategy expert. So if a question comes in about linear algebra, you don't give it to the legal expert. This is of course a very loose analogy and it doesn't actually work like this in practice.
你可以粗略地将其理解为一个由专家组成的委员会,每位专家都有自己专门的知识领域:一个可能是法律专家,另一个是计算机科学专家,另一个是商业战略专家。因此,如果有关于线性代数的问题,你不会把它交给法律专家。当然,这只是一个非常粗略的类比,实际情况并不是这样运作的。

The real advantage of this approach is that it allows the model to contain a huge amount of knowledge without being very unwieldy, because even though the aggregate number of parameters is high across all the experts, only a small subset of these parameters is "active" at any given time, which means that you only need to store this small subset of weights in VRAM in order to do inference. In the case of DeepSeek-V3, they have an absolutely massive MOE model with 671B parameters, so it's much bigger than even the largest Llama3 model, but only 37B of these parameters are active at any given time— enough to fit in the VRAM of two consumer-grade Nvidia 4090 GPUs (under $2,000 total cost), rather than requiring one or more H100 GPUs which cost something like $40k each.
这种方法的真正优势在于,它使模型能够包含大量知识而不会变得过于笨重。因为尽管所有专家的参数总数很高,但在任何给定时间内,只有一小部分参数是“激活”的,这意味着在进行推理时,你只需要在 VRAM 中存储这小部分权重。以 DeepSeek-V3 为例,他们拥有一个庞大的 MOE 模型,参数量高达 6710 亿,比最大的 Llama3 模型还要大得多,但在任何给定时间内,只有 370 亿个参数是激活的——足以适应两张消费级 Nvidia 4090 GPU 的 VRAM(总成本低于 2000 美元),而无需使用一张或多张 H100 GPU(每张成本约 4 万美元)。
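
A minimal sketch of top-k expert routing (an illustrative softmax gate, not DeepSeek's auxiliary-loss-free balancer) makes the "only a small subset is active" point concrete:

```python
import math

# Toy top-k MoE routing (illustrative gate, not DeepSeek's actual
# auxiliary-loss-free load balancer): a gate scores every expert,
# only the top-k run, and their outputs are mixed by softmax weight.
def route(gate_scores, k=2):
    top = sorted(range(len(gate_scores)), key=lambda i: -gate_scores[i])[:k]
    exp = [math.exp(gate_scores[i]) for i in top]
    z = sum(exp)
    return [(i, e / z) for i, e in zip(top, exp)]  # (expert id, mix weight)

# 8 experts, but each token only activates 2 of them:
scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.2]
print(route(scores))  # experts 1 and 4 handle this token

# The economics, using DeepSeek-V3's headline figures:
total_params, active_params = 671e9, 37e9
print(f"active fraction: {active_params / total_params:.1%}")  # 5.5%
```

Only the weights of the selected experts need to sit in VRAM for a given token's forward pass, which is why a 671B-parameter model can be served with the memory budget of a 37B one.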

It's rumored that both ChatGPT and Claude use an MoE architecture, with some leaks suggesting that GPT-4 had a total of 1.8 trillion parameters split across 8 models containing 220 billion parameters each. Despite that being a lot more doable than trying to fit all 1.8 trillion parameters in VRAM, it still requires multiple H100-grade GPUs just to run the model because of the massive amount of memory used.
有传言称 ChatGPT 和 Claude 都使用 MoE 架构,一些泄露信息表明 GPT-4 总共有 1.8 万亿参数,分布在 8 个模型中,每个包含 2200 亿参数。尽管这样比尝试将全部 1.8 万亿参数装入 VRAM 更可行,但由于占用的内存量巨大,仍然需要多块 H100 级别的 GPU 才能运行该模型。

Beyond what has already been described, the technical papers mention several other key optimizations. These include their extremely memory-efficient training framework that avoids tensor parallelism, recomputes certain operations during backpropagation instead of storing them, and shares parameters between the main model and auxiliary prediction modules. The sum total of all these innovations, when layered together, has led to the ~45x efficiency improvement numbers that have been tossed around online, and I am perfectly willing to believe these are in the right ballpark.
除了已经描述的内容之外,技术论文还提到了其他几个关键优化。其中包括他们极其节省内存的训练框架,该框架避免了张量并行,在反向传播过程中重新计算某些操作而不是存储它们,并在主模型和辅助预测模块之间共享参数。所有这些创新叠加在一起,总体上带来了网上流传的约 45 倍效率提升,而我完全愿意相信这些数字大致在正确的范围内。
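
The recomputation trick can be illustrated with standard gradient-checkpointing arithmetic (the numbers here are illustrative, and DeepSeek's exact scheme differs): store only about sqrt(L) activation checkpoints and recompute the rest during backpropagation, trading one extra forward pass for a large memory reduction.

```python
import math

# Toy activation-recomputation arithmetic (illustrative figures): with
# checkpointing, only ~sqrt(L) layer checkpoints plus one in-flight
# segment of activations are held in memory at once, instead of all L.
def activation_memory(layers, per_layer_gb, checkpointing):
    if not checkpointing:
        return layers * per_layer_gb
    segments = math.isqrt(layers)                       # ~sqrt(L) checkpoints
    return (segments + layers // segments) * per_layer_gb

print(activation_memory(64, 1.5, checkpointing=False))  # 96.0
print(activation_memory(64, 1.5, checkpointing=True))   # 24.0
```

A 4x cut in activation memory at the cost of some recomputation is exactly the kind of compute-for-memory trade that, layered with the other optimizations, compounds into the headline efficiency numbers.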

One very strong indicator that it's true is the cost of DeepSeek's API: despite this nearly best-in-class model performance, DeepSeek charges something like 95% less money for inference requests via its API than comparable models from OpenAI and Anthropic. In a sense, it's sort of like comparing Nvidia's GPUs to the new custom chips from competitors: even if they aren't quite as good, the value for money is so much better that it can still be a no-brainer depending on the application, as long as you can qualify the performance level and prove that it's good enough for your requirements and the API availability and latency is good enough (thus far, people have been amazed at how well DeepSeek's infrastructure has held up despite the truly incredible surge of demand owing to the performance of these new models).
一个非常有力的指标表明这是真的,那就是 DeepSeek 的 API 成本:尽管其模型性能几乎是业内最优之一,DeepSeek 通过其 API 处理推理请求的收费比 OpenAI 和 Anthropic 的类似模型低约 95%。在某种意义上,这有点像将 Nvidia 的 GPU 与竞争对手的新定制芯片进行比较:即使它们不完全一样好,但性价比要高得多,因此根据具体应用,它仍然可能是不二之选,只要你能确定性能水平并证明它足够满足你的需求,同时 API 的可用性和延迟也足够好。(到目前为止,人们对 DeepSeek 的基础设施在这些新模型的卓越性能带来的惊人需求激增下仍能保持稳定感到惊讶)。

But unlike the case of Nvidia, where the cost differential is the result of them earning monopoly gross margins of 90%+ on their data-center products, the cost differential of the DeepSeek API relative to the OpenAI and Anthropic API could be simply that they are nearly 50x more compute efficient (it might even be significantly more than that on the inference side— the ~45x efficiency was on the training side). Indeed, it's not even clear that OpenAI and Anthropic are making great margins on their API services— they might be more interested in revenue growth and gathering more data from analyzing all the API requests they receive.
但与 Nvidia 的情况不同,Nvidia 的数据中心产品之所以存在成本差异,是因为他们获得了 90%以上的垄断毛利率,而 DeepSeek API 相对于 OpenAI 和 Anthropic API 的成本差异可能仅仅是因为它们的计算效率几乎高出 50 倍(在推理方面甚至可能远超这一数值——约 45 倍的效率提升是在训练阶段)。事实上,OpenAI 和 Anthropic 是否能在其 API 服务上获得高利润率尚不明确——他们可能更关注收入增长,并通过分析收到的所有 API 请求来收集更多数据。

Before moving on, I'd be remiss if I didn't mention that many people are speculating that DeepSeek is simply lying about the number of GPUs and GPU hours spent training these models because they actually possess far more H100s than they are supposed to have given the export restrictions on these cards, and they don't want to cause trouble for themselves or hurt their chances of acquiring more of these cards. While it's certainly possible, I think it's more likely that they are telling the truth, and that they have simply been able to achieve these incredible results by being extremely clever and creative in their approach to training and inference. They explain how they are doing things, and I suspect that it's only a matter of time before their results are widely replicated and confirmed by other researchers at various other labs.
在继续之前,如果我不提及这一点,那就太失职了:许多人猜测 DeepSeek 可能在 GPU 数量和训练这些模型所花费的 GPU 小时数上撒谎,因为他们实际上拥有的 H100 远超过他们应该拥有的数量,考虑到这些显卡的出口限制。他们不想给自己惹麻烦,也不想影响他们获取更多这些显卡的机会。虽然这确实有可能,但我认为更有可能的情况是他们说的是真话,他们只是通过极其聪明和有创造力的训练和推理方法,才取得了这些惊人的成果。他们解释了自己的做法,我怀疑只是时间问题,他们的结果就会被其他实验室的研究人员广泛复制和验证。

A Model That Can Really Think 一个真正能思考的模型

The newer R1 model and technical report might be even more mind-blowing, since they were able to beat Anthropic to chain-of-thought and are now basically the only ones besides OpenAI who have made this technology work at scale. But note that OpenAI only released the O1 preview model in mid-September of 2024. That's only ~4 months ago! Something you absolutely must keep in mind is that, unlike OpenAI, which is incredibly secretive about how these models really work at a low level, and won't release the actual model weights to anyone besides partners like Microsoft and others who sign heavy-duty NDAs, these DeepSeek models are both completely open-source and permissively licensed. They have released extremely detailed technical reports explaining how they work, as well as the code that anyone can look at and try to copy.
更新的 R1 模型和技术报告可能更加令人惊叹,因为他们成功在 Chain-of-thought 方面击败了 Anthropic,并且现在基本上是除了 OpenAI 之外唯一能够让这项技术大规模运作的团队。但需要注意的是,O1 预览模型是 OpenAI 在 2024 年 9 月中旬才发布的,仅仅大约 4 个月前!你绝对必须牢记的一点是,与 OpenAI 不同,后者对这些模型在底层如何运作极为保密,并且不会向除 Microsoft 等签署了严格 NDA 的合作伙伴之外的任何人公开实际的模型权重,这些 DeepSeek 模型则是完全开源且采用宽松许可的。他们发布了极为详细的技术报告,解释其工作原理,并提供了任何人都可以查看和尝试复制的代码。

With R1, DeepSeek essentially cracked one of the holy grails of AI: getting models to reason step-by-step without relying on massive supervised datasets. Their DeepSeek-R1-Zero experiment showed something remarkable: using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn't just about solving problems— the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.
借助 R1,DeepSeek 基本上破解了 AI 领域的一个圣杯:让模型能够逐步推理,而无需依赖大规模的监督数据集。他们的 DeepSeek-R1-Zero 实验展示了一项非凡的成果:通过纯强化学习和精心设计的奖励函数,他们成功让模型完全自主地发展出复杂的推理能力。这不仅仅是解决问题——模型自发地学会了生成长链思维、自我验证其工作,并为更难的问题分配更多的计算时间。

The technical breakthrough here was their novel approach to reward modeling. Rather than using complex neural reward models that can lead to "reward hacking" (where the model finds bogus ways to boost their rewards that don't actually lead to better real-world model performance), they developed a clever rule-based system that combines accuracy rewards (verifying final answers) with format rewards (encouraging structured thinking). This simpler approach turned out to be more robust and scalable than the process-based reward models that others have tried.
这里的技术突破在于他们对奖励建模的创新方法。与其使用可能导致“奖励黑客攻击”(即模型找到虚假方式来提高奖励,但实际上并未提升真实世界模型性能)的复杂神经奖励模型,他们开发了一个巧妙的基于规则的系统,该系统结合了准确性奖励(验证最终答案)和格式奖励(鼓励结构化思维)。这种更简单的方法被证明比其他人尝试的基于过程的奖励模型更稳健且更具可扩展性。
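
A hedged sketch of what such a rule-based reward can look like (the tags and weights below are hypothetical illustrations, not DeepSeek's actual reward code):

```python
import re

# Illustrative rule-based reward (hypothetical tags and weights, not
# DeepSeek's actual implementation): accuracy comes from verifying the
# final answer, format from checking a structured think/answer layout.
def reward(response, gold_answer):
    fmt_ok = bool(re.fullmatch(
        r"<think>.+</think>\s*<answer>.+</answer>", response, re.S))
    m = re.search(r"<answer>(.*?)</answer>", response, re.S)
    acc_ok = m is not None and m.group(1).strip() == gold_answer
    return 1.0 * acc_ok + 0.5 * fmt_ok  # accuracy reward + format reward

good = "<think>2+2 is 4 because ...</think><answer>4</answer>"
bad = "The answer is 4."
print(reward(good, "4"))  # 1.5
print(reward(bad, "4"))   # 0.0
```

Because both checks are deterministic rules rather than a learned critic, there is no neural reward model for the policy to exploit, which is why this style of reward is so resistant to reward hacking.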

What's particularly fascinating is that during training, they observed what they called an "aha moment," a phase where the model spontaneously learned to revise its thinking process mid-stream when encountering uncertainty. This emergent behavior wasn't explicitly programmed; it arose naturally from the interaction between the model and the reinforcement learning environment. The model would literally stop itself, flag potential issues in its reasoning, and restart with a different approach, all without being explicitly trained to do this.
特别有趣的是,在训练过程中,他们观察到了一个所谓的“顿悟时刻”,即模型在遇到不确定性时,自发地学会在中途调整其思维过程。这种涌现行为并非被明确编程出来的,而是自然地从模型与强化学习环境的交互中产生的。模型会主动暂停自身,标记推理中的潜在问题,并以不同的方法重新开始,而这一切都不是通过显式训练实现的。

The full R1 model built on these insights by introducing what they call "cold-start" data— a small set of high-quality examples— before applying their RL techniques. They also solved one of the major challenges in reasoning models: language consistency. Previous attempts at chain-of-thought reasoning often resulted in models mixing languages or producing incoherent outputs. DeepSeek solved this through a clever language consistency reward during RL training, trading off a small performance hit for much more readable and consistent outputs.
完整的 R1 模型基于这些见解构建,并引入了他们称之为“冷启动”数据——一小组高质量示例——然后再应用他们的 RL 技术。他们还解决了推理模型中的一个主要挑战:语言一致性。先前的链式思维推理尝试经常导致模型混合语言或生成不连贯的输出。DeepSeek 通过在 RL 训练过程中引入巧妙的语言一致性奖励解决了这一问题,以轻微的性能损失换取更可读且更一致的输出。

The results are mind-boggling: on AIME 2024, one of the most challenging high school math competitions, R1 achieved 79.8% accuracy, matching OpenAI's O1 model. On MATH-500, it hit 97.3%, and it reached the 96.3rd percentile on Codeforces programming competitions. But perhaps most impressively, they managed to distill these capabilities down to much smaller models: their 14B parameter version outperforms many models several times its size, suggesting that reasoning ability isn't just about raw parameter count but about how you train the model to process information.
结果令人难以置信:在 AIME 2024 这一最具挑战性的高中数学竞赛之一中,R1 达到了 79.8% 的准确率,与 OpenAI 的 O1 模型相匹配。在 MATH-500 上,它达到了 97.3%,并在 Codeforces 编程竞赛中达到了 96.3 百分位。但或许最令人印象深刻的是,他们成功地将这些能力提炼到更小的模型中:其 140 亿参数版本的表现优于许多体积数倍于它的模型,这表明推理能力不仅仅取决于参数数量,还取决于如何训练模型来处理信息。

The Fallout  余波

The recent scuttlebutt on Twitter and Blind (a corporate rumor website) is that these models caught Meta completely off guard and that they perform better than the new Llama4 models which are still being trained. Apparently, the Llama project within Meta has attracted a lot of attention internally from high-ranking technical executives, and as a result they have something like 13 individuals working on the Llama stuff who each individually earn more per year in total compensation than the full training cost of the DeepSeek-V3 model that outperforms it. How do you explain that to Zuck with a straight face? How does Zuck keep smiling while shoveling multiple billions of dollars to Nvidia to buy 100k H100s when a better model was trained using just 2k H100s for a bit over $5mm?
最近在 Twitter 和 Blind(一个公司谣言网站)上的传闻是,这些模型让 Meta 完全措手不及,并且它们的表现优于仍在训练中的新 Llama4 模型。显然,Meta 内部的 Llama 项目已经引起了高级技术高管的极大关注,因此他们有大约 13 个人在研究 Llama 相关工作,而这些人的年总薪酬每人都超过了 DeepSeek-V3 模型的总训练成本,而后者的性能还优于 Llama4。你要如何面不改色地向 Zuck 解释这一点?当一个更好的模型仅用 2k H100 训练,成本略高于 500 万美元,而 Zuck 却在向 Nvidia 砸下数十亿美元购买 10 万块 H100 时,他是如何还能保持微笑的?

But you better believe that Meta and every other big AI lab is taking these DeepSeek models apart, studying every word in those technical reports and every line of the open source code they released, trying desperately to integrate these same tricks and optimizations into their own training and inference pipelines. So what's the impact of all that? Well, naively it sort of seems like the aggregate demand for training and inference compute should be divided by some big number. Maybe not by 45, but maybe by 25 or even 30? Because whatever you thought you needed before these model releases, it's now a lot less.
但你最好相信,Meta 和其他所有大型 AI 实验室都在拆解这些 DeepSeek 模型,研究技术报告中的每一个字,以及他们发布的开源代码中的每一行,拼命尝试将这些相同的技巧和优化整合到自己的训练和推理流程中。那这一切的影响是什么?从直觉上看,训练和推理计算的总需求似乎应该被某个大数除以一定比例。也许不是 45,但可能是 25,甚至 30?因为无论你之前认为自己需要多少计算资源,在这些模型发布之后,现在的需求已经少了很多。

Now, an optimist might say "You are talking about a mere constant of proportionality, a single multiple. When you're dealing with an exponential growth curve, that stuff gets washed out so quickly that it doesn't end up mattering all that much." And there is some truth to that: if AI really is as transformational as I expect, if the real-world utility of this tech is measured in the trillions, if inference-time compute is the new scaling law of the land, if we are going to have armies of humanoid robots running around doing massive amounts of inference constantly, then maybe the growth curve is still so steep and extreme, and Nvidia has a big enough lead, that it will still work out.
现在,乐观主义者可能会说:“你只是在谈论一个简单的比例常数,一个单一的倍数。当你处理指数增长曲线时,这些东西会很快被冲淡,以至于最终并不会产生太大影响。” 这其中确实有一定道理:如果人工智能真的像我预期的那样具有变革性,如果这项技术的实际效用以万亿美元计,如果推理计算时间成为新的规模法则,如果我们将拥有成群的类人机器人不断进行大规模推理,那么也许增长曲线仍然会如此陡峭和极端,而 Nvidia 的领先优势足够大,以至于它仍然能够成功。

But Nvidia is pricing in a LOT of good news in the coming years for that valuation to make sense, and when you start layering all these things together into a total mosaic, it starts to make me at least feel extremely uneasy about spending ~20x the 2025 estimated sales for their shares. What happens if you even see a slight moderation in sales growth? What if it turns out to be 85% instead of over 100%? What if gross margins come in a bit from 75% to 70%— still ridiculously high for a semiconductor company?
但是,Nvidia 在未来几年已经计入了大量利好消息,以使该估值合理化,当你将所有这些因素整合在一起时,至少让我对以约 20 倍 2025 年预估销售额的价格购买其股票感到极度不安。如果销售增长率哪怕稍微放缓会发生什么?如果最终增长率是 85% 而不是超过 100% 呢?如果毛利率从 75% 降至 70%——对于一家半导体公司来说仍然高得离谱——又会怎样?
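To make the sensitivity concrete, here is a minimal back-of-envelope sketch. The growth and margin scenarios come from the paragraph above; indexing current sales to 100 and the helper function are illustrative assumptions.

```python
# Hypothetical sensitivity check: how much do the modest haircuts described
# above (85% vs. 100%+ growth, 70% vs. 75% gross margin) move next year's
# gross profit, the thing a ~20x-sales buyer is ultimately paying for?
def forward_gross_profit(base_sales, growth, gross_margin):
    """Next-year gross profit from current sales, a growth rate, and a margin."""
    return base_sales * (1 + growth) * gross_margin

base = 100.0  # index current sales to 100 for easy comparison

bull = forward_gross_profit(base, 1.00, 0.75)  # 100% growth, 75% margin
bear = forward_gross_profit(base, 0.85, 0.70)  # 85% growth, 70% margin

print(f"bull-case gross profit: {bull:.1f}")  # 150.0
print(f"bear-case gross profit: {bear:.1f}")  # 129.5
print(f"haircut: {1 - bear / bull:.1%}")      # 13.7%
```

Even these mild disappointments shave roughly 14% off the implied gross profit, which a multiple of ~20x sales leaves very little room to absorb.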

Wrapping it All Up 总结

At a high level, NVIDIA faces an unprecedented convergence of competitive threats that make its premium valuation increasingly difficult to justify at 20x forward sales and 75% gross margins. The company's supposed moats in hardware, software, and efficiency are all showing concerning cracks. The whole world— thousands of the smartest people on the planet, backed by untold billions of dollars of capital resources— are trying to assail them from every angle.
从高层次来看,NVIDIA 正面临前所未有的竞争威胁收敛,使其在 20 倍远期销售和 75% 毛利率下的高估值越来越难以合理化。公司在硬件、软件和效率方面所谓的护城河都显现出令人担忧的裂缝。全世界——成千上万最聪明的人才,在无数十亿美元资本资源的支持下——正从各个角度试图攻破它们。

On the hardware front, innovative architectures from Cerebras and Groq demonstrate that NVIDIA's interconnect advantage— a cornerstone of its data center dominance— can be circumvented through radical redesigns. Cerebras' wafer-scale chips and Groq's deterministic compute approach deliver compelling performance without needing NVIDIA's complex interconnect solutions. More traditionally, every major NVIDIA customer (Google, Amazon, Microsoft, Meta, Apple) is developing custom silicon that could chip away at high-margin data center revenue. These aren't experimental projects anymore— Amazon alone is building out massive infrastructure with over 400,000 custom chips for Anthropic.
在硬件方面,Cerebras 和 Groq 的创新架构表明,NVIDIA 的互连优势——其数据中心主导地位的基石——可以通过激进的重新设计来规避。Cerebras 的晶圆级芯片和 Groq 的确定性计算方法在无需 NVIDIA 复杂互连解决方案的情况下提供了强劲的性能。更传统的是,NVIDIA 的每个主要客户(Google、Amazon、Microsoft、Meta、Apple)都在开发定制芯片,这可能会侵蚀其高利润的数据中心收入。这些已不再是实验性项目——仅 Amazon 就正在为 Anthropic 构建超过 400,000 颗定制芯片的大规模基础设施。

The software moat appears equally vulnerable. New high-level frameworks like MLX, Triton, and JAX are abstracting away CUDA's importance, while efforts to improve AMD drivers could unlock much cheaper hardware alternatives. The trend toward higher-level abstractions mirrors how assembly language gave way to C/C++, suggesting CUDA's dominance may be more temporary than assumed. Most importantly, we're seeing the emergence of LLM-powered code translation that could automatically port CUDA code to run on any hardware target, potentially eliminating one of NVIDIA's strongest lock-in effects.
软件护城河同样显得脆弱。新的高级框架,如 MLX、Triton 和 JAX,正在削弱 CUDA 的重要性,而改进 AMD 驱动程序的努力可能会解锁更便宜的硬件替代方案。向更高级抽象发展的趋势类似于汇编语言让位于 C/C++,这表明 CUDA 的主导地位可能比预期的更短暂。最重要的是,我们正在看到 LLM 支持的代码翻译的出现,它可以自动移植 CUDA 代码以在任何硬件目标上运行,可能会消除 NVIDIA 最强大的锁定效应之一。

Perhaps most devastating is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics here are compelling: when DeepSeek can match GPT-4 level performance while charging 95% less for API calls, it suggests either NVIDIA's customers are burning cash unnecessarily or margins must come down dramatically.
也许最具破坏性的是 DeepSeek 最近的效率突破,以大约 1/45 的计算成本实现了可比的模型性能。这表明整个行业在计算资源上可能存在大规模的过度配置。结合通过链式思维模型出现的更高效的推理架构,总体计算需求可能显著低于当前预测的假设。这里的经济学令人信服:当 DeepSeek 能够以 95%更低的 API 调用费用匹配 GPT-4 级别的性能时,这表明要么 NVIDIA 的客户在不必要地烧钱,要么利润率必须大幅下降。
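The two figures cited above can be combined into a rough demand calculation. Everything here is a normalized illustration of the article's numbers (1/45th the compute, 95% cheaper API calls), not measured data.

```python
# Normalized back-of-envelope on the efficiency figures cited above.
incumbent_compute_cost = 1.0      # a frontier lab's cost, normalized to 1
deepseek_compute_cost = 1.0 / 45  # ~1/45th the compute for comparable quality

incumbent_api_price = 1.0         # incumbent per-token API price, normalized
deepseek_api_price = 0.05         # "charging 95% less"

print(f"compute cost ratio: {deepseek_compute_cost:.3f}")  # 0.022
print(f"API price ratio:    {deepseek_api_price:.3f}")     # 0.050

# If incumbents absorb even part of these optimizations, GPU demand for a
# fixed amount of useful work shrinks by whatever efficiency factor they adopt:
for adopted_speedup in (10, 25, 45):
    print(f"{adopted_speedup}x adopted -> demand at "
          f"{1 / adopted_speedup:.1%} of prior forecasts")
```

The comparison restates the article's dilemma in numbers: if comparable quality really costs ~2% as much compute, then either buyers have been overpaying dramatically or prices and margins across the stack have a long way to fall.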

The fact that TSMC will manufacture competitive chips for any well-funded customer puts a natural ceiling on NVIDIA's architectural advantages. But more fundamentally, history shows that markets eventually find a way around artificial bottlenecks that generate super-normal profits. When layered together, these threats suggest NVIDIA faces a much rockier path to maintaining its current growth trajectory and margins than its valuation implies. With five distinct vectors of attack— architectural innovation, customer vertical integration, software abstraction, efficiency breakthroughs, and manufacturing democratization— the probability that at least one succeeds in meaningfully impacting NVIDIA's margins or growth rate seems high. At current valuations, the market isn't pricing in any of these risks.
台积电为任何资金充足的客户制造具有竞争力的芯片,这对 NVIDIA 的架构优势设定了一个天然上限。但从更根本的角度来看,历史表明,市场最终会找到绕过那些带来超常利润的人为瓶颈的方法。当这些威胁叠加在一起时,表明 NVIDIA 在维持当前增长轨迹和利润率方面面临的挑战比其估值所暗示的要严峻得多。凭借五个不同的攻击方向——架构创新、客户垂直整合、软件抽象、效率突破和制造民主化——至少有一个成功对 NVIDIA 的利润率或增长率产生重大影响的可能性似乎很高。而在当前估值下,市场并未将这些风险计入其中。

I hope you enjoyed reading this article. If you work at a hedge fund and are interested in consulting with me on NVDA or other AI-related stocks or investing themes, I'm already signed up as an expert on GLG and Coleman Research.
希望你喜欢阅读这篇文章。如果你在对冲基金工作,并对就 NVDA 或其他与人工智能相关的股票或投资主题与我咨询感兴趣,我已经在 GLG 和 Coleman Research 注册为专家。
But the vendors face another threat: Enterprises can access the same underlying models they do to build similar tools of their own. Kay said she built a tool that was able to replicate some of the capabilities of Copilot at a much lower cost.
但供应商面临另一个威胁:企业可以访问与他们相同的底层模型来构建类似的工具。Kay 表示,她构建了一个工具,能够以更低的成本复制 Copilot 的一些功能。

Amazon Web Services is in part betting on that strategy. Its Bedrock platform allows users to access models from companies like Anthropic, Meta Platforms and Mistral AI with either a no-commitment, pay-as-you-go pricing model starting at less than one cent per interaction or a time-based term commitment, starting at $25 per hour of commitment to use the Bedrock service. Amazon also provides its work assistant, Amazon Q, for $3 to $20 per user per month, depending on the tier.
亚马逊网络服务(Amazon Web Services)部分押注于该策略。其 Bedrock 平台允许用户以无承诺、按使用付费的定价模式(每次交互费用低至不到一美分)或基于时间的承诺模式(使用 Bedrock 服务的承诺费用起价为每小时 25 美元)访问来自 Anthropic、Meta Platforms 和 Mistral AI 等公司的模型。亚马逊还提供其工作助手 Amazon Q,费用根据等级每位用户每月为 3 美元至 20 美元不等。
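The two Bedrock pricing modes described above imply a simple breakeven. The prices are the ones quoted in the paragraph (taking "less than one cent" as an upper bound of $0.01); the helper function is purely illustrative.

```python
# Breakeven between Bedrock's quoted pricing modes: pay-as-you-go at up to
# $0.01 per interaction vs. a $25/hour term commitment.
def cheaper_mode(interactions_per_hour,
                 per_interaction=0.01,   # "less than one cent", upper bound
                 committed_hourly=25.0):
    payg = interactions_per_hour * per_interaction
    return "pay-as-you-go" if payg < committed_hourly else "commitment"

# At $0.01 per interaction, the commitment only wins past 2,500 interactions/hour.
print(cheaper_mode(1_000))   # pay-as-you-go
print(cheaper_mode(10_000))  # commitment
```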

Software companies are also facing pressure to adapt their pricing to account for the fact that the actual cost of using the underlying models is going down. As that happens, CIOs don’t want to feel like their vendors are simply taking a bigger share of the profits.
软件公司还面临着调整定价的压力,以应对底层模型实际使用成本下降的事实。随着这种情况的发生,首席信息官们不希望感觉他们的供应商只是拿走了更大比例的利润。

“If they aren’t fair and equitable in how they price those tools and transactions, they’re actually going to incent me to build my own capability over buying theirs,” said Nationwide Chief Technology Officer Jim Fowler. “And so my biggest concern is in this rush to AI, that they price themselves out of the enterprise.” Vendors and enterprises alike are still working to figure things out, he said. “It’s still the wild west.”
“如果他们在定价这些工具和交易时不公平和不公正,他们实际上会激励我自己构建能力,而不是购买他们的产品,”Nationwide 首席技术官 Jim Fowler 说。“所以我最大的担忧是在这场 AI 热潮中,他们的定价会让自己被企业市场排除在外。”他说,供应商和企业都仍在努力找出解决办法。“这仍然是一个蛮荒之地。”

Salesforce says it’s targeting more flexibility when it comes to their pricing options. Last September the company rolled out a pricing plan that allowed enterprises to toggle their spend minimums between per-month licenses for human employees and consumption-based agents. 
Salesforce 表示,其定价选项将更加灵活。去年 9 月,该公司推出了一项定价计划,允许企业在按月为人类员工购买许可证和基于使用量的代理之间切换最低支出。

A lot of customers are still trying to make sure they have the right value equation, said Bill Patterson, executive vice president of Corporate Strategy at Salesforce, and for some of the AI investments companies have made over the last two years, the jury is still out.
许多客户仍在努力确保他们拥有正确的价值平衡,Salesforce 公司战略执行副总裁 Bill Patterson 表示,对于过去两年中公司在某些 AI 投资上的成果,目前仍未有定论。

Meanwhile vendors continue facing the dilemma of making tools cheap enough that people will buy them but expensive enough so they’re not losing money in compute costs if people use it too much – a balance that’s hard to navigate with tools that are so new.
与此同时,供应商仍然面临着一个两难境地:工具需要足够便宜以吸引人们购买,但又必须足够昂贵以避免因过度使用导致计算成本亏损——对于如此新颖的工具来说,这种平衡很难掌握。
Warning
决策艰难的商业模式都是不好的模式。
Any business model that makes decisions this difficult is a bad one.
Earlier this year, OpenAI CEO Sam Altman posted on X that the $200 per month ChatGPT Pro plan was losing money because people were using it more than anticipated. (The ChatGPT Enterprise plan is separate and typically comes in at about $30 to $45 per seat, OpenAI said).
今年早些时候,OpenAI 首席执行官 Sam Altman 在 X 上发文称,每月 200 美元的 ChatGPT Pro 计划正在亏损,因为人们的使用量超出了预期。(OpenAI 表示,ChatGPT Enterprise 计划是单独的,通常每个席位约为 30 到 45 美元)。

Going forward, CIOs anticipate more changes and experimentation with different pricing strategies from their vendors.
展望未来,首席信息官们预计他们的供应商将在不同的定价策略上进行更多的变化和试验。

“We’re in such an interesting and fluid time, it’s hard to say which variant is going to win,” said Don Vu, chief data and analytics officer at New York Life.
“我们正处于一个如此有趣且多变的时期,很难说哪种变体会胜出,”纽约人寿首席数据与分析官 Don Vu 说道。
潘杨:对于今年DeepSeek引爆应用市场,后续的发展趋势,您是如何看待的?
Pan Yang: DeepSeek set the application market alight this year. How do you see the trend developing from here?

季逸超:首先可以确定,AI在中国的热潮可能是DeepSeek带起来的,我觉得这非常好。之前在国内,大家一直没有一个非常好的开源大模型,但如果具体到agent和infra的话,其实还有需要探讨的方面。
Ji Yichao: First, it is fairly clear that the AI boom in China was probably set off by DeepSeek, which I think is a very good thing. Until now there had been no truly strong open-source large model in China. But when it comes specifically to agents and infra, there are still points worth examining.

第一点,DeepSeek的模型(无论V3还是R1)本身更侧重推理能力,在多模态、函数调用、长期规划等能力上并不出众。这可能是因为DeepSeek团队前期将资源集中于推理优化,对多模态采取了战略性后推策略。如果专注于智能体领域,可以借DeepSeek的东风,但需避免过度绑定其技术路线,需等待其多模态能力的进一步发展。
First, DeepSeek's models (both V3 and R1) emphasize reasoning and are not outstanding at multimodality, function calling, or long-horizon planning. This is probably because the team concentrated its early resources on reasoning optimization and strategically deferred multimodality. If you focus on the agent space you can ride DeepSeek's momentum, but you should avoid binding yourself too tightly to its technical roadmap and wait for its multimodal capabilities to develop further.

第二点,因DeepSeek的爆发,国内外对Infra的要求显著提升。从DeepSeek最近的V3论文看,其架构已与传统MA-like模型有显著差异,但除官方外,国内推理厂商的Infra优化普遍不足,仍需大量工作。若要将智能体与 Infra结合,2025年将是一个关键机遇。传统算力关注点主要在训练阶段,但智能体带来的24小时持续推理需求将彻底改变格局——交互时长延长导致Token消耗量剧增,且多轮对话中上下文不断累积,进一步推高资源需求。今年因DeepSeek母体模型的成熟,Infra有望迎来爆发。
Second, DeepSeek's breakout has sharply raised the bar for infra both in China and abroad. Judging from the recent V3 paper, its architecture already diverges significantly from traditional MA-like models, yet apart from DeepSeek itself, domestic inference vendors' infra optimizations are generally inadequate and much work remains. For combining agents with infra, 2025 will be a key window of opportunity. Compute demand has traditionally centered on the training stage, but agents running inference around the clock will completely change that picture: longer interactions drive token consumption up sharply, and context keeps accumulating across multi-turn conversations, pushing resource requirements even higher. With DeepSeek's base models maturing this year, infra is poised for a breakout.
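The point above about round-the-clock agents and accumulating context can be made concrete with a toy calculation. The per-turn token count is an illustrative assumption; the quadratic growth is the structural point.

```python
# Why agent workloads explode inference demand: each turn resends the full
# accumulated history, so total tokens processed grow roughly quadratically
# with the number of turns. The 500-token figure is an assumption.
def total_prompt_tokens(turns, tokens_per_turn=500):
    """Tokens pushed through the model across a session where every turn
    appends ~tokens_per_turn and re-reads the whole history."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

for turns in (10, 100, 1_000):
    print(f"{turns:>5} turns -> {total_prompt_tokens(turns):>12,} tokens")
# 10 turns: 27,500 tokens; 100 turns: 2,525,000; 1,000 turns: 250,250,000
```

A session 100x longer consumes about 9,000x the tokens, which is the "context keeps accumulating" effect the interviewee describes, and why always-on agents change the infra picture.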
Idea
NVDA的投资逻辑,包括几家云计算的公司。
The investment logic for NVDA, and for several of the cloud-computing companies.
Hard disk drive makers, under pressure from faster rivals, hope a breakthrough will keep it the dominant data-storage medium
在速度更快的竞争对手的压力下,硬盘驱动器制造商希望取得突破,以保持其数据存储介质的主导地位。

The fate of an industry is riding on a laser smaller than a grain of salt.
一个行业的命运取决于比一粒盐还小的激光。

Data-storage company Seagate developed this diminutive heat source to help it encode information in ever-greater quantities on the spinning magnetic platters of hard disk drives. Stacked by the thousands in data centers, the drives hold everything from home movies to medical records to factory log files.
数据存储公司希捷(Seagate)开发了这种微型热源,以帮助其在硬盘驱动器旋转的磁盘上进行更大量的信息编码。这些硬盘成千上万地堆放在数据中心中,存储着从家庭电影、医疗记录到工厂日志文件等各种信息。

Seagate’s innovation, heat-assisted magnetic recording, is critical to the future of the globe-spanning manufacturer. Its hard drives are competing against newer and faster technology in the business of storing the world’s information, and to survive, their capacity must continue to increase.
希捷的创新技术——热辅助磁记录技术,对这家全球制造商的未来至关重要。在存储全球信息的业务中,希捷硬盘正在与更新、更快的技术竞争,要想生存下去,就必须不断提高容量。

The company has started shipping a paperback-size drive that holds 36 terabytes of data—the equivalent of 1,400 Blu-ray movies. It has achieved nearly twice that in the lab, and its executives think far more is possible.
该公司已开始出货一种平装书大小的硬盘,可容纳 36 TB 的数据,相当于 1,400 部蓝光电影。该公司在实验室中已经实现了将近两倍的容量,而且公司高管认为容量还可能更大。
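A quick sanity check of the equivalence quoted above (36 TB holding roughly 1,400 Blu-ray movies):

```python
# Sanity-checking the 36 TB ~ 1,400 Blu-ray movies equivalence quoted above.
drive_tb = 36
movies = 1_400
gb_per_movie = drive_tb * 1_000 / movies  # decimal gigabytes per movie
print(f"{gb_per_movie:.1f} GB per movie")  # 25.7 GB, about one single-layer (25 GB) Blu-ray
```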

“We’ve always believed that hard drives had legs,” said Seagate Chief Executive Dave Mosley. “We’ve proven that by continuing to invest in them and see the returns.”
“我们一直坚信硬盘是有生命力的,”希捷首席执行官戴夫·莫斯利(Dave Mosley)说,“我们通过持续投资硬盘并看到回报,证明了这一点。”

The new products arrive as AI is fueling a surge in demand for data storage. Data centers last year spent an estimated $40 billion on storage devices, according to the consulting firm IDC, and that is expected to grow by 31% over the next two years.
新产品的推出正值人工智能推动数据存储需求激增之际。咨询公司 IDC 的数据显示,去年数据中心在存储设备上的支出估计达 400 亿美元,预计未来两年将增长 31%。

Wall Street analysts predict that Seagate’s sales from fiscal 2024 to 2026 will increase by 55%, to $10 billion, while its earnings per share will grow by more than 650%.
华尔街分析师预测,希捷 2024 至 2026 财年的销售额将增长 55%,达到 100 亿美元,而每股收益将增长 650%以上。

IBM invented the hard disk drive in the 1950s, and the storage device has endured while other media such as punch cards, floppy disks and CD-ROMs fell into obscurity. The drives’ capacity grew as their price dropped, and today a one-terabyte consumer model, which can hold tens of thousands of high-resolution photos, costs less than $70.
20 世纪 50 年代,IBM 发明了硬盘驱动器,当打孔卡、软盘和 CD-ROM 等其他存储介质逐渐销声匿迹时,这种存储设备却经久不衰。随着价格的下降,硬盘的容量也在不断增加,如今,一块可存储数万张高分辨率照片的 1 TB 消费级硬盘,价格不到 70 美元。

But in the 1990s, hard drives’ most formidable challenger emerged. Solid-state drives store data as electrons, allowing them to read and write faster than hard drives.
但在 20 世纪 90 年代,硬盘驱动器最强大的挑战者出现了。固态硬盘以电子形式存储数据,读写速度比硬盘更快。

Solid-state drives are more expensive than hard drives on a per-terabyte basis, but the disparity has steadily declined. They have become the default in personal computers, and some in the industry say it won’t be long until they take over data centers too.
按每 TB 计算,固态硬盘比硬盘更贵,但这种差距在稳步缩小。固态硬盘已成为个人电脑的默认设置,一些业内人士表示,不久的将来,固态硬盘也会占领数据中心。

John Colgrove, founder and chief visionary officer of Pure Storage, a company that designs storage systems, said it is now shipping 150-terabyte solid-state drives called DirectFlash.
设计存储系统的 Pure Storage 公司创始人兼首席愿景官 John Colgrove 说,该公司目前正在出货名为 DirectFlash 的 150 太字节固态硬盘。

Their capacity will quadruple in the next two years, he said, and that growth, coupled with what he called solid state’s lower demand for power, will quickly erode hard drives’ cost advantage.
他说,在未来两年内,固态硬盘的容量将翻两番,这种增长加上他所说的固态硬盘对电力的需求降低,将迅速削弱硬盘的成本优势。

“The debate isn’t will hard drives go away—it’s when will they go away,” Colgrove said.
“争论的焦点不是硬盘是否会消失,而是硬盘何时会消失,”科尔格罗夫说。
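Colgrove's argument above (SSD capacity quadrupling, HDD cost advantage eroding) can be sketched as a simple projection. Every number here, the starting $/TB for both media and the SSD decline rate, is a hypothetical assumption for illustration, not market data.

```python
# Hypothetical $/TB crossover projection. Starting prices and the annual
# SSD price decline are assumptions for illustration, not market data.
hdd_per_tb = 15.0          # assumed enterprise HDD $/TB, held flat
ssd_per_tb = 80.0          # assumed enterprise SSD $/TB today
ssd_annual_decline = 0.30  # assume SSD $/TB falls 30% per year

year = 0
while ssd_per_tb > hdd_per_tb:
    year += 1
    ssd_per_tb *= 1 - ssd_annual_decline
    print(f"year {year}: SSD ${ssd_per_tb:.2f}/TB vs HDD ${hdd_per_tb:.2f}/TB")
# Under these assumptions the lines cross in year 5; a steeper or shallower
# decline moves the date, but not the direction of the argument.
```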

Hard drives write by flipping the magnetic orientation of tiny “bits” up or down. That action denotes them as a one or a zero, the binary code that makes up the digital language.
硬盘通过上下翻转微小 "比特 "的磁性方向进行写入。这一动作将它们表示为 1 或 0,即构成数字语言的二进制代码。

Seagate and other manufacturers have managed to make those bits smaller and smaller, but in conventional hard drives, they are approaching a limit beyond which they would become too unstable to control.
希捷和其他制造商已设法将这些比特变得越来越小,但在传统硬盘中,它们正在接近一个极限,超过这个极限,它们将变得过于不稳定,难以控制。

Enter heat-assisted magnetic recording, or HAMR, a technology Seagate has been developing for more than 20 years. The new drives use a laser to apply a nanosecond of heat to bits smaller than any used before, allowing them to be magnetically manipulated.
热辅助磁记录(HAMR)是希捷 20 多年来一直在开发的一项技术。这种新型硬盘利用激光,对比以往任何产品所用都更小的比特施加纳秒级的加热,使其能够被磁性操控。


