I.H.185.Warren Buffett.The “aha” moment

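Benjamin Graham broke the work of value investing into two parts: Think Correctly + Think Independently. This lines up closely with the latest research in artificial intelligence.
(1) “aha” moment, Grokking, Representation learning, Generalization, Compression
The goal of Think Correctly is to produce the “aha” moment. AI research already understands this quite deeply: the latest work by Yuandong Tian (田渊栋) explains Grokking mathematically, and Grokking is almost the same thing as the “aha” moment.

Both Tian and Ilya Sutskever have put this question at the center of their explanations: generalization is not something mysterious but a phase transition that is almost bound to happen once the amount of data reaches a critical point. See: 《2025-10-30 田渊栋.AI“顿悟”的关键,是对优雅的追求?》
(2) Value Functions, an implicit preference for “beauty” or “elegance”
This corresponds to Think Independently, an ability that is tied to a sense of security. It is what Ilya Sutskever means by Value Functions: 《2025-11-26 Ilya Sutskever.We're moving from the age of scaling to the age of research》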

1、《2001-04-28 Berkshire Hathaway Annual Meeting》

WARREN BUFFETT: Yeah, one every year or two. And sometimes there’ll be a bunch of them, like in 1973 and -4. But the problem is, for us is that big, now, really means big. I mean, it has to be billions of dollars to move the needle very much at Berkshire.
沃伦·巴菲特: 是的,大概每一到两年一个。有时会有一系列的大想法,比如在 1973 和 1974 年。但对我们来说,问题是现在“大”真的意味着非常大。我的意思是,它必须达到数十亿美元才能显著影响伯克希尔的表现。

But I would say that when I would turn those pages, 50 years ago in the Moody’s Manuals, I would know when I hit a big idea. I’ve got half a dozen of them that I keep the Xeroxes from those reports around from 50 years ago just because it was so obvious that they just — they were incredible. And that happens every now and then.
但我想说的是,50 年前,当我翻阅《穆迪手册》的页面时,我知道自己遇到了一个“大想法”。有六七个这样的想法,我还保留着那些报告的复印件,就因为它们当时显而易见——它们太了不起了。而这种情况偶尔会发生。

When I met Lorimer Davidson, you know, in end of January, 1951, and he spent four hours or five hours with me explaining GEICO, I knew it was a big idea.
1951 年 1 月底,当我见到洛里默·戴维森时,他花了四五个小时向我解释 GEICO,我知道那是一个“大想法”。

Eight months later, no, probably 10 months later, I wrote an article for The Commercial and Financial Chronicle on “The Security I Like Best.” It was a big idea.
八个月后,不,可能是十个月后,我为《商业与金融年鉴》写了一篇题为《The Security I Like Best》的文章。那是一个“大想法”。

When I found Western Insurance Securities, I knew it was a big idea.
当我发现 Western Insurance Securities 时,我知道那是一个“大想法”。

I couldn’t put billion — millions — of dollars into it, but I didn’t have millions so it didn’t make any difference.
我当时无法投入数百万或数十亿美元,但因为我也没有这么多钱,所以这并不重要。

And I — we’ve seen things subsequently. And we’ll see, you know. If we have a normal lifespan, we’ll see a few more before we get done, but I can’t tell you that —exactly how —
而且我们后来还看到了一些事情。我们会继续看到,如果我们有正常的寿命,在结束之前我们可能还会看到几个,但我无法确切告诉你——具体如何——

I can’t tell you exactly what transpires in my mind that says, you know, flashes a neon sign up that says, “This is a big idea.”
我无法确切告诉你,在我的脑海里发生了什么,会让我看到“This is a big idea”的霓虹灯在闪烁。
Idea
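The “aha” moment is what artificial intelligence calls generalization. It is not mysterious, and the principle behind it can even be expressed mathematically. See: 《2025-10-30 田渊栋.AI“顿悟”的关键,是对优雅的追求?》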
What happens with you, Charlie?
查理,你会怎么判断?

(Laughter)
(笑声)

Actually, one of my — I have a real system. (Laughter)
事实上,我有一个真正的系统。(笑声)

My idea of a truly big idea is, one I get it and I call Charlie and he only says no, rather than, “That’s the worst idea I’ve ever heard of.” But if he just says no, it’s a hell of an idea.
我对“大想法”的定义是,当我想到一个想法并打电话给查理时,他只说“不”,而不是说“这是我听过最糟糕的主意”。但如果他只是说“不”,那就说明这是一个了不起的想法。
The problem with insights is that they are like skills: you can't use language to transfer them from person to person. I can tell you how to operate a video recorder, because that is knowledge. But even if I knew how to do it myself, I couldn't tell you how to be a better gymnast because that is a skill. Skills can be learned only through personal experience, and insights are much the same. Where new knowledge is simply added to one's existing mental store, insights bring understanding, and understanding changes one's whole being. (p. 9)
洞见的问题在于,它们和技能一样:无法通过语言在人与人之间传递。我可以告诉你如何操作录像机,因为那是知识;但即使我自己会做,也无法教你如何成为更好的体操运动员,因为那是一种技能。技能只能通过个人经验习得,洞见亦然。新知识只是被添加到既有的心智储备中,而洞见则带来理解,理解会改变一个人的整体存在。(第9页)
Idea
Part of it is highly compressed, and language may not be able to express it well.
First of all, if you are expecting to find, through reading this book, that someone has actually created life, you will be disappointed. Second, I lost my notes taken while reading this book, and I finished the book three weeks before writing this review. Hence, my review will be less comprehensive, and perhaps less insightful, than originally intended.
首先,如果你期望通过阅读本书发现有人真的创造出了生命,你会失望。其次,我把阅读此书时做的笔记弄丢了,而且在撰写这篇书评前三周就已读完,因此,这篇评论会比最初设想的少一些全面性,也可能少一些洞见。

Grand critiques what artificial intelligence (AI) and computer programmers have done to date in their methods and general approaches for creating artificial life. What they have done is to simulate life using a top-down approach rather than the approach that got life to where it is today--bottom-up with evolution driving the change.
格兰德批评了人工智能(AI)研究者和程序员迄今在创造人工生命的方法与一般路径上所做的工作。他们所做的是用自上而下的方式来模拟生命,而非让生命走到今日境地的那种方式——自下而上,由进化推动变化。

This is the way organisms work. There is no architect, and no master controller telling the system what to do. There are just vast numbers of small independent entities that respond to signals as and when it suits them, and emit new signals whose destination they do not know. Top-down control leads to complexity explosions, because something somewhere has to be in charge of the whole system, and how much this master controller needs to know increases exponentially with the number of components in the system. Living systems are bottom-up: no part knows or cares what its role is in the whole, but the whole still emerges from the cacophony of these zillions of mindless loops of cause and effect. (p. 119)
这就是生物体的运作方式。没有建筑师,也没有主控者告诉系统要做什么。只有大量独立的小实体在合适的时候响应信号,并且发出它们自己也不知道去向的新信号。自上而下的控制会导致复杂性爆炸,因为系统里必须有某个地方的某个东西负责整体,而这个主控者需要了解的信息量会随着系统组件数量呈指数增长。活的系统是自下而上的:没有任何部分知道或在乎自己在整体中的角色,但整体仍然从这些无数无意识的因果循环的喧嚣中涌现出来。(第119页)

Grand's points are well made and well taken, but a critique of others doesn't necessarily mean that he can create life any better than the other attempts that have been made. He emphasizes that life isn't just cause and effect, although cause and effect are certainly a big part of it, and life couldn't happen without such properties. One of the key elements that is missing from other AI attempts are the emergent properties, the things that spring out of a complex system of causes and effects.
格兰德的论点提出得很好,也值得认可,但批评他人并不一定意味着他在创造生命上的成就能比其他尝试更好。他强调生命不只是因果关系,尽管因果关系确实是生命的重要组成部分,没有这种属性生命便无法发生。其他人工智能尝试缺失的关键元素之一就是涌现特性——那些从复杂因果系统中自发出现的东西。

Life is not the stuff of which it is made -- it is an emergent property of the aggregate arrangement of that stuff. Even the stuff itself is no more than an emergent property of a still smaller whirlpool of interactions. Living beings are high-order persistent phenomena, which endure through intelligent interaction with their environment. This intelligence is a product of multiple layers of feedback. An organism is therefore a localized network of feedback loops that ensures its own continuation. (p. 146)
生命并非其物质成分本身——它是那些成分聚合排列后涌现出的性质。甚至连这些成分本身,也不过是更小尺度相互作用漩涡的涌现属性。生物是高阶持久现象,通过与环境的智能交互得以延续。这种智能是多重反馈层的产物。因此,生物体是一个局部化的反馈环网络,确保自身的持续存在。(第146页)

When Grand finally sets out to explain his computer program--a program which I had never heard of, let alone experienced, before--I was more than a little bit disappointed. It sounded much more like computer code able to reproduce itself when someone was playing the game than artificial life, let alone life. There is no thinking by the creatures, no emergent properties, and even their appearance and their environment are totally made from the top-down rather than bottom-up approach. The theory sounded a whole lot better than the demonstration of it. Perhaps he takes a step forward from other video games, but it is a baby step, and not the leap the title of the book suggests.
当格兰德终于开始解释他的电脑程序——一个我此前闻所未闻、更别说亲身体验的程序——我不免颇感失望。它听起来更像是能在游戏过程中自我复制的计算机代码,而非人工生命,更遑论真实生命。那些生物并不会思考,没有涌现特性,甚至它们的外观与环境都是完全自上而下构造,而非自下而上。理论听起来远比演示更精彩。也许相比其他电子游戏他迈出了一步,但那只是小小一步,而非书名暗示的飞跃。

There is a lot of good stuff in this book, and hopefully others will run with the ideas more than Grand has. I'll wrap this up with a couple of the better quotes found within the pages. The second is Grand's admission that his bark is more powerful than his bite. May his future endeavors deliver more of the bite. I hope to see the real deal in my lifetime.
书中包含许多精彩内容,希冀他人能比格兰德更充分地发展这些理念。我用书里两句较好的引语作结。第二句是格兰德承认他的吠声比咬劲更猛。愿他未来的努力带来更多“咬劲”。希望我能在有生之年见到真正的成果。

Instead of 'command and control', we need to 'nudge and cajole'. Whether you run a school, run a country, manage an ecosystem or write computer software it makes no difference: complex adaptive systems cannot be dictated to -- you have to learn how to go with the flow and nudge individual components in order to encourage the system to go in the direction you want it to. (p. 149)
我们需要的不是“命令与控制”,而是“引导与劝诱”。无论你是办学校、治国家、管理生态系统还是编写计算机软件,都一样:复杂适应系统无法被命令支配——你必须学会顺势而行,轻推单个组件,以鼓励系统朝你希望的方向前进。(第149页)

I wish I could tell you what a mind is, and how to construct one, but as yet I cannot. I can only give you some clues about where such a phenomenon may have come from. (p. 215)
我希望能告诉你心智是什么,以及如何构建一个心智,但目前我做不到。我只能给你一些线索,关于这种现象可能源自何处。(第215页)
WARREN BUFFETT: Well, I think you should read everything you can.
沃伦·巴菲特:嗯,我认为你应该尽可能多地阅读。

I can tell you in my own case, I think by the time I was — well, I know by the time I was ten — I’d read every book in the Omaha Public Library that had anything to do with investing, and many of them I’d read twice.
就我个人而言,我可以告诉你,我想在我十岁的时候——我知道在我十岁的时候——我已经读完了奥马哈公共图书馆里所有与投资有关的书籍,其中许多我都读了两遍。

So I don’t think there’s anything like reading, and not just as limited to investing at all. But you’ve just got to fill up your mind with various competing thoughts and sort them out as to what really makes sense over time.
所以我认为没有什么能比得上阅读,这不仅仅局限于投资。你需要用各种相互竞争的想法来充实你的头脑,并随着时间的推移整理出真正有意义的东西。

And then once you’ve done a lot of that, I think you have to jump in the water, because investing on paper and doing — you know, and investing with real money, you know, is like the difference between reading a romance novel and doing something else. (Laughter)
然后,当你做了很多这样的事情后,我认为你必须跳入水中,因为在纸上投资和用真钱投资,就像读浪漫小说和做其他事情之间的区别。(笑声)
Idea
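What can still be connected to reality is meaningful; what has drifted away from reality for a long time is not. Buffett has worked out human nature very clearly.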
There is nothing like actually having a little experience in investing. And you soon find out whether you like it. If you like it, if it turns you on, you know, you’re probably going to do well on it.
没有什么能比真正拥有一点投资经验更好了。你很快就会发现自己是否喜欢它。如果你喜欢它,如果它让你感到兴奋,你知道,你可能会在这方面做得很好。

And the earlier you start, the better, in terms of reading. But, you know, I read a book at age 19 that formed my framework for thinking about investments ever since.
而且在阅读方面,越早开始越好。不过,你知道,我在 19 岁时读了一本书,从那时起就形成了我对投资的思维框架。

I mean, what I’m doing today at 76 is running things through the same thought pattern that I got from a book I read when I was 19.
我的意思是,我今天在 76 岁时所做的事情,是通过我 19 岁时读的一本书中获得的相同思维模式来进行的。

And I read all the other books, too, but if you — and you have to read a lot of them to know which ones really do jump out at you and which ideas jump out at you over time.
我也读了所有其他的书,但如果你——你必须读很多书才能知道哪些书真正吸引你,哪些想法随着时间的推移会吸引你。

So I would say that read and then, on a small scale in a way that can’t hurt you financially, do some of it yourself.
所以我会说,先阅读,然后在不会对你造成经济损失的小范围内,自己尝试做一些。
Ilya Sutskever:
伊利亚·苏茨克维尔:

I mean all those things are a source of pride for sure. I'm very grateful for having done all those things and it was very fun to do them. My current view is that happiness comes to a very large degree from the way we look at things.
所有这些事情当然是自豪的来源。我非常感激能够完成这些事情,而且做它们的过程也非常有趣。我目前的看法是,幸福在很大程度上来源于我们看待事物的方式。

You can have a simple meal and be quite happy as a result, or you can talk to someone and be happy as a result as well. Or conversely, you can have a meal and be disappointed that the meal wasn't a better meal.
你可以吃一顿简单的饭就感到很幸福,也可以通过和某人交谈而感到幸福。相反地,你也可能因为饭菜不够好而感到失望。

So I think a lot of happiness comes from that. But I'm not sure. I don't want to be too confident.
所以我认为,很多幸福源于此。但我不确定,我不想过于自信。
Warning
This is a bit of a problem: the thinking here is not yet fully worked out, perhaps for lack of experience.

5、《2022-05-02 Berkshire Hathaway Annual Meeting》

WARREN BUFFETT: And you can say, why would it take guys that long to learn? And — well, we’ve got a few minutes before lunch. We should — let’s address that problem. Because I did bring something along on that. I started buying stocks when I was 11. I’d been reading every book in the library on it. I loved it. My dad loved — you know, it was his business and I’d get to go down to his office and I’d read the books down there. And I saved the money, and finally, by the time I was 11, I could buy a stock. And I could tell you, at that time — I went to New York Stock Exchange when I was nine. My dad took us to New York — each kid to New York once — and he took me, and I went to the New York Stock Exchange, and I was in awe of it. I could tell you how the specialist system worked, and the odd lot arrangements, and I could tell you the history of finance, and all of these things.
沃伦·巴菲特:你可能会说,怎么会花他们那么长时间才学会这些呢?好吧,在午饭前我们还有几分钟,我们就来谈谈这个问题。因为我确实带了一点东西来讲这个。我从 11 岁开始买股票,在那之前我把图书馆里所有相关的书都读了一遍。我特别喜欢这些东西,我爸爸也喜欢——你知道,那是他的生意,我可以跟他一起去办公室,在那儿看那些书。我攒钱,终于到了 11 岁的时候,我可以买一只股票了。那个时候我就已经能告诉你——我 9 岁的时候就去过纽约证券交易所。我爸爸会带我们每个孩子去一次纽约,他带我去,我就去了纽交所,那地方让我无比敬畏。那时候我能告诉你专营商制度是怎么运作的,零股买卖的安排是怎样的,我能讲出金融史,以及所有这些东西。

And then I got very interested in technical analysis, and charted stocks, and did all kinds of crazy things. Hours and hours and hours. And saved money to buy other stocks. And tried shorting. And I just did everything. And then, when I was either 19 or 20, and I can’t remember exactly where I did it or something, I picked up a book someplace. It wasn’t a textbook at school, but it was in Lincoln, Nebraska. And I — you know, I looked at this book, and I saw one paragraph, and it told me I’d been doing everything wrong. (Laughs) I just had the whole approach wrong. I thought I was in the business of trying to pick stocks that would go up. And, in one paragraph, I saw that that was totally foolish. And I’ve brought something that is really interesting. Let’s put up — what did we call this chart? Oh, here we are, yeah. Let’s put up [slide] illusion one. Done. There we have it.
后来我对技术分析特别着迷,给股票画走势图,干了各种稀奇古怪的事,一干就是好几个小时、好几个小时。我攒钱买别的股票,还去做空,我什么都试过。然后,当我 19 岁或 20 岁的时候(我已经记不清具体是在哪儿了),在某个地方拿起了一本书。那不是学校的教科书,是我在内布拉斯加州林肯市看到的。我翻这本书的时候,看到其中一个段落,它告诉我:我以前做的一切全都错了(笑)。我的整个思路都错了。我以为自己的工作是去挑那些会涨的股票。而就在那一个段落里,我意识到这个想法完全是愚蠢的。我今天带来了一样非常有意思的东西。我们把——这个图我们叫啥来着?哦,在这儿。来,把第一张幻灯片放出来,叫“错觉一”。好了,出来了。
Idea
People usually cannot turn back once they are on the wrong path, unless they were never that kind of person to begin with.


You know, now if you look at that, some people will see two faces, some people will see a vase, and some people will look a long time and only see two faces. But the mind flips from one side to another, and there’s some name for it that — they call it “ambiguous illusions” or something of the sort. There’s other things that talk about aha moments — or, in the old comic strips with Popeye, Wimpy would have a little balloon over his head, and the lightbulb would go on. There’s this point where all of a sudden you see something you haven’t seen. Well, it took me — I had an illusion that I was looking at, we’ll say in that one, two faces. Go to the — let’s go to the one labeled two. And if you’re looking at it from one side, it looks like a rabbit, and if you look the other way, it looks like you’re looking at a duck. And, you know, the mind is a very funny place.
你看,现在如果你看这张图,有些人会看到两张脸,有些人会看到一个花瓶,还有些人看了很久也只能看到两张脸。但你的大脑会在两种图像之间来回切换,这种东西有个学名——他们叫“ambiguous illusions”之类的。还有一些说法会提到所谓的“啊哈时刻”(aha moment)——就像以前大力水手的连环漫画里,Wimpy头上会飘出一个小对话框,里面亮起一只灯泡。就是那种:突然之间,你看到了之前没看见的东西。对我来说也是这样。我原来以为自己看到的,就像第一张图那样,是两张脸。再看下一张——我们看编号为“二”的那张。如果你从一个角度看,它像一只兔子;如果换个角度看,它又像一只鸭子。你知道,人脑真是个很有趣的东西。

And I think people call it an apperceptive mass, when you have all kinds of things going on in your mind. And they go on for years, and they sit there and get lost (laughs). And then, all of a sudden, you see something different than what you were seeing before. Now, it took me, in stocks, which I was intensely interested in, and I had a decent IQ, and I was reading and thinking, you know. And it was important to me to make some money on it. I had every motivation in the world. And then I read a chapter — I read a paragraph, actually — in chapter 8, I think it was, of the Intelligent Investor, and it told me that I wasn’t looking at the duck, I was looking, you know — now it was the rabbit — whatever it may be. And whether you call it a “lightbulb” — whether you call it, you know, a moment of truth — whatever it may be — and that happened to me in Lincoln. I mean, it changed my life.
我想人们把这种情况叫作“统觉团”(apperceptive mass),就是说你脑子里同时装着各种各样的东西,它们在那儿呆了很多年,搁在那里、被埋没在那里(笑)。然后某个瞬间,你看到的东西忽然和以前不一样了。对股票这件事来说,我对它极度感兴趣,我的智商也还算过得去,我一直在读书、在思考,而且赚到钱对我来说非常重要,我拥有世上所有的动机。然后我在《聪明的投资者》第 8 章里读到了一个段落(准确地说就是一个段落),它告诉我:我之前看的不是“鸭子”,而是——现在应该说是“兔子”——不管你怎么称呼它。你可以叫那一刻是灯泡亮起,也可以叫是真相大白的瞬间——随便你怎么叫——这样的事就发生在林肯市,对我来说,那改变了我的一生。
Idea
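I used to think the problem was failing to understand. The theory of apperceptive masses says the problem is failing to see: the correct idea is completely suppressed by the wrong one, so it stays invisible even when it is right in front of you.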
Quote
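The apperceptive mass is a concept from the psychology of Johann Friedrich Herbart. Herbart held that the mind contains a large number of ideas that compete with one another for dominance. Similar ideas gather into an apperceptive mass; the dominant mass rejects ideas that are incompatible with it and pushes them down into the subconscious, where they wait for similar ideas to enter the mind, form a mass of their own, and seize the chance to compete for dominance.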

If I hadn’t read that book, I don’t know how long I would have gone on looking for head-and-shoulders formations, and 200-day moving averages, and the odd lot ratios, and a zillion things. And I love that kind of stuff. Except it was the wrong stuff I was looking at. And I’ve had that happen — and Charlie’s had it happen, I’m sure. It happens a few times in your life. And all of a sudden, you see something important that, why in the hell didn’t I see this in the first place? Maybe it’s a week ago, maybe it’s a year ago, maybe it’s five years ago. Maybe it’s learning how to get along with people, you know. I mean, whether, actually, it’s better to be, you know, kind, or not, you know. Or whether — I mean, they’re just — learning how — if you want the world to love you, what you have to do, or whatever. You know it when you see it, but you didn’t see it for ten years before.
如果我当初没有读到那本书,我都不知道自己还会在“头肩形态”、200 日移动平均线、零股比率,以及一大堆类似东西上折腾多久。而且我还特别喜欢这些东西。只是,我看的是一套完全错误的东西。这种情况我遇到过,Charlie 肯定也遇到过。你一生中会有那么几次,突然看到一件很重要的事情,然后心里纳闷:我怎么一开始就没看出来呢?也许这件事一周前就在那里,也许是一年前、五年前。也许是关于如何跟人相处,你知道的——到底做一个善良的人是不是更好,之类的。又或者——反正就是——弄明白:如果你想要这个世界喜欢你,你得做些什么,等等。当你终于看懂的时候,你一眼就知道那是真的,但在那以前的十年里,你一直没看见。
Idea
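Rich Dad Poor Dad (《富爸爸穷爸爸》) has exactly the same description: if you can see it once, you have the chance to see it again and again.
Quote
You will see things that other people cannot see. Opportunities sit right in front of people, but most never see them because they are busy chasing money and security, so that is all they get. If you can see one opportunity, you are bound to keep finding opportunities for the rest of your life.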

And I don’t know whether Charlie’s got some thoughts on that or not, but that’s happened in a few situations in business, where I’ve looked at a company for a decade. And then there’s something that just all gets rearranged in your mind, and, you know, you can say, well, why didn’t I see this five years ago? But I’ve had it happen a few times, obviously — and everybody here has — just in different areas of their lives. And you think, how could I have been so stupid? Well, that’s what Charlie’s — when he was in the law practice, he had a partner, Roy Tolles. And every smart guy that would get in trouble — usually it was guys, and usually it was with women — and, you know, they’d come into the office, and they’d look, you know, down-faced and everything. And they’d say, it seemed like a good idea at the time, you know. I mean — (Laughter) And their lives unraveled, you know, in many cases.
我不知道 Charlie 在这方面有没有什么想法,但在商业世界里,这种事在我身上发生过好几次。我有时会盯着一家公司看上十年。然后有一天,你脑子里的那些东西突然被重新排好队,你这才会问自己:我五年前怎么没看出来呢?这事在我身上发生过几回,显然,在座每个人身上也都发生过,只不过发生在他们人生中不同的领域而已。然后你会想:我以前怎么会这么蠢?Charlie 当律师时,有个合伙人叫 Roy Tolles。每次有聪明人惹了麻烦——通常是男人,通常和女人有关——他们就会走进律师事务所,一脸沮丧地坐在那里,然后说一句:当时看上去真是个好主意啊,你懂的(笑声)。而他们的人生,在很多情况下,也就跟着慢慢瓦解了。
Idea
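This description matches Yuandong Tian's account exactly: the moment of jumping from “memorization-style fitting” to “structured generalization” is a reorganization of the learned representation, not simply a few more remembered experiences. See: 《2025-10-30 田渊栋.AI“顿悟”的关键,是对优雅的追求?》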
So, there is that apperceptive mass that’s sitting in there inside somehow, and every now and then it produces some insight. It’s better, actually, if it produces insight into your behavior than whether it produces insight to make money. And some people never get it. And they wonder why — you know, whether their kids hate them, or whether there’s nobody in the world that would give a damn whether they live or die. In fact, they prefer they die because they’ve been courting them for their art collection, or whatever it may be. It’s just — Charlie would say, you know, just write your obituary and reverse engineer it. And not a crazy idea, but, Charlie, I don’t know. What do you know about apperceptive masses? Which are (laughs) you know, optical illusions.
所以,在我们心里,总有那么一个“统觉团”待在那里,时不时会冒出一点洞见。老实说,如果它能让你对自己的行为有洞见,比让你学会怎么赚钱要更重要。有些人一辈子都得不到这种洞见,然后他们纳闷——为什么孩子恨他们,为什么世上根本没有人在乎他们是活是死。事实上,有些人甚至巴不得他们早点死,好早点继承他们的艺术品收藏之类的东西。这就是为什么 Charlie 会说:你先把自己的讣告写出来,然后倒推你该怎么活。这主意一点也不疯狂。不过,Charlie,我不知道,你对这些“统觉团”怎么看?这种东西其实就是(笑)类似视觉错觉。

CHARLIE MUNGER: Well, I know that that’s the way the brain works. And that it’s easy to get it wrong. And part of the trick is to get so you correct your own mistakes. And we’ve done a lot of that.
CHARLIE MUNGER:嗯,我知道大脑就是这么工作的,而且非常容易把事情看错。诀窍的一部分,就是要学会自己纠正自己的错误,而我们在这方面确实做了很多。

WARREN BUFFETT: Yeah.
WARREN BUFFETT:是的。

CHARLIE MUNGER: Frequently way too late.
CHARLIE MUNGER:而且往往是晚得离谱。

WARREN BUFFETT: Yeah. We’ve done better with the mistakes than we have with the good — the reasonably good ideas.
WARREN BUFFETT:是的。我们从错误中得到的收获,要比从那些不错——或者说还算不错——的点子中得到的收获多得多。
I remember when I was in my early twenties, I read a book by Tom Peters. He was a big management guru in the eighties. In that book, he was giving the example of two gas stations in California that were diagonal on a busy intersection from each other. Both the gas stations were self-service stations. You come in, you pump your gas, and you leave. The owner would come out maybe once an hour, pick a random car, wash the windshield, or check the oil; just some extra service at no charge. The guy who was diagonal across the street was seeing this take place. He said to himself, "Well, that is kind of stupid. You cannot do it for everyone. If you did it for everyone, you would lose your shirt because you are not charging for it." He never copied or cloned that.
我记得在我二十出头的时候,我读过汤姆-彼得斯(Tom Peters)写的一本书。他是八十年代的管理大师。在那本书里,他举了一个例子:加利福尼亚州有两个加油站,分别位于一个繁忙十字路口的对角线上。这两个加油站都是自助加油站。你进来,加油,然后离开。店主可能一小时出来一次,随便挑一辆车,洗一下挡风玻璃,或者检查一下机油;只是一些额外的服务,不收取任何费用。斜对面的那个人看到了这一幕。他对自己说:"嗯,这有点愚蠢。你不能为每个人都这样做。如果你为每个人都这样做,否则你会赔得血本无归,因为你没有收费。他从来没有复制或克隆过。

Over time, what happened is that the gas station that was providing this random extra service saw an increase in business, and the one diagonal from him saw a decrease. Even after seeing the decrease, the guy across the street did not change his behavior. Tom Peters said, and this is what I found very unbelievable, that you can go to your most direct competitors and you can sit down with them, and you can give them all your trade secrets, everything that you learned that has given you an advantage, and they will listen to you, but there will be no behavior change. When I read that, I said, "This is ridiculous. This cannot be the way the world works." I am in my early twenties, I haven't kind of experienced life, and I don't know kind of how things work, but I made a promise to myself that I was going to prove Tom Peters wrong, and I was going to prove him wrong two ways. One, I was going to look for instances where humans see something smart happening and copy or clone it, because that would prove him wrong, and the second is, whenever I see someone doing something smart, I am going to copy it because that also proves him wrong. From my early twenties till now, this year, I am going to be 60, what I found, because I became a student of this, is that Tom Peters was mostly right. I still do not know why this is the case, but humans have an aversion to cloning. They somehow consider it beneath themselves that they did not come up with the idea. What I also found is that when I forced myself to copy things that I found to be smart, it gave me a big edge. This was an example of a simple idea. What I found is that there was a very small sliver of humans who were master cloners, and these humans owned the world. They did very well.
随着时间的推移,提供这种随机额外服务的加油站的生意越来越好,而他斜对面的加油站的生意却越来越差。即使看到生意减少,街对面的那个人也没有改变他的行为。汤姆-彼得说,这是我觉得非常不可思议的地方,你可以去找你最直接的竞争对手,你可以和他们坐下来,把你所有的商业秘密说给他们,你所学到的一切能给你带来优势的东西,他们会听你的,但行为不会改变。当我读到这句话时,我说:"这太荒谬了。世界不可能是这样运转的"。我才二十出头,还没有经历过生活,也不知道事情是怎么运作的,但我向自己承诺,我要证明汤姆-彼得斯是错的,我要从两个方面证明他是错的。第一,我要寻找人类看到聪明的事情发生的实例,并复制或克隆它,因为这将证明他是错的;第二,每当我看到有人做聪明的事情时,我就要复制它,因为这也证明他是错的。从我二十出头到现在,今年我就要六十岁了,我发现,因为我成为了这方面的学生,汤姆-彼得斯大部分时候都是对的。我仍然不知道为什么会这样,但人类对克隆有一种反感。他们莫名其妙地认为,不是自己想出了这个主意,就有失身份。我还发现,当我强迫自己复制那些我认为聪明的东西时,会给我带来很大的优势。这是一个简单想法的例子。我发现,有极少数人类是克隆高手,这些人类拥有整个世界。他们做得非常好。
Warning
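Most of the reason people do not copy is that they cannot tell right from wrong. Whether something is obvious depends on who is looking.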
For example, almost everything at Microsoft is cloned. Microsoft spends billions of dollars on its research labs. Nothing has ever come out of that. What has worked for them is looking at Lotus and creating Excel, looking at WordPerfect and creating Word, looking at the Mac and creating Windows, and so on. Even now, with AI, it is a partnership with OpenAI. Google did the work. Microsoft did none of the work, and they are ahead. Sam Walton was another great cloner. In fact, James Sinegal, former CEO of Costco, had cloned the entire model from Sol Price, who he used to work for. Someone asked him, "What did you learn from Sol Price?" His response was, "It is the wrong question. Everything I know is from Sol Price. There is nothing I know that did not come from Sol Price." These were people who took a simple idea very, very seriously. It is not just enough to read about some idea and be impressed with it. When you see that something grabs you, you have to go all in and you have to fight the normal tendency of the status quo.
例如,微软几乎所有的东西都是克隆的,微软在研究实验室上花费了数十亿美元,但却没有任何成果。他们的成功经验是:研究 Lotus 并创造出 Excel,研究 WordPerfect 并创造出 Word,研究 Mac 并创造出 Windows,等等,即使是现在,OpenAl 也是与AI合作。谷歌(在AI方面)做了大量工作,微软没做什么工作,但他们却领先了。山姆-沃尔顿是另一位伟大的克隆者。事实上,好市多的前首席执行官詹姆斯-西内格尔(James Sinegal)就是从他曾经的同事索-普莱斯(Sol Price)那里克隆了整个模式。有人问他:"你从索尔-普赖斯那里学到了什么?"他的回答是:"这是问错了。我所知道的一切都是从 Sol Price 那里学来的。我所知道的一切都来自于Sol Price"。这些人非常、非常认真地对待一个简单的想法。仅仅读到一些想法并对其印象深刻是不够的。当你看到某个东西抓住了你的心,你就必须全力以赴,你必须与想要维持现状的倾向做斗争。
Warning
Buffett's verdict is more concise: “First come the innovators, then come the imitators, then come the idiots.” Those who can only imitate are idiots too.
Both Charlie and Warren, their success has come from the dogged pursuit of a few very simple ideas. For example, when they bought See's Candies in the 70s in California, it was a huge jump for them. They paid three times the book value for the company. They thought they were paying too much, and they didn't understand how good a business it was. The only thing Warren did every year was he left the management alone to run the business. However, on January 1st of each year, he changed all the prices significantly above the rate of inflation. For instance, if inflation was 3%, he would raise the price 10%, and the next year it was 3 or 4%, he would raise another 10%. What surprised him was he kept pounding in these very heavy price increases and unit volumes kept going up. It stunned him that you could have a business with this much pricing power. Both Warren and Charlie did not understand brands and did not understand the power of brands, but they became very ardent students of "What was this phenomenon? What did this mean? How can we apply this in other businesses?" Today, we see that it was fundamental to Berkshire, because it was, again, looking at a relatively simple idea, but trying to get your arms around it. Many of us start businesses because we see an offering gap. We see some product or service that should exist in the world but doesn't, or maybe there is not enough of it, so we go into it. Once we take that plunge, having this notion of the dogged pursuit of simple ideas will lead to a lot of good things.
无论是查理还是沃伦,他们的成功都源于对一些简单想法的执着追求。例如,70 年代他们在加利福尼亚收购 See's 糖果公司时,对他们来说是一次巨大的跳跃。他们为这家公司支付了三倍于账面价值的价格。他们认为自己花的钱太多了,而且他们并不了解这家公司的业务有多好。沃伦每年做的唯一一件事,就是让管理层独自经营公司。不过,在每年1月1日,他会上调所有价格,提价幅度远高于通胀增速。例如,如果通货膨胀率是3%,他就会把价格提高10%,第二年是3%或4% ,他就会再提高10%。让他惊讶的是,他不断地大幅提价,单位销量却不断上升。这让他惊呆了,原来企业可以有这么大的定价权。沃伦和查理都不了解品牌,也不了解品牌的力量,但他们都非常热衷于研究 "这是什么现象?这意味着什么?我们如何将其应用到其他业务中?今天,我们看到,这对伯克希尔公司来说是至关重要的,因为这同样是在研究一个相对简单的想法,但却要努力去掌握它。我们中的许多人创办企业,是因为看到了市场空白。我们看到一些产品或服务应该存在于这个世界上,但却没有,或者可能还不够多,于是我们就投入其中。一旦我们下定决心,坚持不懈地追求简单的想法,就会有很多好的结果。
Warning
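Nonsense. The 1986 shareholder letter states it clearly: same-store volume had been declining for the previous six years, and total volume could be maintained only by adding stores. Under those conditions prices could be raised only cautiously. See: 《1987-02-27 Warren Buffett's Letters to Berkshire Shareholders》. To some degree these people are all kindred spirits of Duan Yongping (段永平): they get close to Buffett and Munger in order to sell their own funds.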

7、《2024-05-04 Berkshire Hathaway Annual Meeting》

But there is something that comes along that takes a whole bunch of observations that you’ve made and knowledge you have and then crystallizes your thinking into action.
但是,有一种东西会把你的观察结果和你所掌握的知识汇集在一起,然后把你的想法具体化为行动。

Big action in the case of Apple. And there actually is something, which I don’t mean to be mysterious, but I really can’t talk about, but it was perfectly legal, I’m sure, you know, that. It just happened to be something that entered the picture that took all the other observations. And I guess my mind reached what they call apperceptive mass, which I really don’t know anything about, but I know the phenomenon when I experience it. And that is, we saw something that I felt was, well, enormously enterprise.
在苹果案例中有大动作。实际上有些事,我并不是故意神秘,但我真的不能谈论,但我确定那是完全合理的,你知道的。这只是恰好出现在画面中的某件事,它占据了所有其他的注意力。我想我的思绪达到了他们所说的apperceptive mass,我对此并不了解,但当我经历它时,我知道这种现象。那就是,我们看到了我觉得,嗯,非常巨大的企业。

。。。
Charlie and me, and there is an aspect of knowing a whole lot and having a whole lot of experiences and then seeing something that turns on the light bulb.
查理和我想出了一个生意,其中有一个方面是,我们知道了很多事情,有了很多经历,然后看到了一些事情,点亮了灯泡。

And that will continue to happen. And I hope it happens a few times to you, but you can’t make it happen tomorrow, but you can prepare yourself for it happening tomorrow, and it will happen sometimes.
这种情况还会继续发生,我希望你能遇到几次这样的事,但你无法让它明天就发生,但你可以为明天发生的事做好准备,它有时会发生。
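03、How “grokking” happens

课代表立正: This also echoes why I wanted to talk with you. One focus of your research is Grokking: explaining how a model jumps from “memorization-style fitting” to “structured generalization”. Your paper is built around this mechanism.

Yuandong Tian: Right. Grokking offers a dynamical path (dynamics) for observing the move “from incompressible to compressible representations”. Understanding this path helps us, when data and compute are limited, to obtain generalizable representations and stronger models with fewer samples and more reliable training signals.

课代表立正: The “grokking” you just mentioned is not merely a capability at the level of a specific task but a deeper mechanism: at some point in time, the model completes a reorganization of its representation, as if it had “learned” something.

I have followed your earlier interview, and similar phenomena also came up in my discussion with Denny Zhou on X about chain-of-thought. In theory, if the chain of logic can be fully expressed, then chain-of-thought should be able to solve the problem;

but in practice the model often needs a large amount of data to approach the solution, whereas a human can grasp the key point in an instant. This gap seems related to the underlying mechanism just mentioned. If you had to define this ability, would you call it reasoning, or something else?

Yuandong Tian: More precisely, it happens in the “common substrate” beneath reasoning and other tasks, namely representation learning.

As training proceeds, the model's representation keeps evolving. At first it looks more like rote memorization; but with enough accumulation and connection, the structure suddenly “clicks”, producing a turning point like the saying “read a book a hundred times and its meaning reveals itself”. For example, in primary school teachers may first ask pupils to recite some material; after a while, connections with new knowledge gradually bring out the meaning that was originally vague. This is part of grokking.

课代表立正: In other words, whether it is chain-of-thought or intuitive judgment, both ultimately depend on the underlying mechanism of “how I represent and understand the world”?

Yuandong Tian: Right. For example, a primary school pupil may solve problems by enumeration; in middle and high school, mathematical induction is introduced, and a concise proof alone can cover infinitely many cases. The “representation” behind the method has changed fundamentally. The key difference in how neural networks learn also shows up in the form of representation.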
Idea
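“Grokking” describes the phenomenon in which a neural network's performance, after a long plateau during training (when it seems only to memorize), suddenly leaps to perfect generalization (having truly grasped the rule). This is strikingly similar to the human experience of “read a book a hundred times and its meaning reveals itself”, or to Zhang Wuji in the wuxia novels, who first memorizes the manual and only later masters it.

So how does this mysterious “sudden change” happen? Dr. Tian uses a vivid “two-peak model” to reveal the mathematical picture behind it:
  1. Memorization and generalization as different “solutions”: in a complex optimization landscape, “memorization” and “generalization” can be seen as two different solutions, corresponding to two different “peaks”. Memorization is an inefficient solution that requires the model to remember every special case; generalization is an efficient, elegant solution in which the model has found a simpler unified rule behind the data (a short program).
  2. Data-driven change in the peaks: when training data is scarce, the “memorization peak” is higher, because remembering every sample is the most direct way to reduce training error. The optimization process then naturally converges to this peak.
  3. The critical point where one rises and the other falls: as the amount of data grows, the latent “generalization rule” in the data begins to show. The “generalization peak” gradually rises while the “memorization peak” falls in relative terms. When the amount of data crosses a critical point, the “generalization peak” becomes higher than the “memorization peak” for the first time.
  4. Grokking occurs: because the optimization algorithm tends to seek the global optimum (the higher peak), at the moment the “generalization peak” becomes the highest point, the model's parameters rush toward this new, better solution like an avalanche. At the macro level this appears as a sudden leap in performance, a “grokking” event.
This explanation greatly demystifies “emergence” and “grokking”, reducing what looks like random magic to a physical process with a clear path, determined jointly by the data distribution and the optimization dynamics. The ability to generalize does not come from nowhere; it has been present in the data all along as a possibility, waiting for enough evidence to make it stand out. The metaphor is deep in three ways:
  1. Determinism: it tells us that “grokking” is not a random miracle but a phase transition that is almost bound to happen once the amount of data reaches a critical point.
  2. Competition: “memorization” and “generalization” are two competing solutions, and during training the model dynamically picks whichever is more “cost-effective” under the current data.
  3. Actionability: it suggests that the key to promoting “grokking” is how to design data and training methods that “raise” the generalization peak and “lower” the memorization peak faster.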
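
The two-peak picture can be made concrete with a toy description-length calculation. This is a minimal sketch, not Tian's actual derivation. It assumes that memorizing costs a fixed number of bits per example while the general rule (the “short program”) has a one-time cost, and it treats the cheaper option as the higher peak. The constants `BITS_PER_EXAMPLE` and `RULE_BITS` and all function names are hypothetical.

```python
# Toy model of the "two peaks" as competing description lengths.
# All numbers are made up for illustration.
BITS_PER_EXAMPLE = 8.0   # assumed cost of memorizing one training example
RULE_BITS = 200.0        # assumed one-time cost of the general rule ("short program")


def memorization_cost(n_examples: int) -> float:
    """Cost of storing every example individually; grows with the data."""
    return n_examples * BITS_PER_EXAMPLE


def generalization_cost(n_examples: int) -> float:
    """Cost of the rule; independent of how many examples it explains."""
    return RULE_BITS


def preferred_solution(n_examples: int) -> str:
    """Return whichever solution is cheaper (the 'higher peak') at this data size."""
    if generalization_cost(n_examples) < memorization_cost(n_examples):
        return "generalize"
    return "memorize"


def critical_data_size() -> int:
    """Smallest number of examples at which generalization becomes cheaper."""
    n = 1
    while preferred_solution(n) == "memorize":
        n += 1
    return n


if __name__ == "__main__":
    for n in (5, 25, 26, 100):
        print(n, preferred_solution(n))
    print("critical data size:", critical_data_size())
```

With these made-up numbers the preference flips once the data passes 25 examples. Changing either constant moves the critical point but not the qualitative behaviour: below the threshold memorizing is cheaper, above it the rule wins.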
*****
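08、The loss function is only a “proxy signal”, not the goal

课代表立正: You once said that the loss function we define is not the objective we really want to optimize but a “surrogate objective” for it. How should we understand this?

Yuandong Tian: The core role of a loss function is to generate a suitable gradient flow that pushes the representation to update in the “right direction”. Different loss functions can induce similar gradient structures and therefore learn similar representations.

The objective function itself is not the “ultimate purpose”; it provides a computable proxy signal for a learnable optimization path. Many objective functions in representation learning, once unpacked, are essentially different forms of backpropagation gradients. As long as the gradient structure is similar, the learned representations will be close even with a different loss function.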
Idea
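Some topics can no longer be discussed, so many big influencers have reinvented themselves as relationship bloggers. All roads lead to Rome.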
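课代表立正: One can picture the “gradient” as the direction of steepest descent on a contour map, and what those contour lines ultimately trace out is a description of the regularities of the world.

Yuandong Tian: That analogy is very apt. We move along the contours, looking for structures that explain more phenomena in a unified way and more simply; when evidence and inductive bias reinforce each other to a sufficient degree, the model “crosses the peak” into a generalizable representation. On the surface it looks like “grokking”; in reality it is the natural result of optimization dynamics.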
Idea
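The loss function we usually talk about is not itself the ultimate goal of learning; it is more like a “surrogate”. This statement goes back to the root of how deep learning works and breaks the naive belief that “learning = minimizing a loss function”. The core idea: what we pursue is not the minimization of a number but the formation of a high-quality internal representation. The loss function and the optimizer are just the chisel and hammer we use to carve that representation. The shift in perspective matters: it encourages researchers to let go of their attachment to a particular form of loss and to start instead from “what data structure do we want the representation to learn”, designing training signals that produce the desired gradient flow. This is a move from the level of technique (术) to the level of principle (道), and it is the key to understanding representation learning.

He explains that the real role of the loss function is to “produce a gradient flow that moves the representation in the right direction”. In other words, the goal is to learn a good representation of the data, and the loss function is only a tool for creating the driving force needed to get there. As long as the resulting gradient flows are similar, loss functions that look completely different in form may still learn similarly good representations.

This view lifts our understanding of model training from merely “lowering a number (the loss)” to “shaping a structure (the representation)”. It also explains why the AI field is full of loss designs that look odd yet work: what matters is that they supply a “force” in the right direction for parameter optimization, not the form of the function itself. Behind this lies an implicit preference for “beauty” or “elegance”: during training, neural networks inherently favor explanations that are simpler and more compressible.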
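
The claim that differently shaped losses can push parameters in nearly the same direction can be checked on a tiny example. The sketch below is an illustration under stated assumptions, not the analysis from the interview: it uses a hypothetical three-example linear regression, compares the gradient of the squared-error loss with the gradient of the log-cosh loss at the same weights, and measures how well the two directions agree by cosine similarity. The data and all function names are made up.

```python
import math

# Hypothetical toy data: 3 examples, 2 features, linear model pred = w . x
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = [0.0, 0.0]  # starting weights


def residuals(weights):
    """Prediction minus target for every example."""
    return [sum(wj * xj for wj, xj in zip(weights, x)) - t for x, t in zip(X, y)]


def gradient(weights, dloss_dresidual):
    """Gradient of the mean loss w.r.t. the weights, given d(loss)/d(residual)."""
    r = residuals(weights)
    n = len(X)
    return [
        sum(dloss_dresidual(ri) * x[j] for ri, x in zip(r, X)) / n
        for j in range(len(weights))
    ]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


# Squared error: loss = r^2 / 2, so d(loss)/dr = r
g_mse = gradient(w, lambda r: r)
# Log-cosh: loss = log(cosh(r)), so d(loss)/dr = tanh(r)
g_logcosh = gradient(w, math.tanh)

similarity = cosine(g_mse, g_logcosh)

if __name__ == "__main__":
    print("MSE gradient:     ", g_mse)
    print("log-cosh gradient:", g_logcosh)
    print("cosine similarity:", similarity)
```

On this toy data the two gradients differ in length but point in almost the same direction, which is the sense in which the two losses supply a similar “force”. This is one hand-picked case, not a general guarantee.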
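课代表立正: Back to the relationship between “memorization and generalization”. Does giving the model more “material to memorize” raise the likelihood of generalization?

Yuandong Tian: In many tasks, yes. The more combinations the model sees, the more robust the representation it can learn, and that representation can also predict combinations it has never seen. That is generalization. Real “understanding” often shows up as better methodology: in a new situation, using a small amount of simple logic to explain more phenomena in a unified way, and extending to more scenarios.

课代表立正: If there is very little data and the model cannot learn a good representation, what happens?

Yuandong Tian: It tends toward memorization-style learning in order to meet the training-error objective; but once it goes beyond the training set, the error rate rises, and people usually attribute this to overfitting or to memorization dominating.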
Idea
The weakness of labels: a label that has not been tested against a large amount of data is unreliable.
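
The contrast in the exchange above, where memorization fits the training set but fails outside it, can be sketched on modular addition, a task often used in grokking experiments. This is a hypothetical illustration: the “memorizer” is a plain lookup table with a fallback guess, and the “rule” simply stands in for a model that has found the underlying structure. Neither is a trained network, and the modulus and split are arbitrary choices.

```python
from collections import Counter

P = 7  # modulus of the toy task: predict (a + b) % P

pairs = [(a, b) for a in range(P) for b in range(P)]
train = pairs[0::2]  # every other pair is seen during "training"
test = pairs[1::2]   # the remaining pairs are held out


def label(a, b):
    """Ground-truth answer for the task."""
    return (a + b) % P


# Memorizer: stores every training answer, guesses the most common label otherwise.
table = {pair: label(*pair) for pair in train}
fallback = Counter(table.values()).most_common(1)[0][0]


def memorizer(a, b):
    return table.get((a, b), fallback)


def rule(a, b):
    """Stand-in for a model that has captured the structure of the task."""
    return (a + b) % P


def accuracy(model, data):
    """Fraction of pairs in `data` that the model answers correctly."""
    return sum(model(a, b) == label(a, b) for a, b in data) / len(data)


memo_train = accuracy(memorizer, train)
memo_test = accuracy(memorizer, test)
rule_test = accuracy(rule, test)

if __name__ == "__main__":
    print("memorizer, train:", memo_train)
    print("memorizer, test: ", memo_test)
    print("rule, test:      ", rule_test)
```

The lookup table is perfect on what it has seen and close to chance on what it has not, while the rule is unaffected by the split. That is the gap between fitting the training error and holding a representation that extends to unseen combinations.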
