I.C.S.105. Our Talent Management

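The human nervous system is layered, and every layer is a miracle of biological evolution. Our whole argument comes down to two points. One is a fact: "precise execution" is an independent nervous system of its own, and it is extremely reliable. The other is an "aha" moment: recognizing that you have the ability to execute precisely, and that this ability is very important and very valuable. Even if the entire "emotional self-discipline" nervous system is broken, that single "aha" moment is enough for a whole life to work well.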

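1. The miracle that runs silently

The autonomic nervous system is the lowest-level control system. It regulates body temperature, heartbeat, digestion, breathing and so on, and it is extremely reliable and precise.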

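2. The miracle of precise execution

A scene in Forrest Gump shows what precise execution is and why Gump was the one who achieved it. See: 《阿甘-智商160的天才》 (Forrest Gump: A Genius with an IQ of 160)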
Quote
Drill Sergeant: GUUUUUUMP! What's your sole purpose in this army?
教官: 阿甘,你到部队来干什么?

Forrest Gump: To do whatever you tell me, Drill Sergeant!
阿甘: 干你叫我干的事,教官!

Drill Sergeant: God damn it, Gump, you're a goddamn genius. That's the most outstanding answer I've ever heard. You must have a goddamn I.Q. of 160. You are goddamn gifted, Private Gump.
教官: 天杀的,阿甘,你他妈的真是个天才,这是我听过的最了不起的回答,你他妈的智商一定有160,你他妈的真有天赋,列兵阿甘。

Forrest Gump: Done, Drill Sergeant!
阿甘: 完成了,教官!

Drill Sergeant: GUUUUUUMP! Why did you put that weapon together so quickly, Gump?
教官: 阿甘,为什么你这么快就把枪装好了?

Forrest Gump: You told me to, Drill Sergeant.
阿甘: 你叫我这么做的,教官。

Drill Sergeant: Jesus H. Christ. This is a new company record! If it wasn't a waste of a damn-fine enlisted man I'd recommend you for OCS!  Private Gump, You are going to be a general someday! Gump, now disassemble your weapon and continue!
教官: 我的老天爷啊,这是新的连队纪录!要不是那样会白白浪费一个顶呱呱的士兵,我就推荐你去上军官预备学校(OCS)了。列兵阿甘,有朝一日你会当上将军!阿甘,现在把你的枪拆开,继续!
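There are plenty of similar scenes. A worker on an assembly line sees a screw hole and drives in the screw. Almost everyone has this ability, yet few believe in it; in particular, the phrase "do as you are told" triggers a kind of fear. A couple of days ago I read a comment about risk that described two kinds of reaction: one person runs as soon as the other side draws a knife, while another reacts only when the blade is at his throat. The former requires a stronger sense of security; the latter is "do as you are told", the worker on the assembly line.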

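Fear hides two facts. First, the nervous system for precise execution has nothing eye-catching about it, but it is extremely reliable, second only to the autonomic nervous system at the very bottom. Second, many people do not believe they have this ability.

Finally, the fear is real and the facts are real. If you believe the fear, the fear will not go away. If you believe the facts, is the fear still there? Gump is the example. Do you believe the fear, or the facts?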

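3. Circle of competence and emotional self-discipline

Buffett's "aha" moment and the AI notions of generalization, grokking, compression and representation learning all mean the same thing. In practice we use the word compression (Ilya Sutskever and Yuandong Tian (田渊栋) both endorse this wording). Tian has his own definition: "real understanding" shows up, on the one hand, as the ability to give correct answers in new situations and, on the other, as the ability to reduce a problem to simpler, broadly applicable logic.

Compression, or generalization, is a relatively high-level nervous system in the human brain and in AI alike, but it operates in the direction set by the reward function. Reward is the direction selector; compression is the structure finder. Reward sets the direction and compression produces the explanation of the real world, so a different direction yields a completely different explanation of the same world (fear is the decisive factor in which direction is taken).

The easiest thing to observe is the alignment of ideas. The AI approach to alignment asks: given one set of compressed principles, does it return the same answer for two sentences that have the same structure but different wording? Buffett often uses a hamburger quiz as the example: (1) If you will be eating hamburgers for the rest of your life and are not a cattle producer, do you want beef prices to be higher or lower? (2) If you will be a net buyer of stocks for many years to come, should you want the stock market to be higher or lower? Many people, perhaps most, give completely different answers to these two structurally identical questions. That is misalignment. Going further, once misalignment appears it is likely to be everywhere: not a single idea, not even a single sentence, is aligned (the first half does not match the second half).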
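The consistency check behind the hamburger quiz can be sketched in a few lines of Python. This is a minimal illustration, not anything taken from Buffett or from an AI alignment toolkit: the `Question` record, the function names and the two sample sets of answers are all hypothetical. The single compressed principle is "a net buyer is better off with lower prices", and the check asks whether a set of answers agrees with that one principle under both wordings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    """A question reduced to its structure (hypothetical record for this sketch)."""
    wording: str
    is_net_buyer: bool  # True if the person will keep buying the thing for years


def preferred_price_direction(q: Question) -> str:
    """The one compressed principle: net buyers want lower prices, net sellers higher."""
    return "lower" if q.is_net_buyer else "higher"


def is_aligned(answers: dict[Question, str]) -> bool:
    """Answers are aligned when every one matches the same principle."""
    return all(preferred_price_direction(q) == a for q, a in answers.items())


hamburger = Question("Lifetime hamburger eater, not a cattle producer", True)
stocks = Question("Net buyer of stocks for many years", True)

# Assumed, illustrative answers: the common intuition vs. the principled one.
intuitive = {hamburger: "lower", stocks: "higher"}
principled = {hamburger: "lower", stocks: "lower"}

print(is_aligned(intuitive))   # False: same structure, different answers
print(is_aligned(principled))  # True
```

The point of the design is that the wording never enters the decision; only the structural field `is_net_buyer` does, which is what "same structure, different wording" means here.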

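We merge Reward + Compression into a single "highly compressed" nervous system. The real world offers all kinds of rules of thumb, and the risk of misalignment is going beyond the boundary of one's own ability. One of the 15 principles Buffett wrote down concerns risk, and the phrase he uses is "have and need". See: 《2018-02-24 BERKSHIRE HATHAWAY INC.-AN OWNER'S MANUAL》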
Quote
The financial calculus that Charlie and I employ would never permit our trading a good night’s sleep for a shot at a few extra percentage points of return. I’ve never believed in risking what my family and friends have and need in order to pursue what they don’t have and don’t need.
查理和我采用的财务计算方法绝不会允许我们为了多几个百分点的回报而牺牲安稳的睡眠。我从不相信为了追求家人和朋友没有且不需要的东西而冒险他们已有且需要的东西。
Idea
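"have and need" can serve as a supplement when explaining "circle of competence" and "margin of safety".

1. have and need
Buffett wanted independence from a very young age. The "need" was financial independence. The "have" was what a boy of that age (13) could do: deliver newspapers for money. He got up a little after four every morning to deliver papers, later expanded to several routes, and by 14 or 15 was earning about 175 dollars a month from them (more than many teachers were paid at the time). This is almost exactly the situation of today's food-delivery riders.

2. don’t have and need
If precise execution is delivering newspapers and the highly compressed ability is buying stocks, then as long as both put food on the table, delivering newspapers and buying stocks are equivalent. Young Buffett and Gump were both "delivering newspapers". If you can do both, picking the one you like better is understandable. But going hungry, being unable to do the stock-buying job and not realizing you can still deliver newspapers means something has gone wrong in the head.

Of course, the reason it goes unseen is "Attention Is All": all the attention is on buying stocks.

3. have and don’t need
An iPhone sells for 2,000 yuan on Xianyu (闲鱼) and 10,000 yuan in an Apple Store, with no difference at all in use. The real need is therefore 2,000, and the other 8,000 is imagined need. Buffett once used an insurance policy as an example to make the same point.

4. don’t have and don’t need
Both "have" and "need" point to boundaries: the former is the boundary of the circle of competence, the latter the boundary of the margin of safety (when buying a stock, it is a serious problem if 8,000 of the price is imagined valuation). Buffett's point is that once either boundary is breached, you eventually end up in the "don’t have and don’t need" state.
A computer pushed beyond its capability boundary starts to "talk nonsense", which is actually easy to spot. Whatever many people talk about, once highly compressed it is just a few variations on the same two-step logic, for example: I don't understand it but want to give it a try; I don't want to understand it but want to give it a try; I know it is wrong but want to give it a try; and so on. It is like an example Buffett gave. See: 《2022-05-02 Berkshire Hathaway Annual Meeting》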
Quote
Well, that’s what Charlie’s — when he was in the law practice, he had a partner, Roy Tolles. And every smart guy that would get in trouble — usually it was guys, and usually it was with women — and, you know, they’d come into the office, and they’d look, you know, down-faced and everything. And they’d say, it seemed like a good idea at the time, you know. I mean —(Laughter)
这就像查理当律师的时候,他有个合伙人叫 Roy Tolles。每个惹上麻烦的聪明人——通常是男的,而且通常是因为女人——都会跑到律所来,垂头丧气地坐在那里,然后说一句:“当时看起来是个好主意。”(众笑)

And their lives unraveled, you know, in many cases.
结果很多人的人生就这么散架了。
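Many comments about the circle of competence do not tell the whole story. The key is structured thinking. It is very rare for someone to know that precise execution is the only ability they have and that anything beyond that level turns into "talking nonsense". If you really understand this, you see that almost everyone around you is doing things far above that level. Work that consists of precise execution is not inferior, does not lack opportunity, and can even lead to financial success. Not knowing this is the real foolishness.

Finally, does doing precise execution well lead to a different life from doing highly compressed work well? Hard to say. Fisher, Buffett and even earlier researchers all found that history has produced only two kinds of successful companies: those with low costs (precise execution) and those with a brand (highly compressed).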

4. Appendix

And when I’m not around, the logical, at some point — it depends on exactly when it happens, again. But Charlie’s a little older than I am. And it’s likely that it will be broken into a two-person function again, but not exactly the way Charlie and I function.
当我不在时,按照逻辑,在某个时点——这还要取决于事情发生的具体时间。由于 Charlie 比我年长一些,很可能会再次分成两个人的职能,但不完全是我和 Charlie 现在这种分工方式。

And that is that there will be someone in charge of investments and capital allocation. I mentioned Lou Simpson’s position, because he is younger than I am, in the annual report, and then someone in charge of operations. And we have that person in the organization now.
也就是说,会有人负责投资与资本配置——我在年报里提到过 Lou Simpson 的角色,因为他比我年轻——同时还会有人负责运营。而我们现在组织内已经有这样的人选。

Now, I don’t know what the situation will be when I die, because it could be in 20 minutes or it could be in 20 years. And when that — so, I can’t specifically name the individuals.
不过我不知道自己去世时的具体情况,因为那可能是 20 分钟后,也可能是 20 年后。既然如此——我就无法具体点名这些人。

We have the individuals now for both those functions. We’ll have the individuals for the same functions 20 years from now. I don’t know whether they’ll be the same people.
现在我们在这两个职能上都有人;20 年后我们在同样的职能上也会有人。我不知道是否还是同一批人。

But it’s quite a logical way to run the business. GEICO was run that way and still is run that way and has been for some years.
但这是一种相当合理的经营方式。GEICO 就是这样运作的,现在仍是如此,而且已经这样很多年了。

It’s always struck me as terribly illogical, the way property-casualty insurance companies are run, because they’ve been dominated by the underwriting side of the business. And here they have this important investment side, but it’s always been — virtually every company’s been subservient to the underwriting.
我一直觉得财产意外险公司的管理方式非常不合逻辑,因为它们长期由承保端主导。而明明还有极其重要的投资端,但几乎每家公司都让投资端从属于承保端。

And GEICO, very logically, set up a co-CEO arrangement some years back where — originally Bill Snyder before that — but Tony Nicely ran the underwriting end of the business and Lou Simpson ran the investment side.
而 GEICO 很合理地在若干年前设立了联席 CEO 架构——在那之前是 Bill Snyder——Tony Nicely 负责承保端业务,Lou Simpson 负责投资端。

And those are two very different functions. Same person, logically, doesn’t fit both functions in most cases. I mean, it’s a rarity when the same person happens to hit for both functions.
这两项是截然不同的职能。从逻辑上说,同一个人多数情况下并不适合同时胜任二者。同一人能在两端都打出好成绩是非常罕见的。

So GEICO worked very well that way. Still works that way. Lou runs investments. Tony runs underwriting.
因此 GEICO 以这种方式运作得非常好。现在还是这样。Lou 负责投资,Tony 负责承保。

And Berkshire — slightly different — it’s a variant on it. But, essentially, at Berkshire headquarters, you need someone overseeing and not meddling in them too much, but making sure you’ve got the right manager and you’re treating him fairly.
而 Berkshire——略有不同——是这种模式的一个变体。基本上,在 Berkshire 总部,你需要有人进行监督,但不过度干预;要确保你找对了经理人,并公平对待他。

You need someone on the operating side. You need someone on the investment/capital allocation side. We’ve got those people now. And we’ll have them, you know, whenever it happens, too.
你需要一个负责运营的人;你也需要一个负责投资/资本配置的人。我们现在就有这些人。不管事情何时发生,我们届时也会有这些人。

That’s the — that is the structure. And we’ve got some very good businesses.
这——这就是结构。而且我们拥有一些非常优秀的业务。

And, you know, nobody’s buying See’s Candy because they think I’m sitting in some office in Omaha. And no one’s buying a GEICO insurance policy because, you know, my name is there as chairman or CEO. The businesses are marvelous businesses. They’ll continue very well.
而且你知道,没有人是因为以为我坐在奥马哈某个办公室里才去买 See’s Candy。也没有人是因为我的名字挂着董事长或 CEO 才去买 GEICO 的保单。这些业务本身就很出色,它们会继续发展得很好。

And there will be a capital allocation problem then just like there is now. And there will be the problem of keeping good managers in place and treating them fairly. And that’s a solvable problem.
届时仍会像现在一样面临资本配置问题,也会面临如何留住优秀经理人并公平对待他们的问题。而这些都是可解的问题。

So, that’s the future as seen from Kiewit Plaza.
所以,这就是从 Kiewit Plaza 望去所看到的未来。
And that we lay out that exception relating to businesses where we think there’s a permanent loss of cash for as far as the eye can see, or businesses where we have labor troubles, which we — I described earlier in the day, we might’ve had at The Buffalo News at that one period.
当然也有例外,比如那些我们认为在可预见的未来会产生永久性现金损失的企业,或者那些存在劳资问题的企业——就像今天早些时候我提到的《水牛城新闻》在某段时期的问题。

But otherwise, simply because we can use the money better someplace else, we’re not interested in it.
但除此之外,仅仅因为我们可以在其他地方更好地使用这笔钱,我们对此并不感兴趣。

You know, I can’t really dig into my psyche and tell you how much of that is because I think that will help us buy businesses in the future if we behave that way, or how much is just my natural inclination that when I make a deal with somebody and I’m happy with how they behave with me, that I want to stick with them. It’s probably both, you know?
我无法深入内心告诉你,这有多少是因为我认为如果我们以这种方式行事,会帮助我们未来购买企业;又有多少只是因为我的自然倾向——当我与某人达成交易并对他们的行为感到满意时,我会愿意坚持与他们合作。这可能两者都有,你知道吗?

And I wouldn’t want to try and weight the two. I’m happy, you know, with the results of the first and I’m happy with the way I feel, essentially, about the second.
我不想尝试去权衡这两者。你知道,我对第一种结果感到满意,也对第二种带给我的感觉本质上感到满意。

I just think it’s crazy — I know if I owned all of Berkshire myself, I wouldn’t dream of trading around businesses with people that have trusted in me and that I like and have been more than fair with me.
我只是觉得那样做很疯狂——我知道,如果整个伯克希尔都归我一个人所有,我绝不会想着把企业买来卖去,何况对方是那些信任我、我喜欢、并且待我十分公道的人。

I wouldn’t dream of trading around businesses so that my estate was 105 percent of some very large number instead of 100 percent of some large number. I just would regard that as a crazy way to live.
我不会为了让我的遗产变成某个非常大的数字的105%,而不是某个大数字的100%,就把企业买来卖去。我只会认为那是一种疯狂的活法。

And I don’t want the fact I run a public company to cause me to behave in a way that I would be uncomfortable behaving if we were a private company.
我也不希望因为我经营一家上市公司而导致我以一种让我在经营私人公司时会感到不舒服的方式行事。

But I also feel that you, as shareholders, are entitled to know that that’s an idiosyncrasy of mine. And therefore, I lay it out, and have laid it out for 20 years, as something that you should understand, as an investor or before you become an investor.
但我也认为,作为股东的你们,有权知道这是我的一个特殊癖好。因此,我将其明确说明,并且已经说明了20年,作为投资者或者在成为投资者之前,你们应该了解这一点。

I’m sure it helps us in acquisitions over time. But whether that in any way compensates for the opportunity costs that Charlie talks about of making an occasional advantageous disposal, I don’t know and it’s something I’ll never calculate.
我确信,这在长期来看有助于我们的收购。但是否在某种程度上弥补了查理提到的因偶尔错过有利的处置机会而产生的机会成本,我不知道,这也是我永远不会去计算的事情。
CHARLIE MUNGER: Well, the main thing is that practically nobody else does it. And yet to me it’s obvious it’s the way to go.
查理·芒格:嗯,主要是几乎没有其他人这样做。然而对我来说,很明显这是正确的做法。

There’s a lot in Berkshire that is like that. It’s just a little different from the way other people do it, partly the luxury of having a controlling shareholder of strong opinions.
在伯克希尔有很多这样的情况。它只是与其他人做事的方式有些不同,部分原因是拥有一个有强烈意见的控股股东的奢侈。

That accounts for this. It would be hard for a committee, including a lot of employees, to come up with these decisions.
这就是这一切的原因。对于一个委员会,包括许多员工来说,很难做出这样的决定。
We tend to let our many subsidiaries operate on their own, without our supervising and monitoring them to any degree. That means we are sometimes late in spotting management problems and that both operating and capital decisions are occasionally made with which Charlie and I would have disagreed had we been consulted. Most of our managers, however, use the independence we grant them magnificently, rewarding our confidence by maintaining an owner-oriented attitude that is invaluable and too seldom found in huge organizations. We would rather suffer the visible costs of a few bad decisions than incur the many invisible costs that come from decisions made too slowly — or not at all — because of a stifling bureaucracy.
我们往往让众多子公司自行运作,几乎不进行任何程度的监督与监控。这意味着我们有时较晚才发现管理层问题;也意味着在经营与资本决策上,偶尔会出现一些“如果当时征求了我们意见,Charlie 和我会不同意”的决定。不过,我们的大多数经理人都把这种独立性发挥得极好:他们以一种以股东为导向的心态来回报我们的信任——这种心态极其宝贵,却在庞大组织中极其罕见。我们宁愿承受少数错误决策带来的“看得见的成本”,也不愿承担因官僚体系扼杀活力而导致的无数“看不见的成本”——决策变慢,甚至干脆不决策。
Steve Forbes: Now, in terms of that, many times, a foundation gets set up. And...
Steve Forbes:不过在实际操作中,很多时候,基金会设立之后,会……

Warren Buffett: Goes off in a different direction.
Warren Buffett:跑偏,朝着不同的方向去了。

Steve Forbes: And there's this thing called Parkinson's Law, that an organization becomes self-centered, in for itself, and forgets its purpose that it was created for.
Steve Forbes:还有所谓的 Parkinson's Law:组织会变得以自我为中心,为自身而存在,忘了最初的使命。

Warren Buffett: I see it all the time.
Warren Buffett:我经常见到这种情况。

Steve Forbes: You see it in business--that's why they go broke oftentimes. But in foundations, you see it. So you made a provision that that was not going to happen with your funds, and they had to be... those monies had to be deployed what, within 10 years?
Steve Forbes:商业里也是如此——这往往就是他们破产的原因之一。基金会也不例外。所以你设了条款,避免你的资金出现这种情况,要求这些钱必须在——多久之内花完?十年?

Warren Buffett: 10 years after my estate's completed. Yeah, and the money has to all be spent--it can't go to institutions which in turn put it in their endowment or anything like that. I want people that I know, and I know are in sync with me, and I know will be true to certain ideals. I want them to dispense it because who the hell knows, 50 years from now, when the place becomes some large institution, what will happen. People will rationalize then that what's good for the institution is exactly what old Warren thought 40 years ago on his deathbed. So I've seen that happen too often. And I... foundations are not tested by a market system. If you've got a business idea, and he's got music, it's being tested by a market system. People will make a decision whether that next album is good, and they'll make a decision whether Coca Cola still keeps them happy, and all that sort of thing. A foundation has no market tests. So it's very easy, if there's not a market test, as people will find out in government and other places, it's very easy to start rationalizing things that are a long way from what you originally--people thought you were setting out to do.
Warren Buffett:在我的遗产清算完成后的10年内。对,而且这些钱必须全部花掉——不能捐给那些转手把钱放进自己捐赠基金(endowment)的机构。我希望把钱交给我认识、与我理念一致、忠于特定理想的人去使用。因为谁也不知道50年后,当某个机构变成庞然大物时会变成什么样。到那时人们会自圆其说,把“对机构有利”的事,解释成“老 Warren 在四十年前临终时的本意”。这种事我见得太多了。再者……基金会没有“市场检验”。你有商业点子,他有音乐作品——都会被市场检验。人们会用脚投票:下一张专辑好不好、Coca Cola 是否还能令他们满足,等等。而基金会没有市场检验。没有市场检验,就很容易——政府和其他领域的人也会发现——很容易开始把事情合理化,离最初的使命越来越远。
He is a very good listener who gives excellent advice, and he’s also pretty firm about not giving unasked advice. The managers vary in their desire (for asking for advice). The ones that do ask use words like “invaluable” to describe his advice.
他是一个非常好的倾听者,给出优秀的建议,而且他在不主动给建议方面也相当坚定。经理们在寻求建议的意愿上各不相同。那些确实寻求建议的人用“无价”这样的词来形容他的建议。
The lesson for investors: The weeds wither away in significance as the flowers bloom. Over time, it takes just a few winners to work wonders. And, yes, it helps to start early and live into your 90s as well.
给投资者的启示:当鲜花绽放,杂草便失去存在感。随着时间推移,只需少数赢家便能创造奇迹。当然,早点开始并活到九十多岁也大有裨益。
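03. How "grokking" happens

课代表立正: This echoes why I wanted to talk with you in the first place. One focus of your research is grokking: explaining how a model jumps from "memorization-style fitting" to "structured generalization". Your paper is built around that mechanism.

Yuandong Tian: Yes. Grokking offers a dynamics path for observing the move "from incompressible to compressible representations". Understanding that path helps us, when data and compute are limited, obtain generalizable representations and stronger models from fewer samples and more reliable training signals.

课代表立正: The "grokking" you just mentioned is not merely a capability at the level of some specific task but a deeper mechanism: at some point in time the model reorganizes its representation, as if it had "learned" something.

I followed your earlier interview, and similar phenomena came up in my discussion with Denny Zhou on X about chain-of-thought. In theory, if the chain of logic can be fully expressed, chain-of-thought should be able to solve the problem;

but in practice models often need large amounts of data to approach a solution, while humans can grasp the point in an instant. That gap seems related to the underlying mechanism we just discussed. If you had to name this ability, would you call it reasoning, or something else?

Yuandong Tian: More precisely, it happens in the "shared substrate" beneath reasoning and other tasks, and that is representation learning.

As training progresses, the model's representation keeps evolving. At first it looks more like rote memorization; with enough accumulation and connection, the structure suddenly "clicks", producing a turning point like the saying "read a book a hundred times and its meaning reveals itself". In primary school, for example, a teacher may first have pupils memorize some facts; some time later, through connections with new knowledge, the previously vague meaning gradually emerges. That is part of grokking.

课代表立正: So whether it is chain-of-thought or intuitive judgment, both ultimately depend on the underlying mechanism of "how I represent and understand the world"?

Yuandong Tian: Yes. A primary school pupil might solve a problem by enumeration; in middle and high school, mathematical induction is introduced, and a concise proof alone covers infinitely many cases. The "representation" behind the method has changed fundamentally. The key difference in how neural networks learn likewise lies in the form of representation.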
Idea
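"Grokking" describes the phenomenon in which a neural network's performance, after a long plateau during training (seemingly only memorizing), suddenly leaps to perfect generalization (truly understanding the rule). This is strikingly similar to the human experience of "read a book a hundred times and its meaning reveals itself", or to Zhang Wuji in the martial-arts novels first memorizing the manual and only later mastering it.

So how does this mysterious "sudden change" actually happen? Dr. Tian uses a vivid "two-peak model" to reveal the mathematical picture behind it:
  1. Memorization and generalization as different "solutions": in a complex optimization landscape, "memorization" and "generalization" can be seen as two different solutions corresponding to two different "peaks". Memorization is an inefficient solution that requires the model to remember every special case; generalization is an efficient, elegant solution in which the model has found a simpler unified rule behind the data (a short program).
  2. Data-driven evolution of the peaks: when training data is scarce, the "memorization peak" is higher, because remembering every sample is the most direct way to reduce training error. The optimization process then naturally converges to that peak.
  3. The tipping point: as the amount of data grows, the latent "generalization rule" in the data begins to show. The "generalization peak" rises while the "memorization peak" falls in relative terms. Once the amount of data crosses a critical point, the generalization peak is higher than the memorization peak for the first time.
  4. Grokking occurs: because, in this picture, the optimizer is drawn toward the better solution (the higher peak), the moment the generalization peak becomes the highest point the model's parameters move in an avalanche toward this new solution. Macroscopically this appears as a sudden leap in performance: "grokking".
This explanation does much to demystify "emergence" and "grokking", reducing what looks like random magic to a physical process with a clear path, jointly determined by the data distribution and the optimization dynamics. The ability to generalize does not come from nowhere; it has been present in the data all along as a possibility, waiting for enough evidence to make it stand out. What makes the metaphor profound is:
  1. Determinism: it tells us that grokking is not a random miracle but a phase transition that is almost bound to happen once the amount of data reaches a critical point.
  2. Competition: memorization and generalization are two competing solutions, and during training the model dynamically picks whichever gives better "value for money" on the current data.
  3. Actionability: it suggests that the key to promoting grokking is designing data and training methods that "raise" the generalization peak and "lower" the memorization peak faster.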
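The two-peak picture can be made concrete with a toy description-length comparison. This is only a sketch under stated assumptions, not Dr. Tian's actual mathematics: it assumes that memorizing costs a fixed number of bits per training example (`BITS_PER_EXAMPLE`), that the general rule is a "short program" with a fixed cost (`RULE_BITS`), and that the cheaper description plays the role of the higher peak. All names and numbers are hypothetical.

```python
BITS_PER_EXAMPLE = 16  # assumed cost of storing one memorized example
RULE_BITS = 400        # assumed cost of the "short program" that captures the rule


def memorization_cost(n_examples: int) -> int:
    """Memorizing grows linearly: every new example must be stored."""
    return n_examples * BITS_PER_EXAMPLE


def generalization_cost(n_examples: int) -> int:
    """The rule costs the same no matter how many examples it explains."""
    return RULE_BITS


def preferred_solution(n_examples: int) -> str:
    """Pick the cheaper description, i.e. the higher peak in this toy model."""
    if memorization_cost(n_examples) <= generalization_cost(n_examples):
        return "memorize"
    return "generalize"


def critical_point() -> int:
    """Smallest data size at which the rule becomes strictly cheaper."""
    n = 1
    while preferred_solution(n) == "memorize":
        n += 1
    return n


for n in (5, 25, 26, 100):
    print(n, preferred_solution(n))
print("critical point:", critical_point())
```

With these assumed numbers the preferred solution switches from "memorize" to "generalize" at 26 examples, since 26 × 16 = 416 bits is the first memorization cost to exceed the 400-bit rule. Changing either constant moves the critical point, which is the toy counterpart of "raising" one peak or "lowering" the other.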
*****
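08. The loss function is only a "surrogate signal", not the goal

课代表立正: You have said that the loss function we define is not the objective we really want to optimize but a "surrogate objective" for it. How should we understand that?

Yuandong Tian: The core role of the loss function is to generate a suitable gradient flow that pushes the representation to update in the "right direction". Different loss functions can induce similar gradient structures and therefore learn similar representations.

The objective function itself is not the "ultimate goal"; it supplies a computable surrogate signal for a learnable optimization path. Many objective functions in representation learning, once taken apart, are essentially different forms of backpropagated gradients. As long as the gradient structure is similar, the learned representation will be close even if the loss function is swapped for another.
Idea
Some topics can no longer be discussed, so many big-name influencers have reinvented themselves as relationship bloggers. All roads lead to Rome.
课代表立正: You can picture the "gradient" as the steepest downhill direction on a contour map, and what those contour lines ultimately trace out is a description of the world's regularities.

Yuandong Tian: That is a very apt metaphor. We travel along the contours, looking for structures that explain more phenomena in a unified and simpler way; when evidence and inductive bias reinforce each other enough, the model "crosses the peak" into a generalizable representation. On the surface it looks like grokking, but it is really the natural outcome of the optimization dynamics.
Idea
The loss function, as we usually call it, is not itself the ultimate goal of learning; it is more like a "surrogate". This statement goes back to the root of deep learning's core mechanism and breaks the naive belief, held by many, that "learning = minimizing a loss function". The core idea is that what we are after is not the minimization of a number but the formation of a high-quality internal representation. The loss function and the optimizer are merely the chisel and hammer we use to carve that representation. The shift in perspective matters: it encourages researchers to stop fixating on a particular form of loss and instead start from "what data structure do we want the representation to learn" when designing training signals that produce the desired gradient flow. It is a move from the level of technique to the level of principle, and it is key to understanding representation learning.

He explains that the real job of the loss function is to "produce a gradient flow that moves the representation in the right direction". In other words, the goal is to learn a good representation of the data, and the loss function is only a tool for creating the driving force needed to get there. As long as the resulting gradient flows are similar, loss functions that look completely different in form may still learn similar high-quality representations.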
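One standard textbook case makes the point about gradients concrete. Squared error on a linear output and binary cross-entropy on a sigmoid output look quite different as formulas, yet the gradient of each with respect to the pre-activation `z` has the same "prediction minus target" form. The sketch below checks this numerically. It is an illustration chosen for this note, not an example taken from the interview, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(z: float, y: float) -> float:
    """Loss A: 0.5 * (z - y)^2, where the prediction is z itself."""
    return 0.5 * (z - y) ** 2


def cross_entropy(z: float, y: float) -> float:
    """Loss B: binary cross-entropy, where the prediction is sigmoid(z)."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def numeric_grad(loss, z: float, y: float, eps: float = 1e-6) -> float:
    """Central-difference estimate of d(loss)/dz."""
    return (loss(z + eps, y) - loss(z - eps, y)) / (2.0 * eps)


z, y = 0.3, 1.0
grad_a = numeric_grad(squared_error, z, y)
grad_b = numeric_grad(cross_entropy, z, y)

# Both gradients equal "prediction minus target" for their own prediction.
print(abs(grad_a - (z - y)) < 1e-6)
print(abs(grad_b - (sigmoid(z) - y)) < 1e-6)
```

The two losses differ in value and in shape, but the signal that reaches the parameters has the same structure, which is the sense in which the loss acts as a surrogate for the gradient it produces.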

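This view lifts our understanding of model training from merely "lowering a number (the loss)" to "shaping a structure (the representation)". It also explains why AI is full of odd-looking yet effective loss designs: what matters in each is supplying a "force" in the right direction for parameter optimization, not the form of the function itself. Behind this lies an implicit preference for "beauty" or "elegance": during training, neural networks intrinsically favor simpler, more compressive explanations.
课代表立正: Back to the relation between memorization and generalization. Does giving a model more "material to memorize" make generalization more likely?

Yuandong Tian: In many tasks, yes. The more combinations the model sees, the more robust the representation it learns, and that representation can also predict combinations it has never seen. That is generalization. Real "understanding" usually shows up as better method: in a new situation, a small amount of simple logic explains more phenomena in a unified way and extends to more scenarios.

课代表立正: If data is scarce and the model cannot learn a good representation, what happens?

Yuandong Tian: It tends toward memorization-style learning in order to satisfy the training-error objective; but once it steps outside the training set, the error rate rises, which people usually attribute to overfitting or memorization dominating.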
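The contrast between memorization and generalization can be shown with modular addition, a task often used in grokking experiments. The sketch below is a hand-built illustration, not a trained network: `Memorizer` is a lookup table and `rule_model` hard-codes the true rule, both hypothetical stand-ins for the two kinds of solution. The modulus and the train/test split are arbitrary assumptions.

```python
P = 7  # assumed modulus for the toy task (a + b) mod P

pairs = [(a, b) for a in range(P) for b in range(P)]
train = [pair for i, pair in enumerate(pairs) if i % 2 == 0]  # seen pairs
test = [pair for i, pair in enumerate(pairs) if i % 2 == 1]   # unseen pairs


def target(a: int, b: int) -> int:
    return (a + b) % P


class Memorizer:
    """Stores every training pair; has no answer for anything else."""

    def __init__(self, examples):
        self.table = {(a, b): target(a, b) for a, b in examples}

    def predict(self, a: int, b: int):
        return self.table.get((a, b))  # None for unseen pairs


def rule_model(a: int, b: int) -> int:
    """The 'short program': one rule that covers every pair."""
    return (a + b) % P


def accuracy(predict, examples) -> float:
    hits = sum(predict(a, b) == target(a, b) for a, b in examples)
    return hits / len(examples)


memorizer = Memorizer(train)
print("memorizer train/test:", accuracy(memorizer.predict, train), accuracy(memorizer.predict, test))
print("rule      train/test:", accuracy(rule_model, train), accuracy(rule_model, test))
```

The lookup table is perfect on the pairs it has seen and useless on the rest, while the one-line rule is perfect on both. Both satisfy the training-error objective equally well, which is why training error alone cannot tell the two solutions apart.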
Idea
The weakness of labels: a label that has not been tested against a large amount of data is unreliable.
You kind of ask yourself, is something fundamental or not fundamental? How things should be.
你会不断地问自己:什么东西是“基本的”、是“底层”的,什么不是?事物“应该是怎样的”?

I think that’s been guiding me a fair bit, thinking from multiple angles and looking for almost beauty, beauty and simplicity. Ugliness, there’s no room for ugliness. It’s beauty, simplicity, elegance, correct inspiration from the brain. All of those things need to be present at the same time. The more they are present, the more confident you can be in a top-down belief.
我觉得这些一直在很大程度上指导着我:从多个角度思考,同时去寻找某种“接近美感的东西”——美感、简洁性。丑陋的东西,是不该留下空间的。你要追求的是:美、简洁、优雅,以及来自大脑的“正确灵感”。这些要素需要同时出现,而且出现得越充分,你就越能在“自上而下的信念”(top-down belief)上有信心。

The top-down belief is the thing that sustains you when the experiments contradict you. Because if you trust the data all the time, well sometimes you can be doing the correct thing but there’s a bug. But you don’t know that there is a bug. How can you tell that there is a bug? How do you know if you should keep debugging or you conclude it’s the wrong direction? It’s the top-down. You can say things have to be this way. Something like this has to work, therefore we’ve got to keep going. That’s the top-down, and it’s based on this multifaceted beauty and inspiration by the brain.
这种“自上而下的信念”,就是当实验结果暂时和你唱反调时,支撑你继续往前走的东西。因为如果你永远只信任数据,那有时会出现这样一种情况:你做的是对的,但实验里有 bug,而你不知道那里有 bug。那你要如何判断,到底还要不要继续调试?是该说“这方向错了”,还是该说“系统里还有没找到的问题”?靠的就是这种 top-down 信念。你会对自己说:“事物必须是这样的,这种结构总得有一种方式是能工作的,所以我们得继续干下去。”这种 top-down 信念,正是建立在多维度的“美感”和“来自大脑的正确灵感”之上的。
Idea
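A clear mind means that some simple, elegant knowledge structures were already in place early in the brain's development and were then generalized to other domains later in life. It has to be very early: both directions (having a sense of security, or being troubled by fear) are self-reinforcing, so if simplicity and elegance are to be the winning side, that can only happen very early, possibly before the age of one, perhaps even in the womb.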

Appendix.10.《Isaacson, Walter. Elon Musk (English Edition) (p. 365). Simon & Schuster. Kindle.》

I try to criticize the action, not the person. We all make mistakes. What matters is whether a person has a good feedback loop, can seek criticism from others, and can improve. Physics does not care about hurt feelings. It cares about whether you got the rocket right.
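What matters when facing mistakes is a healthy feedback loop: being able to take in criticism and suggestions from others, and to improve.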
