2025-07-15 Jason Wei.Asymmetry of verification and verifier’s law



Asymmetry of verification is the idea that some tasks are much easier to verify than to solve. Now that reinforcement learning (RL) finally works in a general sense, asymmetry of verification is becoming one of the most important ideas in AI.
验证的不对称性(Asymmetry of verification)指的是某些任务验证起来比求解容易得多。随着强化学习(Reinforcement Learning, RL)在通用意义上终于取得突破,验证不对称性正在成为人工智能领域最重要的思想之一。

Understanding asymmetry of verification through examples
通过例子理解验证不对称性

Asymmetry of verification is everywhere, if you look for it. Some prime examples:
验证不对称性无处不在,只要你愿意寻找,就能发现。以下是一些典型例子:

1. Sudoku and crossword puzzles take a lot of time to solve because you have to try many candidates against various constraints, but it is trivial to check if any given solution is correct.
数独和填字游戏因为需要在多种约束条件下尝试大量候选项,所以求解耗时较多,但检查一个给定答案是否正确却是非常容易的。

2. Writing the code to operate a website like Instagram takes a team of engineers many years, but verifying whether the website is working properly can be done quickly by any layperson.
编写类似 Instagram 这样的网站代码需要一个工程团队数年时间,但判断网站是否正常运行却可以被普通人快速完成。

3. Solving BrowseComp problems often requires browsing hundreds of websites, but verifying any given answer can often be done much more quickly because you can directly search whether the answer meets the constraints.
解决 BrowseComp 这类问题往往需要浏览上百个网站,但验证某个候选答案是否满足条件通常要快得多,因为你可以直接搜索答案是否符合约束。
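To make the Sudoku example concrete, here is a minimal Python sketch (an illustration, not code from the original post) for square grids such as 4 x 4 or 9 x 9, where 0 marks an empty cell. `is_valid_solution` checks a finished grid with one pass over rows, columns and boxes, while `solve` has to search by backtracking and counts how many candidate values it tries. Both function names are hypothetical.

```python
from itertools import product
from math import isqrt

Grid = list[list[int]]


def is_valid_solution(grid: Grid) -> bool:
    """Verify a filled n x n Sudoku: every row, column and box holds 1..n exactly once."""
    n = len(grid)
    b = isqrt(n)  # box side; assumes n is a perfect square (4, 9, ...)
    target = set(range(1, n + 1))
    if not all(len(row) == n and set(row) == target for row in grid):
        return False
    if not all({grid[r][c] for r in range(n)} == target for c in range(n)):
        return False
    return all(
        {grid[br + i][bc + j] for i in range(b) for j in range(b)} == target
        for br in range(0, n, b)
        for bc in range(0, n, b)
    )


def solve(grid: Grid) -> tuple[bool, int]:
    """Fill the zeros in place by backtracking; return (solved, candidate values tried)."""
    n = len(grid)
    b = isqrt(n)
    tries = 0

    def fits(r: int, c: int, v: int) -> bool:
        box_r, box_c = r - r % b, c - c % b
        return (
            v not in grid[r]
            and all(grid[i][c] != v for i in range(n))
            and all(grid[box_r + i][box_c + j] != v for i in range(b) for j in range(b))
        )

    def backtrack() -> bool:
        nonlocal tries
        for r, c in product(range(n), repeat=2):  # row-major scan for the first empty cell
            if grid[r][c] == 0:
                for v in range(1, n + 1):
                    tries += 1
                    if fits(r, c, v):
                        grid[r][c] = v
                        if backtrack():
                            return True
                        grid[r][c] = 0  # undo and try the next value
                return False  # no value fits this cell: backtrack
        return True  # no empty cells left

    return backtrack(), tries


if __name__ == "__main__":
    puzzle = [
        [1, 0, 0, 4],
        [0, 4, 1, 0],
        [2, 0, 0, 3],
        [0, 3, 2, 0],
    ]
    solved, tries = solve(puzzle)
    print(solved, tries, is_valid_solution(puzzle))
```

The verifier's work is fixed by the grid size, whereas the solver's `tries` count depends on how much searching the puzzle forces; on hard 9 x 9 puzzles that gap is the asymmetry described above.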

Some tasks have near-symmetry of verification: they take a similar amount of time to verify as to write a solution. For example, verifying the answer to some math problems (e.g., adding two 900-digit numbers) often takes the same amount of work as solving the problem yourself. Another example is some data processing programs; following someone else’s code and verifying that it works takes just as long as writing the solution yourself.
有些任务则接近验证对称性:验证和解决所需时间差不多。例如,对于某些数学题(如两个 900 位数字相加),验证答案通常与自己解题所需的工作量相当。另一个例子是某些数据处理程序:阅读他人代码并验证其是否有效,往往与自己编写程序耗时一样。

Interestingly, there are also some tasks that can take way longer to verify than to propose a solution. For example, it might take longer to fact-check all the statements in an essay than to write that essay (cue Brandolini's law: “The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.”). Many scientific hypotheses are also harder to verify than to come up with. For example, it is easy to state a novel diet (“Eat only bison and broccoli”) but it would take years to verify whether the diet is beneficial for a general population.
有趣的是,也存在一些任务,其验证时间远远超过提出解决方案的时间。例如,核查一篇文章中的所有陈述可能比写这篇文章更费时(即 Brandolini 定律:“驳斥废话所需的能量比制造废话多一个数量级。”)。许多科学假说也是如此,提出一个新观点容易,例如“只吃野牛和西兰花的饮食法”,但要验证这种饮食是否对普通大众有益则可能需要多年时间。

Improving asymmetry of verification
改善验证不对称性

One of the most important realizations about asymmetry of verification is that it is possible to actually improve the asymmetry by front-loading some research about the task. For example, for a competition math problem, it is trivial to check any proposed final answer if you have the answer key at hand. Another great example is some coding problems: while it’s tedious to read code and check its correctness, if you have test cases with ample coverage, you can quickly check any given solution; indeed, this is what LeetCode does. In some tasks, it is possible to improve verification but not enough to make it trivial. As an example, for a problem like “Name a Dutch soccer player”, it would help to have a list of the famous Dutch soccer players but verification would still require work in many cases.
关于验证不对称性的一个重要认知是:通过预先做一些任务相关的研究,其实可以提升验证效率。例如,在数学竞赛题中,如果你手头有标准答案,那么检查一个候选答案就变得轻而易举。另一个典型例子是编程题:虽然阅读代码并检查其正确性比较繁琐,但如果你有覆盖全面的测试用例,那么你可以快速验证任何一个提交的解;这正是 LeetCode 所采用的方法。在某些任务中,也可以通过准备提升验证效率,但仍不足以使其变得简单。例如,“说出一位荷兰足球运动员”这类题目中,如果你拥有荷兰著名球员的列表会有所帮助,但在很多情况下,验证仍需一定工作量。
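The three cases above can be sketched as three kinds of verifiers, each made cheaper by work done ahead of time. This is a hypothetical Python illustration: `ANSWER_KEY`, the test-case format and the `KNOWN_DUTCH_PLAYERS` list are assumptions made up for the example, and it does not reflect how LeetCode actually runs submissions.

```python
from typing import Callable, Optional

# Hypothetical answer key prepared ahead of time (the "front-loaded research").
ANSWER_KEY = {"problem_1": "42"}


def check_with_answer_key(problem_id: str, answer: str) -> bool:
    """Competition-math style: compare a final answer against the stored key."""
    return ANSWER_KEY.get(problem_id) == answer.strip()


def check_with_tests(
    candidate: Callable[[list[int]], int],
    tests: list[tuple[list[int], int]],
) -> bool:
    """Test-case style: accept a submission only if it passes every prepared case."""
    for arg, expected in tests:
        try:
            if candidate(arg) != expected:
                return False
        except Exception:
            return False  # a crash counts as a failed test
    return True


# Hypothetical, deliberately incomplete list of famous Dutch soccer players.
KNOWN_DUTCH_PLAYERS = {"Johan Cruyff", "Marco van Basten", "Virgil van Dijk"}


def check_dutch_player(name: str) -> Optional[bool]:
    """Partial verifier: True if the name is on the list, None if it still needs manual checking."""
    return True if name.strip() in KNOWN_DUTCH_PLAYERS else None


if __name__ == "__main__":
    tests = [([1, 2, 3], 6), ([], 0)]
    print(check_with_answer_key("problem_1", "42"))  # matches the key
    print(check_with_tests(sum, tests))              # sum passes both cases
    print(check_dutch_player("Arjen Robben"))        # Dutch, but not on the list: None
```

The last verifier returns `None` rather than `False` for names missing from the list: a partial list can confirm some answers quickly but cannot reject the rest, which is why verification there still takes work.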

Verifier’s law
验证者定律

Why is asymmetry of verification important? Looking at the history of deep learning, we have seen that virtually anything that can be measured can be optimized. In RL terms, the ability to verify solutions is equivalent to the ability to create an RL environment. Hence, we have:
为什么验证不对称性如此重要?回顾深度学习的发展历史,我们会发现几乎任何可以被量化的事物都可以被优化。从强化学习(RL)的角度来看,验证一个解的能力就等同于构建一个 RL 环境的能力。因此,我们有如下结论:

Verifier’s law: The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI.
验证者定律(Verifier’s law):训练 AI 解决某项任务的难易程度与该任务的可验证性成正比。所有可解且易验证的任务,最终都将被 AI 解决。
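One way to read "the ability to verify solutions is equivalent to the ability to create an RL environment" is that a verifier can be wrapped directly into a reward function. The sketch below assumes a one-step, bandit-style environment with hypothetical `reset`/`step` methods (loosely modeled on common RL interfaces, not any specific library) and a toy addition verifier.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


def verify_addition(problem: str, answer: str) -> bool:
    """Toy verifier: problems look like "2+3"; the answer must be the exact sum."""
    a, b = problem.split("+")
    return answer.strip() == str(int(a) + int(b))


@dataclass
class VerifierEnv:
    """Hypothetical one-step environment: the reward comes entirely from a verifier."""

    problems: list[str]
    verify: Callable[[str, str], bool]
    seed: int = 0

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)
        self._current: Optional[str] = None

    def reset(self) -> str:
        """Start an episode by sampling a problem."""
        self._current = self._rng.choice(self.problems)
        return self._current

    def step(self, answer: str) -> float:
        """Score one answer: 1.0 if the verifier accepts it, else 0.0; the episode then ends."""
        if self._current is None:
            raise RuntimeError("call reset() before step()")
        reward = 1.0 if self.verify(self._current, answer) else 0.0
        self._current = None
        return reward


if __name__ == "__main__":
    env = VerifierEnv(problems=["2+3", "10+7", "40+2"], verify=verify_addition)
    for _ in range(3):
        problem = env.reset()
        # Placeholder "policy" that always answers 5; a learned policy would go here.
        print(problem, env.step("5"))
```

Nothing in the environment knows how to solve the task; it only knows how to check an answer, which is exactly what makes it usable for RL.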
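Idea
Value investing is hard to verify. Outcomes are highly random: a good idea does not guarantee a good result (it only raises the probability, and even that probability is often hard to verify, as with the winners on global wealth rankings), while a bad idea can still produce a good result.
价值投资很难验证,结果的随机性很大,好的想法不等于有好的结果(只是提高了概率,但有时候这种概率很难验证,比如,全球财富榜上的成功者),坏的想法可能有好的结果。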
More specifically, the ability to train AI to solve a task is proportional to the degree to which the task has the following properties:
更具体地说,训练 AI 成功解决某项任务的可能性,与该任务是否具备以下属性密切相关:

1. Objective truth: everyone agrees what good solutions are
客观真值:人们普遍认同哪些是好解

2. Fast to verify: any given solution can be verified in a few seconds
快速验证:对任何给定的解,都可以在几秒钟内验证其正确性

3. Scalable to verify: many solutions can be verified simultaneously
可扩展验证:可以同时验证大量候选解

4. Low noise: verification is as tightly correlated to the solution quality as possible
低噪声:验证结果尽可能紧密地反映解的质量

5. Continuous reward: it’s easy to rank the goodness of many solutions for a single problem
连续奖励:可以轻松对一个问题的多个解按优劣进行排序

It’s not hard to believe that verifier’s law holds true: most benchmarks that have been proposed in AI are easy to verify and so far have been solved. Notice that virtually all popular benchmarks in the past ten years fit criteria #1-4; benchmarks that don’t meet criteria #1-4 would struggle to become popular. Note that although most benchmarks don’t fit criterion #5 (a solution is either strictly correct or not), you can compute a continuous reward by averaging the binary reward over many examples.
不难相信验证者定律(Verifier’s law)是成立的:AI 领域提出的大多数基准测试都易于验证,并且目前大多已被攻克。可以注意到,过去十年中流行的几乎所有基准测试都满足标准 #1 至 #4;不满足这些标准的基准测试很难流行开来。需要指出的是,尽管大多数基准测试并不满足标准 #5(答案通常是非黑即白的),但我们可以通过对大量示例的二元评分取平均,从而构造一个连续的奖励函数。
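The averaging trick for criterion #5 can be sketched as follows: each example yields a binary reward, and the mean over examples gives a continuous score that can rank candidate solutions. The task, candidate functions and examples are hypothetical toys chosen only for illustration.

```python
from typing import Callable, Sequence


def pass_rate(solution: Callable[[int], int], examples: Sequence[tuple[int, int]]) -> float:
    """Average the binary per-example rewards (1.0 if exactly right, else 0.0)."""
    if not examples:
        raise ValueError("need at least one example")
    rewards = [1.0 if solution(x) == y else 0.0 for x, y in examples]
    return sum(rewards) / len(rewards)


# Toy task (hypothetical): return the square of x.
examples = [(0, 0), (1, 1), (2, 4), (3, 9)]
candidates = {
    "square": lambda x: x * x,    # correct on all four examples
    "double": lambda x: 2 * x,    # correct only for x = 0 and x = 2
    "plus_one": lambda x: x + 1,  # correct on none
}

# The continuous score lets us rank solutions even though each check is pass/fail.
ranking = sorted(candidates, key=lambda name: pass_rate(candidates[name], examples), reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(name, pass_rate(candidates[name], examples))
```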

Why is verifiability so important? In my view, the most basic reason is that the amount of learning that occurs in neural networks is maximized when the above criteria are satisfied; you can take a lot of gradient steps where each step has a lot of signal. Speed of iteration is critical—it’s the reason that progress in the digital world has been so much faster than progress in the physical world.
为什么可验证性如此重要?在我看来,最根本的原因是,当上述标准被满足时,神经网络中的学习量能够被最大化——你可以进行大量梯度更新,每一步都富含信号。迭代速度至关重要,这正是数字世界的进步远快于物理世界的关键所在。

AlphaEvolve
AlphaEvolve

Perhaps the greatest public example of leveraging asymmetry of verification in the past few years is AlphaEvolve, developed by Google. In short, AlphaEvolve can be seen as a very clever instantiation of guess-and-check that allows for ruthless optimization of an objective, which has resulted in several mathematical and operational innovations.
近几年最引人注目的、公开展示验证不对称性优势的例子,或许就是 Google 开发的 AlphaEvolve。简而言之,AlphaEvolve 可以被看作是一种极其巧妙的“猜测-检验”机制的实现,它能对目标函数进行极致优化,并由此带来了若干数学和工程上的创新。
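A guess-and-check loop in its simplest form looks like the sketch below. This is not AlphaEvolve's actual algorithm: as Google has described it, AlphaEvolve uses Gemini language models to propose changes to programs and scores them with automated evaluators, whereas here a random bit flip stands in for the proposer and a toy objective (count the ones in a bit string) stands in for the evaluator. `guess_and_check` and `flip_one_bit` are hypothetical names.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


def guess_and_check(
    initial: T,
    propose: Callable[[T, random.Random], T],
    evaluate: Callable[[T], float],
    iterations: int = 1000,
    seed: int = 0,
) -> tuple[T, float]:
    """Repeatedly guess a variant of the best candidate and keep it only if it scores higher."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(iterations):
        candidate = propose(best, rng)  # guess
        score = evaluate(candidate)     # check (the verifier)
        if score > best_score:          # keep strict improvements only
            best, best_score = candidate, score
    return best, best_score


def flip_one_bit(bits: list[int], rng: random.Random) -> list[int]:
    """Hypothetical proposer: copy the bit string and flip one random position."""
    child = bits.copy()
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


if __name__ == "__main__":
    # Toy objective: the score of a bit string is its number of ones.
    best, score = guess_and_check([0] * 20, flip_one_bit, sum, iterations=1000)
    print(score, best)
```

The loop itself is trivial; what makes it work is that `evaluate` is cheap and automatic, so it can afford thousands of guesses.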

A simple example of a problem optimized by AlphaEvolve is something like “Find the smallest outer hexagon that fits 11 unit hexagons.” Notice that this problem fits all five desirable properties of verifier’s law. Indeed, my belief is that any solvable problem that fits those five properties will be solved in the next few years.
AlphaEvolve 优化过的一个简单问题是:“找到一个能容纳 11 个单位六边形的最小外部六边形。”注意这个问题完全符合验证者定律的五项理想标准。事实上,我相信任何符合这五个特征且可解的问题,在未来几年内都将被 AI 解决。

One thing about the types of problems solved by AlphaEvolve is that solving them can be seen as “overfitting” to a single problem. In traditional machine learning, we already know the labels in the training set, and the real test is to measure generalization to unseen problems. However, in scientific innovation, we are in a totally different realm where we only care about solving a single problem (train=test!) because it’s an unsolved problem and potentially extremely valuable.
AlphaEvolve 所解决的问题类型有一个特点,那就是可以被视为对单一问题的“过拟合”。在传统机器学习中,我们已知训练集的标签,真正的考验是能否推广到未知问题上。但在科学创新领域,情况完全不同——我们只关心解决一个具体的问题(训练集=测试集!),因为这是一个未解的问题,且可能具有极高的价值。

Implications
影响与启示

Once you’ve learned about it, you’ll notice that asymmetry of verification is everywhere. It’s exciting to consider a world where anything we can measure will be solved. We will likely have a jagged edge of intelligence, where AI is much smarter at verifiable tasks because it’s so much easier to solve verifiable tasks. What an exciting future to consider.
一旦你意识到这一点,就会发现验证不对称性无处不在。想象一个只要能被测量就能被解决的世界,是令人振奋的。我们很可能会迎来一种“锯齿形”的智能边界:AI 在可验证任务上显得格外聪明,因为这些任务更容易被攻克。这个未来令人无比期待。

For more related reading, I liked [this blog post] by Alperen Keles.
如果你想进一步阅读相关内容,我推荐 Alperen Keles 写的[这篇博文]。
