Asymmetry of verification is the idea that some tasks are much easier to verify than to solve. With reinforcement learning (RL) that finally works in a general sense, asymmetry of verification is becoming one of the most important ideas in AI.
验证的不对称性(Asymmetry of verification)指的是某些任务的正确性比解决该任务本身更容易验证。随着强化学习(Reinforcement Learning, RL)在通用意义上终于取得突破,验证不对称性正在成为人工智能领域最重要的思想之一。
Understanding asymmetry of verification through examples
Asymmetry of verification is everywhere, if you look for it. Some prime examples:
1. Sudoku and crossword puzzles take a lot of time to solve because you have to try many candidates against various constraints, but it is trivial to check if any given solution is correct.
2. Writing the code to operate a website like Instagram takes a team of engineers many years, but any layperson can quickly verify whether the website is working properly.
3. Solving BrowseComp problems often requires browsing hundreds of websites, but verifying any given answer is often much faster, because you can directly search whether the answer meets the constraints.
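To make the Sudoku asymmetry concrete, here is a minimal Python sketch. The function names and the 0-for-empty grid encoding are assumptions for illustration. The verifier makes one fixed pass over the 27 rows, columns, and boxes, while the backtracking solver may try many candidate digits before it reaches a consistent grid; it counts those tries in `attempts`.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9 grid; 0 marks an empty cell (assumed encoding)


def is_valid_solution(grid: Grid) -> bool:
    """Verify a completed grid in one pass over all 27 rows, columns, and boxes."""
    target = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in range(0, 9, 3)
        for bc in range(0, 9, 3)
    ]
    return all(group == target for group in rows + cols + boxes)


def solve(puzzle: Grid) -> Tuple[Optional[Grid], int]:
    """Backtracking search; returns (solution or None, number of candidate digits tried)."""
    g = [row[:] for row in puzzle]
    attempts = 0

    def can_place(r: int, c: int, v: int) -> bool:
        if v in g[r] or any(g[i][c] == v for i in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(g[i][j] != v for i in range(br, br + 3) for j in range(bc, bc + 3))

    def backtrack(pos: int) -> bool:
        nonlocal attempts
        if pos == 81:
            return True
        r, c = divmod(pos, 9)
        if g[r][c] != 0:
            return backtrack(pos + 1)
        for v in range(1, 10):
            attempts += 1  # each candidate digit is one "guess"
            if can_place(r, c, v):
                g[r][c] = v
                if backtrack(pos + 1):
                    return True
                g[r][c] = 0  # undo and try the next digit
        return False

    return (g if backtrack(0) else None), attempts
```

Blanking some cells of any solved grid and calling `solve` on the result returns a grid for which `is_valid_solution` is `True`. The verifier's cost is the same for every grid, whereas `attempts` depends on how much search the puzzle forces.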
Some tasks have near-symmetry of verification: verifying a solution takes about as long as producing one. For example, verifying the answer to some math problems (e.g., adding two 900-digit numbers) often takes the same amount of work as solving the problem yourself. Another example is certain data-processing programs: following someone else’s code and verifying that it works takes just as long as writing the solution yourself.
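A minimal Python sketch of the 900-digit addition case follows; the function names are hypothetical. The obvious way to check a claimed sum is to redo the schoolbook addition, so the verifier performs exactly as many digit steps as the solver. Cheaper probabilistic checks exist, but they still have to read every digit, so verification stays linear in the input length, just like solving.

```python
from typing import Tuple


def add_decimal_strings(a: str, b: str) -> Tuple[str, int]:
    """Schoolbook addition of two non-negative decimal strings.

    Returns the sum and the number of digit steps performed.
    """
    i, j = len(a) - 1, len(b) - 1
    carry, steps, digits = 0, 0, []
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        digits.append(str(digit))
        steps += 1
        i, j = i - 1, j - 1
    return "".join(reversed(digits)) or "0", steps


def verify_sum(a: str, b: str, claimed: str) -> Tuple[bool, int]:
    """Naive verifier: recompute the sum and compare, costing as many steps as solving."""
    expected, steps = add_decimal_strings(a, b)
    return expected == claimed, steps
```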
Interestingly, some tasks can take far longer to verify than to propose a solution for. For example, it might take longer to fact-check all the statements in an essay than to write that essay (cue Brandolini's law: “The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.”). Many scientific hypotheses are also harder to verify than to come up with. For example, it is easy to state a novel diet (“Eat only bison and broccoli”), but it would take years to verify whether the diet is beneficial for a general population.
Improving asymmetry of verification
One of the most important realizations about asymmetry of verification is that you can actually improve the asymmetry by front-loading some research about the task. For example, for a competition math problem, it is trivial to check any proposed final answer if you have the answer key at hand. Another great example is coding problems: although it is tedious to read code and check its correctness, if you have test cases with ample coverage, you can quickly check any given solution; indeed, this is what LeetCode does. For some tasks, preparation improves verification but not enough to make it trivial. For example, for a problem like “Name a Dutch soccer player”, a list of famous Dutch soccer players would help, but verification would still require work in many cases (the player named might not be on the list).
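Front-loading test cases turns code review into a mechanical check. The Python sketch below is a minimal illustration of that idea, not LeetCode's actual judge. `passes_all_tests` and the maximum-subarray problem are hypothetical choices, and a test case is assumed to be an (arguments, expected output) pair.

```python
from typing import Any, Callable, List, Sequence, Tuple

TestCase = Tuple[tuple, Any]  # (positional arguments, expected output)


def passes_all_tests(candidate: Callable[..., Any], tests: Sequence[TestCase]) -> bool:
    """Accept a candidate only if it returns the expected output on every test."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return False
        except Exception:  # a crash counts as a failure
            return False
    return True


# Test cases written once, up front, for "maximum subarray sum".
MAX_SUBARRAY_TESTS: List[TestCase] = [
    (([-2, 1, -3, 4, -1, 2, 1, -5, 4],), 6),
    (([1],), 1),
    (([-3, -1, -2],), -1),
    (([5, 4, -1, 7, 8],), 23),
]


def kadane(nums: List[int]) -> int:
    """Correct candidate: Kadane's algorithm."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def sum_of_positives(nums: List[int]) -> int:
    """Plausible-looking but wrong candidate."""
    return sum(x for x in nums if x > 0)
```

The expensive step, writing tests with good coverage, happens once; after that every submission is checked in milliseconds. Coverage matters: a suite containing only the `[1]` case would wrongly accept `sum_of_positives`.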
Verifier’s law
Why is asymmetry of verification important? Looking at the history of deep learning, we have seen that virtually anything that can be measured can be optimized. In RL terms, the ability to verify solutions is equivalent to the ability to create an RL environment. Hence, we have:
Verifier’s law: The ease of training AI to solve a task is proportional to how verifiable the task is. All tasks that are possible to solve and easy to verify will be solved by AI.
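Value investing is hard to verify: outcomes are highly random, and a good idea does not guarantee a good result (it only raises the probability, and even that probability is often hard to verify, as with the successful names on global wealth rankings), while a bad idea can still produce a good result.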
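The claim that a verifier is enough to build an RL environment can be sketched in a few lines of Python. The `FactoringEnv` class, its small prime list, and its `reset`/`step` method names are hypothetical; the method names loosely follow common RL conventions rather than any specific library. The task is harder to solve than to check: finding the factors of n takes search, while checking a proposed factor pair is a single multiplication. The reward is simply the verifier's output.

```python
import random
from typing import Sequence, Tuple


class FactoringEnv:
    """Toy one-step RL environment whose reward function is just a verifier."""

    def __init__(self, primes: Sequence[int] = (101, 103, 107, 109, 113, 127), seed: int = 0):
        self.primes = list(primes)
        self.rng = random.Random(seed)
        self.n = 0

    def reset(self) -> int:
        """Sample a new problem: a product of two distinct primes."""
        p, q = self.rng.sample(self.primes, 2)
        self.n = p * q
        return self.n

    def step(self, action: Tuple[int, int]) -> float:
        """Reward 1.0 iff the action is a nontrivial factorization of n."""
        p, q = action
        return 1.0 if 1 < p < self.n and p * q == self.n else 0.0


def trial_division_policy(n: int) -> Tuple[int, int]:
    """A slow but correct 'solver', used here only to exercise the environment."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d, n // d
        d += 1
    return 1, n
```

An RL policy would replace `trial_division_policy`; nothing about the environment changes, because all it needs is the check in `step`.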
More specifically, the ability to train AI to solve a task is proportional to the degree to which the task has the following properties:
1. Objective truth: everyone agrees what good solutions are
2. Fast to verify: any given solution can be verified in a few seconds
3. Scalable to verify: many solutions can be verified simultaneously
4. Low noise: verification is as tightly correlated to the solution quality as possible
5. Continuous reward: it’s easy to rank the goodness of many solutions for a single problem
It’s not hard to believe that verifier’s law holds true: most benchmarks that have been proposed in AI are easy to verify and so far have been solved. Notice that virtually all popular benchmarks in the past ten years fit criteria #1-4; benchmarks that don’t meet criteria #1-4 would struggle to become popular. Note that although most benchmarks don’t fit criterion #5 (a solution is either strictly correct or not), you can compute a continuous reward by averaging the binary reward over many examples.
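Averaging binary rewards into a continuous score is simple enough to show directly. In this minimal Python sketch, the dataset of (question, reference answer) pairs and the deliberately flawed solver are hypothetical. Each example yields a 0/1 reward, and the mean over the dataset is a score in [0, 1] that can rank solvers even though no single example gives partial credit.

```python
from typing import Any, Callable, Sequence, Tuple


def binary_reward(proposed: Any, reference: Any) -> int:
    """Strict per-example check: 1 if exactly correct, else 0."""
    return int(proposed == reference)


def continuous_reward(solver: Callable[[Any], Any], dataset: Sequence[Tuple[Any, Any]]) -> float:
    """Mean of binary rewards over many examples, a score in [0, 1]."""
    if not dataset:
        raise ValueError("dataset must be non-empty")
    return sum(binary_reward(solver(q), ref) for q, ref in dataset) / len(dataset)


# Hypothetical task: square an integer. The flawed solver breaks for x >= 3.
SQUARES = [(x, x * x) for x in range(4)]


def flawed_square(x: int) -> int:
    return x * x if x < 3 else x + x
```

For example, `continuous_reward(flawed_square, SQUARES)` returns `0.75`, since three of the four answers are exactly right.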
Why is verifiability so important? In my view, the most basic reason is that the amount of learning that occurs in neural networks is maximized when the above criteria are satisfied; you can take a lot of gradient steps where each step has a lot of signal. Speed of iteration is critical—it’s the reason that progress in the digital world has been so much faster than progress in the physical world.
AlphaEvolve
Perhaps the greatest public example of leveraging asymmetry of verification in the past few years is AlphaEvolve, developed by Google. In short, AlphaEvolve can be seen as a very clever instantiation of guess-and-check that allows for ruthless optimization of an objective, which has resulted in several mathematical and operational innovations.
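The guess-and-check idea can be illustrated with a toy loop. This is not AlphaEvolve's method (AlphaEvolve uses language models to propose changes to programs and scores them with automated evaluators); it is a minimal random-mutation hill climber in Python, with a hypothetical `score` function standing in for the verifier. Lower scores are better, and a guess is kept only if the check says it beats the best so far.

```python
import random
from typing import Callable, List, Tuple


def guess_and_check(
    score: Callable[[List[float]], float],
    init: List[float],
    iters: int = 500,
    sigma: float = 0.5,
    seed: int = 0,
) -> Tuple[List[float], float, List[float]]:
    """Random-mutation hill climbing: propose a perturbed guess, keep it only if it scores lower."""
    rng = random.Random(seed)
    best, best_score = list(init), score(init)
    history = [best_score]
    for _ in range(iters):
        guess = [x + rng.gauss(0.0, sigma) for x in best]  # guess
        s = score(guess)  # check (the verifier)
        if s < best_score:
            best, best_score = guess, s
        history.append(best_score)
    return best, best_score, history
```

Because only improvements are accepted, the best score never gets worse; all of the "intelligence" in the loop comes from how cheaply and reliably `score` can be evaluated.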
A simple example of a problem optimized by AlphaEvolve is something like “Find the smallest outer hexagon that fits 11 unit hexagons.” Notice that this problem fits all five desirable properties of verifier’s law. Indeed, my belief is that any solvable problem that fits those five properties will be solved in the next few years.
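The hexagon problem is fast and scalable to verify because a candidate answer is just a side length plus 11 placements, and checking it means confirming that every small hexagon sits inside the big one and that no two overlap. The Python sketch below is a hypothetical verifier, not AlphaEvolve's evaluator. It assumes the outer hexagon is centered at the origin with a vertex on the positive x-axis (any valid packing can be rotated and translated into this position), encodes each unit hexagon as `(cx, cy, theta)`, and uses the separating axis theorem for the overlap test, with a small tolerance so that touching edges count as valid.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
EPS = 1e-9  # tolerance so that touching boundaries are accepted


def hexagon(cx: float, cy: float, side: float, theta: float = 0.0) -> List[Point]:
    """Counterclockwise vertices of a regular hexagon (its circumradius equals its side)."""
    return [
        (cx + side * math.cos(theta + k * math.pi / 3), cy + side * math.sin(theta + k * math.pi / 3))
        for k in range(6)
    ]


def contains(poly: List[Point], p: Point) -> bool:
    """True if p is inside or on a counterclockwise convex polygon."""
    for i in range(len(poly)):
        (ax, ay), (bx, by) = poly[i], poly[(i + 1) % len(poly)]
        if (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax) < -EPS:
            return False  # p is strictly right of edge a->b, i.e. outside
    return True


def interiors_disjoint(a: List[Point], b: List[Point]) -> bool:
    """Separating axis theorem for two convex polygons."""
    for poly in (a, b):
        for i in range(len(poly)):
            (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
            length = math.hypot(x2 - x1, y2 - y1)
            nx, ny = (y1 - y2) / length, (x2 - x1) / length  # unit edge normal
            pa = [x * nx + y * ny for x, y in a]
            pb = [x * nx + y * ny for x, y in b]
            if max(pa) <= min(pb) + EPS or max(pb) <= min(pa) + EPS:
                return True  # found a separating axis
    return False


def score_packing(outer_side: float, placements: Sequence[Tuple[float, float, float]], n: int = 11) -> float:
    """Return outer_side for a valid packing of n unit hexagons, else math.inf (lower is better)."""
    if len(placements) != n:
        return math.inf
    outer = hexagon(0.0, 0.0, outer_side)
    inner = [hexagon(cx, cy, 1.0, theta) for cx, cy, theta in placements]
    if not all(contains(outer, v) for h in inner for v in h):
        return math.inf
    for i in range(n):
        for j in range(i + 1, n):
            if not interiors_disjoint(inner[i], inner[j]):
                return math.inf
    return outer_side
```

Each call does a small, fixed amount of arithmetic (55 pairwise checks for 11 hexagons), so many candidate packings can be scored in parallel, which is what a guess-and-check search needs.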
One notable thing about the problems AlphaEvolve solves is that its approach can be seen as “overfitting” to a single problem. In traditional machine learning, we already know the labels in the training set, and the real test is generalization to unseen problems. In scientific innovation, however, we are in a totally different realm: we care only about solving a single problem (train = test!) because it is unsolved and potentially extremely valuable.
Implications
Once you’ve learned about it, you’ll notice that asymmetry of verification is everywhere. It’s exciting to consider a world where anything we can measure will be solved. We will likely have a jagged edge of intelligence, where AI is much smarter at verifiable tasks because it’s so much easier to solve verifiable tasks. What an exciting future to consider.
If you want to read more on this topic, I recommend [this blog post] by Alperen Keles.