2023-07-05 OpenAI.Introducing Superalignment

Refer To：《Introducing Superalignment》。

We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.

我们需要科学和技术上的突破来引导和控制比我们聪明得多的 AI 系统。为了在四年内解决这个问题，我们正在成立一个由 Ilya Sutskever 和 Jan Leike 共同领导的新团队，并将把我们迄今为止获得的算力的 20% 用于这项工作。我们正在寻找优秀的机器学习研究员和工程师加入我们。

Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.

超级智能将成为人类有史以来最具影响力的技术，并可能帮助我们解决世界上许多最重要的问题。但超级智能的巨大力量也可能极其危险，可能导致人类被削弱权能，甚至导致人类灭绝。

While superintelligenceA seems far off now, we believe it could arrive this decade.

虽然现在看起来超级智能A还很遥远，但我们认为它可能会在本十年到来。

Managing these risks will require, among other things⁠, new institutions for governance⁠ and solving the problem of superintelligence alignment:

为了管理这些风险，除其他举措外，我们需要新的治理机构，并解决超级智能对齐（alignment）问题：

How do we ensure AI systems much smarter than humans follow human intent?

我们如何确保比人类聪明得多的 AI 系统遵循人类意图？

Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback⁠, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us,B and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.

目前，我们没有方法来引导或控制潜在的超级智能 AI，也无法防止其“脱缰”。我们当前用于对齐 AI 的技术（例如基于人类反馈的强化学习）依赖于人类对 AI 的监督能力。但人类将无法可靠地监督比我们聪明得多的 AI 系统，因此我们现有的对齐技术无法扩展到超级智能。我们需要新的科学和技术突破。

巴菲特有一套对齐的方法，是从人性最基本的层面（对于风险的敏感性）不断往外扩展。

Our approach

我们的方法

Our goal is to build a roughly human-level automated alignment researcher⁠. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence.To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline:

我们的目标是构建一个大致达到人类水平的自动化对齐研究员。然后我们可以使用海量算力来扩展我们的工作，并迭代式地对齐超级智能。为了对齐第一个自动化对齐研究员，我们需要：1）开发可扩展的训练方法，2）验证所得模型，3）对整个对齐流程进行压力测试：

1.To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems⁠ (scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization).

对于人类难以评估的任务，我们可以利用 AI 系统来辅助评估其他 AI 系统（可扩展监督）。此外，我们希望理解并控制模型如何将我们的监督泛化到我们无法直接监督的任务上（泛化）。

2.To validate the alignment of our systems, we automate search for problematic behavior⁠(opens in a new window) (robustness) and problematic internals (automated interpretability⁠).

为了验证系统的对齐情况，我们将自动化搜索问题行为（稳健性）和问题内部机制（自动化可解释性）。

3.Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).

最后，我们可以通过有意训练不对齐的模型来测试整个流程，并确认我们的技术能够检测到最严重类型的不对齐（对抗性测试）。

We expect our research priorities will evolve substantially as we learn more about the problem and we’ll likely add entirely new research areas. We are planning to share more on our roadmap in the future.

我们预计随着对问题理解的加深，我们的研究优先级将发生显著变化，并且很可能新增全新的研究方向。我们计划在未来分享更多路线图内容。

The new team

新团队

We are assembling a team of top machine learning researchers and engineers to work on this problem.

我们正在组建一支由顶尖机器学习研究员和工程师组成的团队来解决这个问题。

We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment. Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieve our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment.

在未来四年内，我们将把迄今为止获得的算力的 20% 用于解决超级智能对齐问题。我们的基础研究主押注是新的 Superalignment 团队，但把这件事做好对实现我们的使命至关重要，我们预计将有许多团队做出贡献——从开发新方法到将其规模化并部署。

Our goal is to solve the core technical challenges of superintelligence alignment in four years.

我们的目标是在四年内解决超级智能对齐的核心技术挑战。

While this is an incredibly ambitious goal and we’re not guaranteed to succeed, we are optimistic that a focused, concerted effort can solve this problem:C There are many ideas that have shown promise in preliminary experiments, we have increasingly useful metrics for progress, and we can use today’s models to study many of these problems empirically.

尽管这是一个极其雄心勃勃的目标，也无法保证一定成功，但我们乐观地认为，专注且协同的努力可以解决这个问题：许多想法在初步实验中已展现出前景，我们拥有越来越有用的进展度量指标，而且我们可以使用当今的模型对其中许多问题进行实证研究。

Ilya Sutskever (cofounder and Chief Scientist of OpenAI) has made this his core research focus, and will be co-leading the team with Jan Leike (Head of Alignment). Joining the team are researchers and engineers from our previous alignment team, as well as researchers from other teams across the company.

Ilya Sutskever（OpenAI 联合创始人兼首席科学家）已将此作为他的核心研究重点，并将与 Jan Leike（Head of Alignment）共同领导团队。加入该团队的有我们此前对齐团队的研究人员与工程师，以及公司其他团队的研究人员。

We’re also looking for outstanding new researchers and engineers to join this effort. Superintelligence alignment is fundamentally a machine learning problem, and we think great machine learning experts—even if they’re not already working on alignment—will be critical to solving it.

我们也在寻找优秀的新研究员和工程师加入这项工作。超级智能对齐从根本上是一个机器学习问题，我们认为顶尖的机器学习专家——即使他们尚未从事对齐方向——也将是解决该问题的关键。

We plan to share the fruits of this effort broadly and view contributing to alignment and safety of non-OpenAI models as an important part of our work.

我们计划广泛分享这项工作的成果，并将为非 OpenAI 模型的对齐与安全做出贡献视为我们工作的重要组成部分。

This new team’s work is in addition to existing work at OpenAI aimed at improving the safety of current models⁠ like ChatGPT, as well as understanding and mitigating other risks from AI such as misuse, economic disruption, disinformation, bias and discrimination, addiction and overreliance, and others. While this new team will focus on the machine learning challenges of aligning superintelligent AI systems with human intent, there are related sociotechnical problems on which we are actively engaging with interdisciplinary experts⁠ to make sure our technical solutions consider broader human and societal concerns.

该新团队的工作是在 OpenAI 既有工作的基础上进行的，既有工作旨在提升当前模型（如 ChatGPT）的安全性，并理解与缓解 AI 带来的其他风险，如滥用、经济动荡、虚假信息、偏见与歧视、成瘾与过度依赖等。虽然新团队将专注于让超级智能 AI 系统与人类意图对齐的机器学习挑战，但对于相关的社会技术问题，我们也在积极与跨学科专家合作，以确保我们的技术方案能够考虑更广泛的人类与社会关切。

Join us

加入我们

Superintelligence alignment is one of the most important unsolved technical problems of our time. We need the world’s best minds to solve this problem.

超级智能对齐是我们这个时代最重要且尚未解决的技术问题之一。我们需要世界上最优秀的人才来解决这一问题。

If you’ve been successful in machine learning, but you haven’t worked on alignment before, this is your time to make the switch! We believe this is a tractable machine learning problem, and you could make enormous contributions.

如果你在机器学习领域已经取得成功，但此前没有从事过对齐方向的工作，现在正是转向的时刻！我们相信这是一个可求解的机器学习问题，而你可以做出巨大的贡献。

If you’re interested, we’d love to hear from you! Please apply for our research engineer⁠ and research scientist⁠ positions.

如果你感兴趣，我们非常期待你的来信！请申请我们的 research engineer 和 research scientist 职位。

2023-07-05 OpenAI.Introducing Superalignment

2023-07-05 OpenAI.Introducing Superalignment

热门主题

Recent Articles

2007-09-12 Max Olson.See's Candies

1995-05-01 Berkshire Hathaway Annual Meeting

2025-10-07 Constellation Brands, Inc. (STZ) Q2 2026 Earnings Call Transcript

I.H.181.Warren Buffett.Alignment

I.H.189.Warren Buffett.Water the flowers and skip over the weeds.