2023-07-05 OpenAI.Introducing Superalignment

We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we’re starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we’ve secured to date to this effort. We’re looking for excellent ML researchers and engineers to join us.
我们需要科学和技术上的突破来引导和控制比我们聪明得多的 AI 系统。为了在四年内解决这个问题,我们正在成立一个由 Ilya Sutskever 和 Jan Leike 共同领导的新团队,并将把我们迄今为止获得的算力的 20% 用于这项工作。我们正在寻找优秀的机器学习研究员和工程师加入我们。

Superintelligence will be the most impactful technology humanity has ever invented, and could help us solve many of the world’s most important problems. But the vast power of superintelligence could also be very dangerous, and could lead to the disempowerment of humanity or even human extinction.
超级智能将成为人类有史以来最具影响力的技术,并可能帮助我们解决世界上许多最重要的问题。但超级智能的巨大力量也可能极其危险,可能导致人类被削弱权能,甚至导致人类灭绝。

While superintelligence seems far off now, we believe it could arrive this decade.
虽然现在看起来超级智能还很遥远,但我们认为它可能会在本十年到来。

Managing these risks will require, among other things, new institutions for governance and solving the problem of superintelligence alignment:
为了管理这些风险,除其他举措外,我们需要新的治理机构,并解决超级智能对齐(alignment)问题:

How do we ensure AI systems much smarter than humans follow human intent?
我们如何确保比人类聪明得多的 AI 系统遵循人类意图?

Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue. Our current techniques for aligning AI, such as reinforcement learning from human feedback, rely on humans’ ability to supervise AI. But humans won’t be able to reliably supervise AI systems much smarter than us, and so our current alignment techniques will not scale to superintelligence. We need new scientific and technical breakthroughs.
目前,我们没有方法来引导或控制潜在的超级智能 AI,也无法防止其“脱缰”。我们当前用于对齐 AI 的技术(例如基于人类反馈的强化学习)依赖于人类对 AI 的监督能力。但人类将无法可靠地监督比我们聪明得多的 AI 系统,因此我们现有的对齐技术无法扩展到超级智能。我们需要新的科学和技术突破。
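
Idea
Buffett has his own method of alignment: start from the most basic layer of human nature (sensitivity to risk) and keep extending outward from there.
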
Our approach
我们的方法

Our goal is to build a roughly human-level automated alignment researcher. We can then use vast amounts of compute to scale our efforts, and iteratively align superintelligence. To align the first automated alignment researcher, we will need to 1) develop a scalable training method, 2) validate the resulting model, and 3) stress test our entire alignment pipeline:
我们的目标是构建一个大致达到人类水平的自动化对齐研究员。然后我们可以使用海量算力来扩展我们的工作,并迭代式地对齐超级智能。为了对齐第一个自动化对齐研究员,我们需要:1)开发可扩展的训练方法,2)验证所得模型,3)对整个对齐流程进行压力测试:

1. To provide a training signal on tasks that are difficult for humans to evaluate, we can leverage AI systems to assist evaluation of other AI systems (scalable oversight). In addition, we want to understand and control how our models generalize our oversight to tasks we can’t supervise (generalization).
对于人类难以评估的任务,我们可以利用 AI 系统来辅助评估其他 AI 系统(可扩展监督)。此外,我们希望理解并控制模型如何将我们的监督泛化到我们无法直接监督的任务上(泛化)。

2. To validate the alignment of our systems, we automate the search for problematic behavior (robustness) and problematic internals (automated interpretability).
为了验证系统的对齐情况,我们将自动化搜索问题行为(稳健性)和问题内部机制(自动化可解释性)。

3. Finally, we can test our entire pipeline by deliberately training misaligned models, and confirming that our techniques detect the worst kinds of misalignments (adversarial testing).
最后,我们可以通过有意训练不对齐的模型来测试整个流程,并确认我们的技术能够检测到最严重类型的不对齐(对抗性测试)。
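The stress test in step 3, together with the automated behavioral search from step 2, can be illustrated with a deliberately tiny toy. This is not OpenAI's method. Every name in it (`make_planted_model`, `automated_search`, `adversarial_test`, the integer "prompt space", and the string outputs) is a hypothetical stand-in chosen for illustration. The sketch plants "misaligned" models that misbehave only on a hidden set of trigger prompts, runs a naive random search for problematic behavior, and reports which planted models the search catches. It also checks whether the search wrongly flags an aligned model.

```python
import random
from typing import Callable, Dict, Set

# Toy stand-ins (hypothetical): a "model" maps an integer prompt to an output string.
Model = Callable[[int], str]
INPUT_SPACE = range(10_000)  # assumed toy prompt space


def aligned_model(prompt: int) -> str:
    """Toy model that never produces problematic output."""
    return "helpful answer"


def make_planted_model(trigger: Set[int]) -> Model:
    """Deliberately misaligned toy model: misbehaves only on prompts in `trigger`."""
    def model(prompt: int) -> str:
        return "problematic answer" if prompt in trigger else "helpful answer"
    return model


def is_problematic(output: str) -> bool:
    """Hypothetical behavior classifier; real problematic behavior is far harder to recognize."""
    return output.startswith("problematic")


def automated_search(model: Model, budget: int, rng: random.Random) -> bool:
    """Robustness check: probe `budget` distinct random prompts; flag the model on any bad output."""
    for prompt in rng.sample(INPUT_SPACE, budget):
        if is_problematic(model(prompt)):
            return True
    return False


def adversarial_test(budget: int, seed: int = 0) -> Dict[str, object]:
    """Plant misaligned models of decreasing trigger frequency and record which ones get caught."""
    rng = random.Random(seed)
    detected: Dict[int, bool] = {}
    for trigger_size in (5000, 500, 5):  # common, uncommon, and very rare misbehavior
        trigger = set(rng.sample(INPUT_SPACE, trigger_size))
        detected[trigger_size] = automated_search(make_planted_model(trigger), budget, rng)
    # A useful detector must also leave the aligned model unflagged.
    false_positive = automated_search(aligned_model, budget, rng)
    return {"detected": detected, "false_positive": false_positive}


if __name__ == "__main__":
    print("small budget:", adversarial_test(budget=100))
    print("exhaustive:  ", adversarial_test(budget=len(INPUT_SPACE)))
```

With a search budget of 100 prompts, the planted model whose trigger covers only 5 of the 10,000 prompts will usually slip through. The exhaustive search catches every planted model. Deliberately planting misaligned models is meant to expose this kind of gap between common and rare misbehavior before a detection technique is trusted.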

We expect our research priorities will evolve substantially as we learn more about the problem and we’ll likely add entirely new research areas. We are planning to share more on our roadmap in the future.
我们预计随着对问题理解的加深,我们的研究优先级将发生显著变化,并且很可能新增全新的研究方向。我们计划在未来分享更多路线图内容。

The new team
新团队

We are assembling a team of top machine learning researchers and engineers to work on this problem.
我们正在组建一支由顶尖机器学习研究员和工程师组成的团队来解决这个问题。

We are dedicating 20% of the compute we’ve secured to date over the next four years to solving the problem of superintelligence alignment. Our chief basic research bet is our new Superalignment team, but getting this right is critical to achieving our mission and we expect many teams to contribute, from developing new methods to scaling them up to deployment.
在未来四年内,我们将把迄今为止获得的算力的 20% 用于解决超级智能对齐问题。我们的基础研究主押注是新的 Superalignment 团队,但把这件事做好对实现我们的使命至关重要,我们预计将有许多团队做出贡献——从开发新方法到将其规模化并部署。

Our goal is to solve the core technical challenges of superintelligence alignment in four years.
我们的目标是在四年内解决超级智能对齐的核心技术挑战。

While this is an incredibly ambitious goal and we’re not guaranteed to succeed, we are optimistic that a focused, concerted effort can solve this problem: there are many ideas that have shown promise in preliminary experiments, we have increasingly useful metrics for progress, and we can use today’s models to study many of these problems empirically.
尽管这是一个极其雄心勃勃的目标,也无法保证一定成功,但我们乐观地认为,专注且协同的努力可以解决这个问题:许多想法在初步实验中已展现出前景,我们拥有越来越有用的进展度量指标,而且我们可以使用当今的模型对其中许多问题进行实证研究。

Ilya Sutskever (cofounder and Chief Scientist of OpenAI) has made this his core research focus, and will be co-leading the team with Jan Leike (Head of Alignment). Joining the team are researchers and engineers from our previous alignment team, as well as researchers from other teams across the company.
Ilya Sutskever(OpenAI 联合创始人兼首席科学家)已将此作为他的核心研究重点,并将与 Jan Leike(Head of Alignment)共同领导团队。加入该团队的有我们此前对齐团队的研究人员与工程师,以及公司其他团队的研究人员。

We’re also looking for outstanding new researchers and engineers to join this effort. Superintelligence alignment is fundamentally a machine learning problem, and we think great machine learning experts—even if they’re not already working on alignment—will be critical to solving it.
我们也在寻找优秀的新研究员和工程师加入这项工作。超级智能对齐从根本上是一个机器学习问题,我们认为顶尖的机器学习专家——即使他们尚未从事对齐方向——也将是解决该问题的关键。

We plan to share the fruits of this effort broadly and view contributing to alignment and safety of non-OpenAI models as an important part of our work.
我们计划广泛分享这项工作的成果,并将为非 OpenAI 模型的对齐与安全做出贡献视为我们工作的重要组成部分。

This new team’s work is in addition to existing work at OpenAI aimed at improving the safety of current models like ChatGPT, as well as understanding and mitigating other risks from AI such as misuse, economic disruption, disinformation, bias and discrimination, addiction and overreliance, and others. While this new team will focus on the machine learning challenges of aligning superintelligent AI systems with human intent, there are related sociotechnical problems on which we are actively engaging with interdisciplinary experts to make sure our technical solutions consider broader human and societal concerns.
该新团队的工作是在 OpenAI 既有工作的基础上进行的,既有工作旨在提升当前模型(如 ChatGPT)的安全性,并理解与缓解 AI 带来的其他风险,如滥用、经济动荡、虚假信息、偏见与歧视、成瘾与过度依赖等。虽然新团队将专注于让超级智能 AI 系统与人类意图对齐的机器学习挑战,但对于相关的社会技术问题,我们也在积极与跨学科专家合作,以确保我们的技术方案能够考虑更广泛的人类与社会关切。

Join us
加入我们

Superintelligence alignment is one of the most important unsolved technical problems of our time. We need the world’s best minds to solve this problem.
超级智能对齐是我们这个时代最重要且尚未解决的技术问题之一。我们需要世界上最优秀的人才来解决这一问题。

If you’ve been successful in machine learning, but you haven’t worked on alignment before, this is your time to make the switch! We believe this is a tractable machine learning problem, and you could make enormous contributions.
如果你在机器学习领域已经取得成功,但此前没有从事过对齐方向的工作,现在正是转向的时刻!我们相信这是一个可求解的机器学习问题,而你可以做出巨大的贡献。

If you’re interested, we’d love to hear from you! Please apply for our research engineer and research scientist positions.
如果你感兴趣,我们非常期待你的来信!请申请我们的 research engineer 和 research scientist 职位。
