Kevin Weil:
We're in this transition from ChatGPT being a thing that answers questions to a product that actually goes and does tasks for you in the real world. You should be in control of any actions that it takes.
我们正处于一个过渡阶段,ChatGPT 正从能够回答问题的工具,发展为可以在现实世界中替你执行任务的产品。你应该能够掌控它所执行的任何操作。
The best way to do that is to kind of co-evolve together.
实现这一目标的最佳方式是彼此共同进化。
Speaker 2:
Sam has sort of talked a little bit about generational differences. So what is it like if you're in your early 20s, you're using ChatGPT?
Sam 提到过一些代际差异。如果你二十出头并使用 ChatGPT,会是什么体验?
Kevin Weil:
If you're younger, it's just a core part of the way you operate your life. There's sort of an always-on nature. You realize that you have this super assistant in your pocket that can not just answer any question, but it can teach you anything that you want to learn.
如果你更年轻,ChatGPT 就是你生活方式的核心组成部分,它几乎始终在线。你会意识到兜里有一个超级助手,不仅能回答任何问题,而且还能教会你想学的任何知识。
Speaker 2:
How far out are model capabilities being delivered? Are there things that are being worked on today that might not make their way into models until...
模型能力的交付周期有多长?是否存在当前正在研发、却要到……才会集成到模型中的功能?
Kevin Weil:
Mid-2026? It's unpredictable. Sometimes things take longer than expected. Other times you see these capabilities that you didn't expect at all that are kind of emergent and all of a sudden something just works.
2026 年年中?这很难预测。有时事情会比预期更久,有时则会出现完全没想到的涌现能力,突然之间某件事就能运作了。
It's a completely different way of building products. Computers can like do things that they couldn't do two months ago and we're constantly in that state.
这是一种完全不同的产品构建方式。计算机能够在两个月前还做不到的事情,如今却能做到,我们始终处于这种状态中。
Speaker 2:
Today, I'm so excited to welcome Kevin Weil. He is the Chief Product Officer of OpenAI, coming off the back of a storied career in companies like Twitter and Facebook. So much to talk about.
今天我非常高兴欢迎 Kevin Weil。他是 OpenAI 的首席产品官,此前在 Twitter、Facebook 等公司拥有辉煌的职业经历。我们有很多话题要谈。
So first of all, Kevin, thank you for making the time. You've been busy today already. It's morning in San Francisco and you've already launched some things.
首先,Kevin,感谢你抽时间。你今天已经很忙了。现在是旧金山的早晨,而你们已经发布了一些新东西。
Kevin Weil:
Yes, we have. And thank you so much for having me. It's my first live Substack, so I'm excited.
是的,确实如此。非常感谢邀请。这是我第一次 Substack 直播,我很激动。
Speaker 2:
Well, hopefully many more to come. You are really keeping us on our toes. And so just an hour before we went live, there was another spate of product launches from OpenAI.
希望以后还能有更多机会。你们真的让我们保持警觉。就在我们直播前一小时,OpenAI 又发布了一连串新产品。
What do they tell us about your vision for ChatGPT and where the product is going?
这些发布向我们传达了 ChatGPT 的什么愿景,以及产品的发展方向?
Kevin Weil:
The biggest thing that we launched this morning, we launched like six different things this morning,
今天早上我们发布的最重要内容——事实上我们一口气发布了六项——
but I think the most important one kind of for the long-term future of AI is we launched a series of connectors that can connect either to your personal data or, if you're an enterprise, to your enterprise data.
但我认为对 AI 长远未来最关键的是,我们推出了一系列连接器,可连接到你的个人数据,或若你是企业,则连接到企业数据。
So this is connectors into Google Docs, Gmail and Calendar, SharePoint, OneDrive, Dropbox, Box, Linear, all of these different tools that you use every day to get things done.
这些连接器覆盖 Google Docs、Gmail、Calendar、SharePoint、OneDrive、Dropbox、Box、Linear 等你每天用来完成工作的各种工具。
With the rise of our reasoning models, connecting them into the services and the data that you use helps the models be way more useful. So it's not just that, you know, you can now ask, you know, at work,
随着我们推理模型能力的提升,将它们接入你所使用的服务和数据会让模型大幅提升实用性。因此,不仅仅是在工作中你可以提问,
for example, if you connected it to your Google Docs or SharePoint, you connect it into the docs that you use every day. Suddenly,
例如,将 Google Docs 或 SharePoint 接入后,你每天使用的文档就全部连通了。突然之间,
ChatGPT can get all of this context on the enterprise and on the state of projects and on the latest about any particular thing that's going on. You wouldn't ever have an employee at your company that you didn't give access to docs.
ChatGPT 就能获得企业上下文、项目状态以及任何正在发生事项的最新信息。你不会让公司里的员工无法访问文档,
It's like where the conversation and the strategy and other things are happening. So now you have ChatGPT that has the ability to do it.
因为那是讨论、战略等内容发生的地方。现在 ChatGPT 也具备了这种能力。
And today, they're read-only, so they're able to access the information but not, you know, create on the other end. But you can imagine going forward, another big part of this is that ChatGPT should be able to take actions,
目前,这些连接器是只读的,因此它们只能访问信息,不能在另一端创建内容。但你可以想象,未来的另一大关键是 ChatGPT 应该能够采取行动,
should be able to help you write a document or create a presentation or, you know, write tasks to your task management system and ultimately combine all of these together to begin really working like an employee.
能够帮助你撰写文档、制作演示,或向任务管理系统写入任务,并最终把这一切结合起来,真正开始像一名员工那样工作。
Speaker 2:
Right, so that's the hint, right? That direction of moving ChatGPT from something we interact with, with a challenge and response, into something that is more evolved and feels like it's really doing work for us as an employee.
没错,这就是线索,对吧?这说明了一个方向:把 ChatGPT 从我们以挑战—回应方式互动的工具,转变成一个更进化、真正在为我们工作的“员工”。
We're going to spend some time digging into that. It's such a big vision. Let me ask about you. You're the Chief Product Officer of what I think is the most important company in the world today.
接下来我们会花些时间深入探讨这一点。这是一个宏大的愿景。让我先问问你:你是我认为当今世界最重要公司之一的首席产品官。
So what does that feel like to you day-to-day and week-to-week?
那么在日常和每周的工作中,这种感觉对你来说是什么样的?
Kevin Weil:
I mean, it's a privilege. It's the most exciting place I've ever worked and I've been very fortunate in my career to work at a bunch of awesome places with really great co-workers, but I think, you know, the opportunity in front of us, the rate at which AI is changing all of our lives means we have an ability to make a really big impact and we take that super seriously. So I get to work with amazing colleagues. I get to kind of, you know, have a front row seat to the way that our models are evolving and the way this is all impacting our lives and hopefully we get to build some products that make a difference in your life and in the lives of everybody listening here.
我觉得这是一种荣幸。这里是我工作过的最令人兴奋的地方,而我职业生涯中也很幸运,曾在多家出色的公司与优秀的同事共事。不过我认为,摆在我们面前的机遇,以及 AI 改变我们生活的速度,都意味着我们能够产生巨大的影响,而我们对此十分认真。我能与卓越的同事合作,近距离见证模型的演进以及这些变化如何影响我们的生活,并希望我们能打造出真正改善你和所有听众生活的产品。
Speaker 2:
Well, nearly a billion people have made, at least voted with their mouse finger to use ChatGPT just since November 2022. Technology always changes us. The television did. We got TV dinners and we got water cooler discussions.
自 2022 年 11 月以来,已有近十亿人用鼠标投票选择使用 ChatGPT。技术总会改变我们——电视如此:于是有了电视餐,也有了茶水间闲聊。
The internet obviously changed us in different ways, cars as well. We got big box retail. We got the suburbs. We got the picket fences, desperate housewives. How might AI products reshape our everyday life?
互联网显然以不同方式改变了我们,汽车亦然——有了大型零售商店、郊区、栅栏篱笆,甚至《绝望主妇》。那么,AI 产品将如何重塑我们的日常生活?
Kevin Weil:
Well, I think one of the other interesting things you get when you have these waves of technology is you start by doing the thing that you were doing with the previous wave, but sort of with the new medium.
嗯,我认为在每一波技术浪潮中都会出现一个有趣现象:人们往往先用新媒介去做他们在上一波技术里做过的事情。
You know, the first TV advertisements were people standing on a stage reading their radio advertisements. And then people slowly figure out that you can actually, you know, do what we have as commercials today,
你知道,最早的电视广告就是有人站在台上朗读广播广告。后来人们慢慢意识到,可以做成今天这种商业广告,
which are much more interactive and sort of dynamic. And so, you know, we're still probably in that mode of people, when they look at the impact that AI can have in their lives or in their work,
它们更加互动、更具动态性。因此,如今当人们审视 AI 对生活或工作的影响时,
they're kind of like, OK, I have these processes. How do I, you know, sprinkle AI on top of this to make it better, faster, etc.? And like, that's fine. That's all good.
他们会想:好的,我有这些流程,我该怎样“撒点 AI”让它们更好、更快等等?这没问题,挺好的。
As in previous technology transitions, the power comes from completely reimagining the work that you're doing from first principles using the new technology. So mobile wasn't just about a computer in your pocket.
但正如以往的技术变革,真正的力量来自于运用新技术,从第一性原理彻底重新构想你正在做的工作。移动时代的核心并不仅仅是把一台电脑塞进你的口袋。
You have access to GPS and you have notifications and totally new ways to interact with technology. I think over the next year, we're all going to be in the process of reinventing the way that we do things with AI.
它让你能使用 GPS、接收通知,并以全新的方式与技术互动。我认为在接下来的一年里,我们都会着手用 AI 重新定义自己的工作方式。
And the fun part is the technology is moving so fast. It's, I mean, faster than any technology I've ever worked with before or ever seen in my career. So even as we're reinventing, the technology is gaining new capabilities.
有趣的是,这项技术进步得如此之快——比我职业生涯中见过的任何技术都要快。即便我们在重新发明,技术本身也在不断获得新能力。
And it's just it's an exciting time to be alive.
这确实是一个令人兴奋的时代。
Speaker 2:
I mean, that rate of change is quite something. I was looking for something today, which I know would not have been possible four months ago. So I was using o3 to help me find a little portable mobile 5G router for when I travel and I just need to get, you know, good quality signal.
我的意思是,那种变化速度真的令人惊叹。我今天在找一样东西,而我知道在四个月前这根本不可能做到,所以我用 o3 帮我找一款便携式移动 5G 路由器,方便我出差时获取高质量信号。
And of course, o3 went into all the technical specifications of the radio frequency chipset and said, well, you know, this earlier router has got a slightly better Qualcomm chipset that they no longer use.
当然,o3 详细列出了射频芯片组的所有技术规格,并告诉我这款早期路由器使用的是稍微更好的高通芯片,但现在他们已经不用了。
And you'll get one extra signal part in rural environments in the US, but not in Europe. And I'm thinking that again, this is kind of insane. But I'm really curious about those types of behaviors.
在美国的农村地区,你会多获得一个信号分段,但在欧洲则不会。我当时又觉得这有点疯狂,但我很好奇这些行为模式。
What are the surprising behaviors that you've seen emerge around ChatGPT from your users? Sam has sort of talked a little bit about generational differences,
在你们的用户中,围绕 ChatGPT 出现了哪些令人惊讶的行为?Sam 提到了一些世代差异,
but give us a flavor of things that you've learned about behavior that are just really hard for product teams to research or figure out.
但请给我们分享一些你们了解到、而产品团队很难研究或弄清楚的行为特征。
Kevin Weil:
Well, I do think, I mean, Sam has talked about the way that, like I said, the people today, I think a lot of people are sort of sprinkling AI onto their existing workflows. And it's young people that are, you know, they're sort of, this is native to them in a way that it's not native to, you know, those of us who grew up without AI. And it's, I mean, my kids just, they're like, of course you can talk to a super powerful AI that,
我确实认为——正如 Sam 所说——如今很多人只是把 AI “撒” 在现有流程上。而年轻人则……可以说,对他们来说 AI 是原生的,而对我们这些没有 AI 环境长大的人则并非如此。比如我孩子就觉得,当然可以与一个超级强大的 AI 交流,
like, you know, can customize itself and answer any question you have. Kids that are graduating college these days as engineers, they don't know any other way to write code than using AI editors like Cursor and Windsurf.
它能自我定制并回答你所有问题。现在大学毕业的工程师孩子们,只知道用 Cursor、Windsurf 之类的 AI 编辑器写代码,别无他法。
It's just completely natural to them and so it gives them superpowers in a way. It's one of the reasons that we look a lot at how our youngest users,
这一切对他们来说再自然不过,也因此赋予了他们“超能力”。这就是我们非常关注最年轻用户——
our young users, late teens, college users, things like that are using the product because it teaches us a bunch.
例如十几岁、高校阶段的用户——如何使用产品的原因之一,因为他们能教会我们很多东西。
Speaker 2:
Can you characterize those differences though? So what is it like if you're in your early 20s, you're using ChatGPT, how does it feel different to somebody who might be in their mid-30s or early 40s?
那能否描述一下这些差异?如果你二十出头在使用 ChatGPT,与三十五岁到四十岁的人相比,体验有何不同?
Kevin Weil:
There's sort of an always-on nature. You realize that you have this super assistant in your pocket that can not just answer any question, but it can teach you anything that you want to learn. And so when you go about life that way,
它有一种“始终在线”的特性。你意识到兜里有个超级助手,不仅能回答任何问题,还能教你想学的任何东西。当你以这种方式生活时,
the rest of us are like trying to remember, trying to think about the processes that we go through and how we can reimagine them. If you're younger, you might not have had those processes. And so you built them from the ground up with AI.
我们其他人则试图回想并思考自己所经历的流程,以及如何重新构想它们。若你更年轻,可能根本没有这些旧流程,于是你从零开始用 AI 构建它们。
And so it's just a core part of the way you operate your life. And in some ways, they're ahead and the rest of us are catching up.
因此,AI 已成为他们生活运作方式的核心组成部分。在某些方面,他们处于领先地位,而我们其他人仍在追赶。
Speaker 2:
A lot of people are scared of this technology. A lot of people are nervous. I've just come back from Brussels and spoken to people in a different sector, and you do get that sense of fear. What could change in the product to address that?
许多人对这项技术感到害怕、紧张。我刚从布鲁塞尔回来,跟其他领域的人交谈,也能感受到这种恐惧。产品上可以做哪些改变来缓解这种担忧?
Kevin Weil:
One thing that I think is really important is people ask, you know, what should I do about AI? How should I think about it? And my answer is always just use it. And you know, of course, I think everybody should try ChatGPT,
我认为很重要的一点是,人们常问:“我应该怎么做?该如何看待 AI?”我的答案总是:直接用就对了。当然,我觉得每个人都应该尝试 ChatGPT,
but whether it's us or any of the others, just start using it because it's the number one way to realize it's not this super scary thing that you read about. It just helps you get more done and you're like, oh, this is great.
但无论是我们还是其他平台,都该开始使用。因为这是最直接的方式去意识到它并不像报道里那样可怕,它只是帮你完成更多工作,让你觉得:“哦,这太棒了。”
I now have this incredible new thing that can be part of my life and help me get things done and help me automate a bunch of the boring work that I have. So the number one thing is just, like, just start using it.
我现在拥有了一个令人难以置信的新工具,它可以融入我的生活,帮我完成任务并自动化许多枯燥工作。所以最重要的就是:开始用。
Also, given the rate that the technology is improving, if you don't start using it now, it's going to be even harder to sort of catch on to it. And if you believe that AI is going to be a big part of our lives,
此外,考虑到技术进步如此之快,如果你现在不开始使用,日后上手只会更难。如果你相信 AI 将深度融入我们的生活,
that's not a train you want to miss. So, you know, that's the number one thing. Just start using it. But then, you know, we think a lot, especially as you get towards agents and other things,
那就别错过这班车。所以,最重要的就是开始用。不过,我们也在深思,尤其是当我们迈向代理等功能时,
we think a lot about making sure that the user is in control. So you don't want the, you know, as you're getting to use ChatGPT or any other AI, you don't want it to go off and do a bunch of things for you without you,
我们要确保用户保持掌控。你并不想让 ChatGPT 或其他 AI 在未获得你允许的情况下替你做一堆事,
you know, feeling in control. And so, you know, it's one thing if it's like answering a question for you, reading some docs and summarizing them, things like that.
让你失去控制感。如果只是替你回答问题、阅读文档并做摘要之类,那还好。
But we're in this transition from ChatGPT being a thing that answers questions to a product that actually goes and does tasks for you in the real world. And you should be in control of any actions that it takes.
但如今我们正经历转变:ChatGPT 正从回答问题的工具,转变为能在现实世界替你执行任务的产品。对于它采取的任何行动,你都应保持掌控。
And over time, as the models get better and you begin to trust them more, then sure, you can give it, you know, sort of more leash and trust it to take more actions autonomously. But you should control that every step of the way.
随着模型不断提升,你对它的信任加深,自然可以逐步“放长绳”,让它更自主地行动。但在整个过程中,你都应该掌控每一步。
And I think that's one of the most important ways that we're looking to build trust.
我认为这就是我们建立信任的最重要方式之一。
Speaker 2:
I mean, there's a lot that will change as you move to more and more of these agent-based workflows and the models are able to use tools, so I guess we have to co-evolve. Definitely, we should look into those questions.
我的意思是,随着你越来越多地采用基于代理的工作流,并且模型能够使用各种工具,很多事情都会改变。我想我们必须与之共同演化。我确定这些问题值得深入研究。
I'm just curious about your own experience. A lot of us have these oh shit moments when we use LLMs, right? There's something sublime that happens or there's two years of your work that gets returned in five seconds.
我只是好奇你的亲身体验。我们很多人在使用大型语言模型时都会有那种“天哪”时刻,对吧?要么发生了让人惊叹的事情,要么是你两年的工作在五秒钟内就被返还。
What was your most recent oh shit moment using your own products?
你最近一次在使用自家产品时的“哇靠”瞬间是什么时候?
Kevin Weil:
Well, so I'll give you, this one's a little bit more, I don't know, pragmatic, but it was a really meaningful thing for me. We have, we were talking about kids as people were sort of coming on and our,
嗯,我来分享一个更务实的例子,但对我意义重大。当时我们在讨论孩子的事,而我们的——
one of our sons had a minor surgery and it was one of those things where all odds were that it wasn't gonna be a big deal, but there's a small chance that it was a really bad thing, you know?
我们的一个儿子做了个小手术,按概率来说几乎不会有问题,但仍有小小的可能会很糟糕,你知道的。
And so they do the surgery and they take the thing and they go to biopsy it, and you're waiting to hear back. And as a parent, you're nervous, even though you know logically that the odds that it's anything bad are really small.
于是他们做了手术,取出东西去做活检,而你只能等结果。作为父母,即便理性上知道出问题的概率极低,你还是会紧张。
And at some point, we got a letter in the mail with a bunch of, you know, doctor-ese on it that looked pretty intimidating, frankly. You know, there were a bunch of words I didn't understand characterizing what this thing was.
后来我们收到了邮寄来的信,上面满是医学术语,说实话看着相当吓人。有很多我不懂的词汇,用来描述那个东西是什么。
And it wasn't, it didn't say like, you should worry about this or you should not worry about this. It just like characterized it and then, you know, finished. And I couldn't get a hold of the doctor.
信里既没说你该担心,也没说不该担心,只是描述了一下就结束了。而我怎么也联系不上医生。
She was in surgery or something, you know, and so there I was like, oh my God, what does this mean? And so I took a picture of it. I put it in ChatGPT and I said, like, should I be worried?
她当时在做手术之类的,所以我就想:天哪,这到底意味着什么?于是我拍了张照片上传到 ChatGPT,问它:我需要担心吗?
And I said, like, can you explain this to me like I'm five? And chat did it and was like, no, this is totally fine. Everything like nothing to worry about.
我又说,能不能用五岁小孩都能懂的话给我解释?ChatGPT 就做到了,还告诉我:不用担心,一切都没问题。
And I actually ended up not being able to get a hold of the doctor for 72 hours because she was just super busy. And that 72 hours would have been a terrible 72 hours for me as a parent if I was just sitting there brooding.
结果我整整 72 小时都没联系上医生,因为她实在太忙。如果那 72 小时我只能干坐着胡思乱想,对我这个家长来说将会非常煎熬。
And ChatGPT was able to answer. And that's like, that's, you know, us with like access to great healthcare. You think about the impact of this all over the world where people don't have the same access.
而 ChatGPT 给了我答案。要知道,我们已经算是享有优质医疗资源的人了,想想世界上那些没有同样资源的人,这种技术的影响就更大。
It's really powerful, and I think that's sort of an underappreciated part of ChatGPT.
这真是非常强大,我觉得这是 ChatGPT 一个常被低估的方面。
Speaker 2:
I mean, that is a wonderful story, and I'm glad that your son is healthy and comes out of that well, but it also speaks to the power of this particular product, right? It's a really complex, complex product,
这真是个精彩的故事,我也很高兴你儿子健康无虞,但这同样说明了这款产品的力量,对吧?它真的是一个极其复杂的产品,
and there's no way I can talk to you without asking you about how that complex product finds its way into that funny dropdown in the top left of ChatGPT. There must be something. You guys are all so brilliant.
我不可能不问你,这么复杂的产品怎么就放在 ChatGPT 左上角那个奇怪的下拉菜单里?这里面一定有些什么,你们都这么聪明。
There must be some internal joke about, well, how should we order it? What should we call the next one? What's really going on there? And shouldn't we all just be using o3, or 4o if we're in a hurry?
你们内部肯定有个玩笑,比如我们应该怎么排序?下一个应该叫啥?到底怎么回事?我们难道不都应该直接用 o3,赶时间的话就用 4o 吗?
Kevin Weil:
It's a totally fair question. And you can also, you can and you should make fun of us for our naming as well. It kind of comes back. So we have this philosophy of iterative deployment.
这是个完全合理的问题。而且关于我们的命名,你们也可以——也应该——拿来调侃一番。这归根结底源于我们的“迭代式部署”理念。
Which is these models are, I think AI is going to change all of us. It's going to change the world. It's going to change society. And we believe that the best way to do that is to kind of co-evolve together, is to get these models out there and put them in people's hands, help them understand. And also, you know, they help us discover the capabilities of the models and the weaknesses and other things.
具体来说,我认为 AI 将改变我们所有人,它会改变世界、改变社会。而我们相信,最佳路径是大家共同演进,把这些模型交到用户手中,让他们亲自体验、加深理解。与此同时,用户的使用也帮助我们发现模型的能力、弱点等方面。
And so we sort of learn together and can iterate really quickly and improve. So that's part of it.
因此我们得以共同学习、迅速迭代并持续改进。这是一部分原因。
The other part of it is we're building a lot of new capabilities as we go, and if we took the time, if we just had like one model and we just had to build everything into one model,
另一方面,我们一路上持续开发新能力;如果只维护一个模型并把所有功能都塞进去,
we'd end up moving a lot more slowly and things would be simpler for sure, but we'd end up moving a lot more slowly because sometimes it's easier to build a new model with a certain set of capabilities that's great at certain things,
尽管系统会更简单,但进度必然大幅放慢,因为有时为特定需求新建一个模型会更高效,
not as good at other things. And so together you have sort of a collection of models that can do lots of things. Each individual model has its strengths and weaknesses.
即使它在其他方面不那么突出。于是我们形成了一个模型组合,能够覆盖各种任务;每个模型都有其长处和短板。
And so basically, we've optimized for going faster and getting more capabilities in people's hands at the expense of a bit of confusion. And then over time, as we sort of gain more control over some of the new functionality,
因此,我们选择牺牲一些清晰度,换取更快迭代,把更多能力交给用户。随着时间推移,我们对新功能掌控度提升,
we understand it better, we build it back into the core model. So you have models like GPT-4 that can do a lot of things well.
当我们更深入理解这些功能后,就将其重新整合进核心模型,从而诞生了像 GPT-4 这样能够胜任多项任务的模型。
And this is what we're trying to do with our forthcoming GPT-5 is take a lot of the things that we've learned and build more of the capabilities into a single model so that it's easier for people to reason about.
在即将推出的 GPT-5 中,我们也正尝试将所学到的大量能力融合到单一模型中,让用户更易理解和使用。
It's like, what model do I use? Just use GPT-5. And, you know, in a perfect world, it knows how hard the question you ask it is. And so it knows whether it should give you an answer like this or whether it should think for a while.
届时如果有人问“我该用哪个模型?”答案就是“直接用 GPT-5”。在理想状态下,它能判断问题难度,决定是立即回答还是先深度思考。
And, you know, that's what we're shooting for.
这正是我们的目标。
Speaker 2:
So, you know, in a way, that's lifting the cognitive load that is currently on the user, because I will sit there and I'll think, do I have time? Is that a complicated question? Does it need to go to the reasoning model o3?
从某种意义上说,这减轻了用户当前的认知负担。比如我会想:我有时间吗?这个问题复杂到需要交给推理模型 o3 吗?
Does it need a longish prompt for o3? Because it might get the wrong end of the stick if I give it a shortish prompt. And in a way what you're talking about is crunching that all and building it into the model itself.
我是否需要给 o3 写一段较长的提示?如果提示太短,它可能理解有误。你们的做法是把这些考量都内嵌进模型本身。
Kevin Weil:
Yeah. And look, what I don't want to do is overpromise and say, oh, in the future, like we'll only ever have one model and it's going to all be simple. Because we're also, you know, say we launched GPT-5.
是的。不过我并不想过度承诺,声称未来只会有一个模型、一切都会简单化。即便我们推出了 GPT-5,
We're then going to have a bunch of new capabilities beyond that that we're trying to build and experiment with, and we're going to want to get those out to people and deploy iteratively and so on.
我们仍会继续构建并试验超越 GPT-5 的一系列新能力,也希望通过迭代部署等方式尽早提供给用户。
So I expect you'll always have this sort of phenomenon where there are new models, and you've got sort of your workhorse models,
因此,这种现象将长期存在:既有不断涌现的新模型,也有作为主力的“劳模”模型,
and then you've got some of the new ones that have certain frontier capabilities that we're experimenting with and learning with together. And then over time, as those mature, they all kind of get merged back into single models.
此外,还有具备前沿能力的新模型,我们会与用户共同探索并学习。待这些能力成熟后,它们最终会被整合回单一模型中。
Speaker 2:
So it seems that a lot of the velocity is actually about getting through the loop, right, the learning loop, and you just run all these horses at the same time so you can gather enough data about what models work against what capabilities.
看来,你们速度快关键在于不断跑完“学习闭环”:让多匹“赛马”同时奔跑,以收集足够的数据,了解不同模型在不同能力上的表现。
In a way, that helps you develop and deliver on GPT-5, and I hope you'll share the date with us.
某种程度上,这有助于你们研发并交付 GPT-5。我也希望你能透露一下发布时间。
Kevin Weil:
Down to the minute.
精确到分钟。
Speaker 2:
Down to the minute, yes, so we can time ourselves. But does that also mean that, you know, how far out are model capabilities being delivered?
精确到分钟,是的,这样我们就能准确掌控时间。但这是否也意味着,模型能力的交付周期究竟有多长?
Are there things that are being worked on today that might not make their way into models until mid-2026?
是否存在今天正在研发、却要到 2026 年年中才能真正进入模型的功能?
Kevin Weil:
Yeah, it's a good question. It's one of the most interesting things about working here is you kind of have a sense of what's coming. And you know, I'm not in the research team.
是的,好问题。这里工作的迷人之处之一就是你多少能预感到接下来会发生什么。当然,我并不在研究团队。
So I'm getting this, you know, as I collaborate and work with the researchers. I would say, like, you know, on the product side, we have a decent sense of what's coming in the next,
因此这些信息主要来自我与研究同事的合作。在产品端,我们对接下来——
say, three months, maybe a hazy sense over the next six months. And beyond that, it's harder to say. You know, you have a certain set of capabilities, you know, where, like, you know a bit,
大概三个月内的动向还算清晰;未来六个月则比较模糊;再往后就更难预测了。手头掌握的能力只让你看到一鳞半爪,
but you see things coming through the haze a little bit and sometimes, sometimes capabilities are, you know, it's research, right? So it's not like you just, you have the formula and you just turn the crank.
透过迷雾隐约能见趋势。可毕竟是研究,不是谁有一套公式拧开就行。
We're uncovering new things and it's unpredictable. So sometimes things take longer than expected. Other times you see these capabilities that you didn't expect at all that are kind of emergent and all of a sudden something just works.
我们不断发现新事物,其走向难以预料:有时进展比预期慢;有时又会涌现出意料之外的能力,突然就成了。
Speaker 2:
Can you give an example of one that just worked that you weren't expecting?
能举一个“出其不意却突然奏效”的例子吗?
Kevin Weil:
Well, I mean, deep research is an interesting example where for a while there were a handful of researchers thinking about the, it was like, okay, we could probably make the model able to do this, like, iterative kind of research where,
“深度研究”就是很好的例子。起初只有少数研究员设想:或许能让模型进行迭代式研究,
you know, with deep research you give the model an arbitrarily complex query to go research something that would probably take you a week and it will go off and do like a hundred searches, but not all at once.
即你给出极其复杂的查询,常规要花一周完成;模型会分批执行上百次搜索,而非一次性全部放出。
It'll do three or four or five and then it'll reason about the results that it gets back.
它先做三五次搜索,再对结果进行推理。
It'll try and understand how they pertain to what you asked and what gaps are still there and then it'll go off and do some more searches and maybe think again.
它会判断这些结果与问题的关联、剩余空缺,再继续搜索并反复思考。
Maybe it'll write some code for a little while as it thinks and then go do some more, you know. And so it's this sort of iterative, I mean, it's what you would do if you were, if someone made you write a very complex research report.
它甚至可能边思考边写代码,然后再检索更多资料。整个过程就是迭代式的——就像有人让你写复杂研究报告时的做法。
You go do some research.
你会先去搜集资料。
Speaker 2:
Sorry, Kevin. I don't do that anymore. I just go to deep research. I actually can't remember how to do it on my own. But yes, I understand, right? You have to,
抱歉,Kevin。我现在都懒得自己动手,直接用深度研究功能,已经记不起手动怎么做了。但我明白,你必须——
you sort of inch your way across and you figure out exploration strategies and go down dead ends and come back.
一点一点摸索,制定探索策略,走进死胡同再撤回来。
Kevin Weil:
Yeah. And so to your question, like that, that was something that some folks were like, okay, this is coming together. But it's not clear exactly when it will come together.
没错。回到你的问题,当时大家觉得这东西“快成形了”,但具体会在何时成形仍不确定。
And so there was a small team of researchers that just believed in this and were working to make this real. And for a while it wasn't good enough, it wasn't good enough.
于是一个小团队一直信奉这个方向,努力将之落地。可它曾经长期达不到可用水平。
And then there were some advances and all of a sudden you're like, okay, this is getting good enough. And somewhere in that timeframe, we also put a product and engineering team working on it with them.
后来技术突破接连出现,突然之间进步够多了。同一时期,我们把产品与工程团队也拉进来共同攻关。
And then you have the thing that I think really is the magical part of OpenAI, when you get a research team and a product and engineering team just, you know,
而这正是我认为 OpenAI 最神奇的地方:当研究团队与产品、工程团队——
in the same room, all bringing their unique skills to bear and you understand the problem you're trying to solve. And so you're bringing back use cases.
齐聚一室,各展所长,并且对要解决的问题有共同理解时,你会不断收集新的使用场景。
We're creating evals and benchmarks for how you measure whether you're successful against those use cases. The research teams are taking that and using that to improve the model itself.
我们据此制定评测与基准,用来衡量在这些场景下是否成功;研究团队再利用反馈来改进模型。
And you get this tight loop of the model improving towards a particular product. And it's, you know, I think our best products are the ones that we build that way. And deep research is a good example.
这样就形成了模型面向特定产品持续迭代的紧密循环。我认为我们最好的产品都是这样做出来的,“深度研究”就是典范。
Speaker 2:
That's quite, I mean, that's a novel way of thinking about product development. I mean, if I think about the history of product development, quite often it was, before the 90s,
这真是——我的意思是,这是对产品开发的一种新颖思考方式。回顾产品开发的历史,在 90 年代之前往往如此,
before the consumer internet, run by engineers, and they'd say, we've got a new chip that can do this and try to figure out some software that can do useful things on that chip.
在消费级互联网出现之前,产品基本由工程师主导,他们会说:我们有一款新芯片能做到这些功能,然后想办法编写能在该芯片上发挥作用的软件。
I think the big breakthrough of the consumer internet was to put product managers at the heart of product development. We talked about lean and iteration and being very data-driven and user-centric. And now you're getting to this new model,
我认为消费互联网的一大突破,是把产品经理置于产品开发的核心。我们谈论精益、迭代,以及高度数据驱动和用户导向。而你们现在正走向一种新模式,
which I would characterize as not at all a return to the pre-internet product engineering led,
我将其形容为绝非回到互联网之前那种工程师主导的模式,
but something that is quite novel because the researchers discover something that's a little bit of a new capability and then you have to have a very rapid discussion about how that capability can be productized and then this word you used,
而是一种相当新颖的模式:研究人员发现了一点新能力,你们便需迅速讨论如何将该能力产品化,然后就到了你提到的那个词——
eval, which I guess means how it can be measured to see whether it's actually doing its trick. So is it really a new discipline that is evolving at this point?
“eval”,我猜它指的是衡量这种能力是否真正奏效。所以说,这是否正在演变成一门全新的学科?
Kevin Weil:
I think it's a completely different way of building products. It's certainly different than anything I've ever done in my career. And you know, within research, there's kind of a, there's a spectrum, right?
我认为这是一种截然不同的产品构建方式,肯定与我职业生涯中做过的任何事都不同。在研究领域内部,其实存在一个光谱,对吧?
There are parts of our research team that are just like deep research. It's almost academic in nature because they're trying to just like, they're looking for new breakthroughs.
研究团队中有些部分专注于“深度研究”,几乎带有学术性质,他们在寻找全新的突破。
They're trying to find things that nobody has ever, you know, figure out things that nobody's ever figured out before. And those kinds of things,
他们努力发现无人触及、前所未有的新事物。对于此类探索,
you don't want to be product-driven at all because you want to give a lot of room for exploration and fundamental breakthroughs. And then there's kind of the other end of research.
完全不应该由产品牵引,因为需要给予充分空间去探索和实现根本性突破。而在研究的另一端,
It's more on the post-training side where you really are trying to teach the models to do specific things very well. And those teams tend to be much more like, you know, partnered with product and engineering teams with a common goal.
则偏向于后训练阶段——目标是让模型在特定任务上做到极致。这些团队通常会与产品和工程团队紧密合作,拥有共同目标。
And then it's kind of a spectrum in between. I think the right way for us to be is not, we certainly don't want to be entirely product-led. That's not the magic of this place.
中间还有各种层次。我认为正确做法并非完全由产品主导,这并不是这里的魔力所在。
It's not maybe entirely research-led either because it's good to know feedback about what problems you can solve for people and how we can make the biggest impact in the world.
也不必完全由研究牵头,因为了解用户痛点、明白怎样产生最大影响同样重要。
It's really sort of a combination of both with research really at the core though. I've loved it. It's the most fun in the world and moves super fast.
归根结底,是两者结合,但研究始终是核心。我很享受这种方式,它既充满乐趣又发展迅猛。
Computers can do things that they couldn't do two months ago and we're constantly in that state.
计算机在两个月前还做不到的事,如今却已办得到,而我们正持续处于这种状态中。
Speaker 2:
But when you're working with research, then it's more than just looking at a scaling law and saying, oh, we've just, Sarah Friar has just signed off on another 100,000 GPUs.
但当你与研究团队合作时,事情远不止于查看扩展规律然后说:“哦,Sarah Friar 又批准了十万块 GPU。”
Therefore, it will be able to do this in six months when the training run is done. There's more to it than that. But one of the things I find fascinating is how do you map those capabilities against products,
接着得出结论:六个月后训练完成就能实现某功能。远不只是这样。我觉得迷人的是,如何将这些能力映射到具体产品上,
and you talked about evals, which I guess are evaluations. What is the structure of an eval, and is it what replaces what was in an old product requirements document that we might have had 15 years ago?
而你提到的 eval——我猜是评估——又是什么结构?它是否替代了我们十五年前那种传统的产品需求文档?
Kevin Weil:
Yeah, sort of. I think in some ways for understanding where the model is good and where it's not. If you think of the model as sort of an intelligence of some sort, intelligence is so multifaceted.
算是吧。在评估模型优劣方面,eval 的确发挥作用。如果把模型视作某种智能,智能本身就极其多维。
People are smart in a million different ways and one smart person is better than another in certain areas and worse than another.
人类的聪明体现在千万种方式上:一个人在某些领域优于他人,在另一些领域则相对逊色。
So one way to think of evals is a way to measure capabilities and intelligence of models on different dimensions. So you can have evals around how good it is at solving USAMO Math Olympiad-style problems,
因此,eval 可以被视为衡量模型在不同维度上能力与智能的手段。比如,你可以用 eval 来测试模型解决 USAMO 数学奥赛题目的水平,
and another around how good it is at chemistry, and another one about how good it is at creative writing.
也可以评估它的化学能力,或是它在创意写作方面的表现。
Speaker 2:
But are you using the public benchmarks for that, the sort of ARC-AGI and AIME and GPQA as your.
那么你们在这方面是不是采用了那些公开基准,比如 ARC-AGI、AIME 和 GPQA 来做评测?
Kevin Weil:
As your way of measuring some and then also for when we're building specific products, I think one of the most effective ways to build products is to take the skill that you want the model to have in order to meet the product need.
在部分场景我们会用这些基准;当我们打造具体产品时,一种非常有效的做法是先确定模型为满足产品需求所必须具备的能力,
Turn that into an eval so you can actually understand whether you know how good you are at it and also how you're getting better over time. But one of the fascinating things is like the evals that we all used, you know, a year ago to measure models. They're all very kind of cut and dried, like you're testing against math. And with math, there's a right answer.
然后把这项能力转化成 eval(评测),这样才能知道自己目前的水平,以及随时间推移是否在提升。有趣的是,过去我们用来衡量模型的那些 eval——比如一年前的评测——都相当“干脆利落”,像考数学题一样;数学总有唯一正确答案。
You can talk about creative writing evals though, and with creative writing, there's no answer. So how do you grade that, right? That's one problem.
但如果谈到“创意写作”类 eval,写作并没有标准答案,那你要怎么打分?这就是第一个难题。
The other is like, as you start to take on more complex tasks, you're not just answering questions. You're actually trying to, you're trying to like automate some multi-step workflow. There may be ambiguity in the right way to do that.
另一个难题是:当任务变得更复杂时,你不只是回答问题,而是想自动化一整套多步骤流程;“正确做法”本身就可能存在歧义。
If I'm, you know, if I'm an AI booking a flight for you, there's not a single way to grade which flight is correct, you know.
举例来说,如果我是一个替你订机票的 AI,“哪一张机票才算正确选择”并没有唯一评分标准。
You also get into these really interesting, challenging, subjective ways of how do we actually grade this particular task. And part of having an eval, if you want to at least automate it,
于是就会陷入这些有趣而又具挑战、带主观性的评分方式。如果你想把 eval 最起码部分自动化,
is you need to also have a grader for it so that you can very quickly understand how you're doing on that eval.
就必须同时设计一个“判卷器”,才能快速了解自己在该 eval 上的表现。
So it is interesting as one of the skills that I think is going to be more and more important for PMs over time is the ability to actually create evals for the products that you're building.
因此,我认为对产品经理而言,“为所构建的产品设计 eval”将成为日益重要的核心技能。
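The idea of pairing an eval with an automated grader can be sketched in a few lines. This is a minimal illustration only, not how OpenAI builds evals: `EvalCase`, `exact_match_grader`, `run_eval`, and `toy_model` are hypothetical names, and the exact-match grader only suits tasks with a single right answer (the math-style case). Subjective tasks such as creative writing or flight booking would need a different grader.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One eval item: a prompt and the answer we expect (hypothetical structure)."""
    prompt: str
    expected: str


def exact_match_grader(output: str, expected: str) -> float:
    """Score 1.0 when the normalized output equals the expected answer, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    grader: Callable[[str, str], float],
) -> float:
    """Run every case through the model and return the mean grader score."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    scores = [grader(model(case.prompt), case.expected) for case in cases]
    return sum(scores) / len(scores)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model call; it only knows two answers."""
    answers = {"2 + 2": "4", "3 * 3": "9"}
    return answers.get(prompt, "unknown")


CASES = [
    EvalCase("2 + 2", "4"),
    EvalCase("3 * 3", "9"),
    EvalCase("10 / 4", "2.5"),
]

if __name__ == "__main__":
    print(f"score: {run_eval(toy_model, CASES, exact_match_grader):.2f}")
```

Tracking the same score across model versions is what lets a team see whether it is "getting better over time"; swapping `exact_match_grader` for a rubric-based or model-based grader is one way to approach tasks without a single right answer.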
Speaker 2:
Yes, I mean, that's one. And the other one is the prompt at the front end. Because what we are starting to see with these leaks, and I don't know how real they are or not, but there are a number of X accounts that say,
没错,这是一个方面。另一个方面是前端的 prompt。最近我们看到一些泄漏内容(真伪待定),不少 X 账号声称,
I've just had the leaked system prompt of, then insert your favorite foundation model or coding tool. And the system prompts, which are the sort of structured instructions that go out with every query,
他们拿到某些基础模型或代码工具的系统 prompt。这些系统 prompt 是随每条查询一起发出的结构化指令,
a lot of them are really, really quite complex and a product in and of themselves now, right? They run to thousands of words. They're highly structured. There are clearly strategies being applied as they get put together.
本身就异常复杂、几乎成了一种产品——动辄上千字,结构严谨,显然在编排时用了很多策略。
So how important is that skill and capability when you're shipping products to people like me?
那么,当你们把产品交付给像我这样的用户时,具备这种 prompt 设计能力究竟有多重要?
Kevin Weil:
I mean, actually, more than people realize, I think, I would love to make it over time, less of a thing, and I think over time it is. If you go back a year or two,
实际上,这一点比大多数人想象的更重要,但我希望它的必要性随着时间推移逐渐降低,而事实也确实在朝这方向发展。回顾一两年前,
everybody was talking about prompt engineering and it was gonna be the skill that everybody had to master in order to do anything with AI. You don't hear it talked about quite as much like that, and I think that's a good thing.
大家都在谈“提示词工程”,仿佛要使用 AI 就必须掌握这门手艺。如今这种说法已经不那么普遍了,这是一件好事。
Ideally, it matters less and less that for any particular user, if they have a question, they want an AI to do something for them, you shouldn't need to get into like arcana around did I use the exact right word and did I give my exact,
理想状态下,用户提出需求时不必纠结于“我用的词是否精准”“指令是否完美”之类的细节,
you know, it should just work. I think that's part of increasing intelligence is the model can understand what you're looking to do and do a good job of it without you having to work super hard at it. That said,
系统就该直接奏效。我认为模型智能提升的一环,就是能理解你的意图并高质量完成任务,而无需你费神。但话说回来,
prompts still do matter and the models are very controllable with prompts and so we still find we'll launch something and we'll find that it's not behaving in certain ways the way we want it to and we can adjust it with a prompt a lot of times.
提示词依旧重要——模型对 prompt 非常敏感可控。我们发布新功能后,若发现某些行为不如预期,往往能通过调整 prompt 来修正,
You don't need to go back and retrain the model. So it's both that I want to make it less necessary over time and that it is still a powerful vector.
而无需重新训练模型。所以,我一方面希望未来对 prompt 的依赖度降低,另一方面也承认它依旧是强大的调控手段。
Speaker 2:
Well, these are two of the vectors of the direction of the product, but the third one is the idea of agents and what agents will bring to us. I think probably the first agent product that OpenAI launched that I used was Deep Research.
嗯,这些是产品方向上的两个发展向量,但第三个则是“代理”这一概念以及代理将为我们带来的价值。我想我使用过的、OpenAI 推出的首款代理产品大概就是 Deep Research。
And the word agent is being thrown about quite a lot. I mean,
而且如今“代理”这个词被频繁提及。我的意思是,
I'll use the word agent and what I mean is I've strung a bunch of prompts together through your API and there's a bit of logic to move a document through a series of steps to the other end. What does an agent mean to you?
当我使用“代理”一词时,我指的是通过你们的 API 把一连串提示词串联起来,并加入一些逻辑,让文档按步骤流转至终点。那么对你来说,“代理”意味着什么?
Kevin Weil:
We think of an agent as something that can do independent work. So it's not just a quick, you know, you ask a question, you get an answer, but it's actually off doing tasks for you in the real world.
我们认为代理是一种能够独立完成工作的系统。它不只是简单的问答,而是能在现实世界中替你执行任务。
So another, I think deep research is a great one where it's off doing, you know, hundreds of searches and putting together a complex report for you that might have taken you a week.
另一个例子就是 Deep Research——它会自主执行数百次搜索,为你组装一份原本可能需要一周才能完成的复杂报告。
I think another is Codex, which is our software engineering agent that we just launched. What you can do is, if you have a code base that you're operating against, you're building a new feature in a code base or debugging something,
我们刚发布的软件工程代理 Codex 也是如此。假如你手上有一个代码库,需要开发新功能或排查缺陷,
You can just give this agent the prompt like, hey, I need you to fix this thing. I want you to do this to the background of my webpage. I want you to, you know, build this new feature.
你只需给它下一条指令,例如:帮我修复这个问题;把网页背景改成这样;实现这个新功能。
And it will go and look through your entire code base, understand all of the context. If you're, you know, fixing a bug, it'll go try and figure out where that bug exists.
随后它会遍历整个代码库,理解所有上下文;若是修 bug,它会尝试找出缺陷所在。
And then it will write new code for you and create a pull request like a diff. You know, here's the set of changes we need to make to the code. And then you can go review the code. And the agent did the whole thing.
接着它会为你编写新代码并生成一个差异拉取请求,列出需要修改的内容;然后你审查代码——整个流程都是代理完成的。
And so, you know, I used to be an engineer. I still write code a little bit in my spare time, but I haven't written a single, you know, line of code for OpenAI. But with Codex, I was, this was like, you know, a few days before it launched.
我以前是工程师,现在业余还会写点代码,但在 OpenAI 我从未写过一行代码。不过在 Codex 上线前几天,
I was, you know, I was like 11 at night or something. I was doing a bunch of work that I had to do before I could go to bed. And I was like, you know what, I bet I could fix a bug right now.
那天大概晚上十一点,我在完成睡前必须处理的工作时想:或许我现在就能修个 bug。
And so went and found a bug that looked relatively simple and just, you know, pasted the context into Codex, said, can you go off and fix this bug? By the way, it was in a language that I had actually never worked with in my life.
于是我找了个看起来简单的 bug,把相关上下文粘进 Codex,让它修复。顺带一提,那还是我从未接触过的编程语言。
So it would have taken me even more time if I had to do it myself. And 10 minutes later, I had a pull request. It looked reasonable. I submitted it. An actual legitimate engineer looked at it and said, yeah, this looks right.
要是我亲自处理会花更多时间。但十分钟后我就拿到一个看起来合理的拉取请求。提交后,真正的工程师审核后说:没问题。
And, you know, now there's a few lines of code shipping today that, you know, came from me using Codex. It speaks to the power of this thing when you can just have this software agent off, like actually solving real world tasks for you.
如今线上运行的几行代码正是我使用 Codex 的产物。这说明当你能让软件代理真正替你解决现实任务时,它的威力有多大。
And in the meantime, I was like writing email and following up on Slack and, you know, doing all the things that I do in my day job. So it was just purely additive, which I think is really cool.
与此同时,我在写邮件、回复 Slack,处理日常事务;这一切都是额外收益,我觉得这非常酷。
Speaker 2:
Yes, because the Codex process takes a little bit of time. It has a lot of material it has to read and understand and then make the changes.
是的,因为 Codex 处理过程需要一点时间;它得阅读、理解大量内容,然后再进行修改。
And I'm curious about, this is a question that everyone who's built a product that does code automation or developer augmentation is asked is,
我想知道——这是所有开发代码自动化或开发者增强产品的人都会被问到的问题——
so today what portion of the OpenAI code base is in the first instance produced by Codex rather than by a human engineer?
截至今天,OpenAI 代码库中首次提交的代码里,有多少比例是由 Codex 生成而非人工工程师编写?
Kevin Weil:
Yeah, it's pretty meaningful and it's increasing quickly.
比例相当可观,并且正在快速提升。
Speaker 2:
Right, okay. Somewhere in the meaningful, I'll go up and ask o3 what meaningful means in percentage terms and I'll get a good distribution.
好的,明白。在合适的时候,我会去问一下 o3,“有意义”在百分比层面到底意味着什么,然后我就能得到一个不错的分布。
Kevin Weil:
The cool thing is you can fire off 10 of these tasks at once, right? So we try and actually give you the value of all this parallelism where, you know, it's not just you can do one thing, but if you have a Codex agent working for you,
最棒的是,你可以一次性启动 10 个这样的任务,对吧?因此我们努力真正让你体会到并行处理的全部价值——你并不仅仅只能做一件事;如果你有一个 Codex 代理为你工作,
why not have 10 Codex agents working for you on 10 different tasks? And by the way, just to connect it to the previous topic on evals, this is also, evals are, there's a really important kind of subtlety to them too,
那为什么不让 10 个 Codex 代理同时为你执行 10 个不同的任务呢?顺便说一下,为了接上之前关于评估的话题,评估本身也有一个非常重要且微妙的层面,
where they have to be tailored to the product that you're trying to build and the problem that you're trying to solve, where, you know, coding isn't one thing. Just, coding is a small vertical of the entire world, but even within coding,
评估必须针对你要构建的产品和要解决的问题量身定制;编码并不是一件统一的事情。编码只是整个世界的一个小垂直领域,但即便在编码内部,
you can be good at lots of different kinds of coding. And with Codex, that was a great example of going and saying, OK, what kinds of coding really matter to us? What kinds of tasks and all the tasks that a developer does,
你也可以擅长很多不同类型的编码。Codex 就是一个很好的例子,我们会去问:哪些编码类型对我们真正重要?开发者的所有任务中,
what kinds of tasks do we really want to be good at? And we created evals for those. And then we made sure to monitor as we train the model, is it getting better and better and better at these?
我们真正想擅长哪些任务?于是我们为这些任务创建了评估,并确保在训练模型时持续监控:它在这些任务上的表现是否越来越好?
And, you know, you go and accumulate tasks and examples for the model to learn from. But you do it against a specific set of evals that correspond to a specific set of problems you want to solve.
接着,你会不断积累任务和样例供模型学习,但这一切都是基于一套与你要解决的问题一一对应的特定评估来进行的。
Speaker 2:
It's very capability driven in that respect, right? And then it speaks to, you know, how do you actually do enough testing both to make sure that you are, you know, getting to the level, the right level of score that you want,
从这个角度来看,它非常以能力为导向,对吧?这也涉及到:你如何进行充分测试,以确保你确实达到了自己想要的评分水平,
but also making sure that it doesn't go off the rails, right? And I think that as these agents get more and more complex, given more complex tasks, that's something to bear in mind. I mean, in one of my workflows, which was a very,
同时又要确保它不会“脱轨”,对吧?我认为,随着这些代理变得越来越复杂、承担越来越复杂的任务,这一点尤其值得注意。举个例子,在我的某个工作流程中,
very simple one where I wanted an agent to go through and grab some data from a series of web pages and populate an Excel spreadsheet. And I was using some third-party agent framework.
那只是一个非常简单的场景:我想让代理遍历一系列网页抓取数据并填到 Excel 表格里。我用的是第三方代理框架。
And it was so diligent, Kevin, that it said, I must check my work. Which it did about 400 times and left me with a \$75 bill. And it had got it correct the first time, right? It got it right and got stuck in this strange loop.
结果它过于勤奋了,Kevin,它说“我必须检查我的工作”,于是检查了大约 400 次,给我留下了 75 美元的账单。而且它第一次就做对了,对吧?它做对后却陷入了奇怪的循环。
So that's one of the things I think that I hear when I get around people saying, well, how are we going to be able to control these things? Not from a humanity-out-of-control perspective, but from an enterprise reliability one.
所以这就是当人们讨论“我们如何控制这些东西”时,我经常听到的担忧——不是从人类失控的角度,而是从企业可靠性的角度。
How can I make sure that this isn't like the Sorcerer's Apprentice and the thing runs out of control when I simply asked it to book one flight to Italy and it's booked me 200? And how do you test for all of that?
我怎么确保这不会像《魔法师的学徒》那样失控——我只是让它订一张飞往意大利的机票,结果它给我订了 200 张。你又如何对这些情况进行测试呢?
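One common guard against the runaway "check my work 400 times" loop described above is a hard cap on steps and spend around the agent loop. The sketch below is an assumption-laden illustration, not any particular framework's API: `run_with_budget`, `RunResult`, and `overcautious_step` are hypothetical names, and the per-step cost of $0.25 and the $1.00 cap are made-up numbers.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    status: str       # "done", "stopped: cost cap", or "stopped: step cap"
    steps: int        # number of steps actually executed
    spent_usd: float  # total cost charged across those steps


def run_with_budget(
    step: Callable[[int], tuple[bool, float]],
    max_steps: int,
    max_cost_usd: float,
) -> RunResult:
    """Call step(n) until it reports done, or until a step or cost cap is hit.

    step returns (done, cost_usd) for the n-th step (hypothetical contract).
    """
    spent = 0.0
    for n in range(1, max_steps + 1):
        done, cost = step(n)
        spent += cost
        if done:
            return RunResult("done", n, spent)
        if spent >= max_cost_usd:
            return RunResult("stopped: cost cap", n, spent)
    return RunResult("stopped: step cap", max_steps, spent)


def overcautious_step(n: int) -> tuple[bool, float]:
    """Toy agent step that never declares itself done and costs $0.25 each time."""
    return (False, 0.25)


if __name__ == "__main__":
    result = run_with_budget(overcautious_step, max_steps=400, max_cost_usd=1.00)
    print(result)
```

With these illustrative numbers the loop halts after the fourth step instead of running all 400. A cap like this does not make the agent smarter; it only bounds the damage, which is the enterprise-reliability concern raised in the question.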
Kevin Weil:
Yeah, I think part of this is about making sure that, like we talked about earlier, that the user is in control here. So you should be able to at some point be like, hey, you know what, you've checked enough, like, you're good.
是的,我认为其中一部分就是要确保,就像我们之前说的,用户始终掌握控制权。所以你应该可以在某个时刻对它说:“好了,你已经检查得够多了,没问题。”
And the other interesting thing in all of this is the technology is evolving so quickly, like much more quickly than I think we're used to with technology. We're used to things taking like decades to deploy and to really achieve scale.
还有一个有趣的现象是,这项技术的发展速度极快,比我们过去习惯的技术迭代速度快得多。过去某些技术要花几十年才能部署并真正实现规模化,
One of the phenomena you see with AI technology is there'll be some benchmark, some eval, that AI just can't crack. And people are like, oh, AI just can't do that. And then one day, somebody ships a model that gets like 5% on that eval, still mostly can't do the job, but just like begins to get it. And then what you inevitably find is like two months later, there's a model that's at 30 on that eval. And then four months later, there's a model that's at 60. And then, you know, within six months, it's completely saturated. And like models are great at that new skill and will forever be.
而在 AI 领域你会看到这样一种现象:某个基准、某个评估,AI 完全过不了,人们就说“AI 做不到”。然后有一天,有人发布了一个模型,这个评估它得了 5% 的分数,它仍然干不好活,但开始有点能力了。然后你会发现,不可避免地,两个月后就有模型在同一评估上拿到 30 分,再过四个月,就有模型拿到 60 分。再过六个月,这项能力就完全饱和了;模型在这一新技能上表现出色,并将永久保持下去。
And so you go very quickly from like proof of existence to like, oh yeah, of course AI models can do that. That like rate of development is still, I think, something that we're not totally used to.
于是你会从“这件事刚刚被证明可行”迅速转变为“哦,当然 AI 模型能做到”。这种发展速度,我认为我们仍然不太习惯。
Speaker 2:
It's that first one or two percent, right, that becomes hard, that proves it can be done. It's the Kitty Hawk flyer. And then within 30 years, we're moving large numbers of passengers across the Atlantic.
正是最初那一两个百分点最困难,它证明事情可以做到。这就像基蒂霍克号飞机。而在 30 年内,我们就能让大量乘客横跨大西洋。
But in this case, it's within 30 days. So I want to ask about coding and coding agents. If you look at the growth of Gen AI applications, SimilarWeb's data came out a few weeks ago, and the baseline was that the generalized chatbots are growing at 25% per quarter. That's the ChatGPTs and so on. Virtually every other product category, image generation, video generation, sound generation, is growing slower than that or declining in size, and I view that as the black hole that is the capability of your core models. The one category where growth was faster, 75% a quarter according to SimilarWeb, was in coding.
但在这个案例里,只需要 30 天。所以我想问一下关于编码和编码代理的问题。如果你看看生成式 AI 应用的增长,根据几周前 SimilarWeb 发布的数据,基线是通用聊天机器人每个季度增长 25%。那就是 ChatGPT 之类的。几乎所有其他产品类别(图像生成、视频生成、声音生成)增长都比那慢,或者规模在缩小,我把那视为你们核心模型能力的黑洞。唯一增长更快的类别是编码,根据 SimilarWeb 的数据,季度增速达 75%。
I was curious, have you selected coding from a kind of commercial perspective, because you can really see the demand and developers always wanting to experiment,
我很好奇,你们选择编码,是从商业角度出发吗?因为你们确实能看到需求,开发者总想尝试新东西,
or do you select coding because being a testable, structured, verifiable set of outputs, it's a slightly easier challenge than the sort of fuzzy, amorphous tasks that occupy the rest of the world?
还是因为编码作为一个可测试、结构化、可验证的输出集合,相比于世界上那些模糊无形的任务,挑战稍微容易一些?
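To put the two quarterly growth figures quoted in this question on a common footing, they can be compounded over four quarters. This is back-of-the-envelope arithmetic only: it assumes each rate holds steady for a full year, which the quoted data does not claim, and `annualized` is a hypothetical helper name.

```python
def annualized(quarterly_rate: float) -> float:
    """Compound a quarterly growth rate over four quarters.

    Assumes the same rate holds in every quarter (an illustrative assumption).
    """
    return (1 + quarterly_rate) ** 4 - 1


if __name__ == "__main__":
    for label, rate in [("general chatbots", 0.25), ("coding tools", 0.75)]:
        print(f"{label}: {rate:.0%} per quarter is about {annualized(rate):.0%} per year")
```

Under that assumption, 25% per quarter compounds to roughly 2.4x in a year, while 75% per quarter compounds to roughly 9.4x, which is why the gap between the categories is larger than the quarterly numbers suggest.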
Kevin Weil:
Yeah, it's a really good question. And actually coding is this vertical that kind of hits all of these things. You know, for one, it's really important to us because if we can speed up coding,
是的,这是个很好的问题。实际上,编码这个垂直领域几乎涵盖了所有这些因素。首先,对我们来说它非常重要,因为如果我们能加快编码速度,
if we can make every engineer more effective, we also make ourselves more effective. And so we can build even faster and we can bring AGI to the world faster. So it's interesting to us from that perspective.
如果我们能让每位工程师都更高效,我们自己也会更高效。这样我们就能构建得更快,也能更快地把 AGI 带给世界。从这个角度看,它对我们很有吸引力。
It's a clear kind of milestone or step on the way to AGI itself because it's a very sort of general purpose reasoning. It's also a relatively gradable task. You can tell, like in math or other things, if you get the answer right.
在通往 AGI 的道路上,编码也是一个明确的里程碑或步骤,因为它是一种非常通用的推理任务。它也是一个相对可评分的任务,就像数学或其他领域一样,你能知道答案对不对。
It's also something that our engineers are familiar with, so it's a problem space that they understand and have good intuition for. It's also a huge market, as you were saying. It's also a market full of early adopters.
这也是我们的工程师熟悉的事情,因此他们了解且具有良好直觉。正如你所说,这也是一个巨大的市场,同时充满早期采用者。
You know, technologists leaning into this. It's also, you know, relatively sort of open and unregulated. It's not like trying to go into health or something where, you know, there are all kinds of other things you have to do.
科技工作者都在积极投入。而且它相对开放且不受严格监管,不像进入医疗等领域,需要处理各种其他手续。
And so it's this aggregation of all of these interesting things that make coding a really interesting market. And I haven't seen that data, but I totally believe it.
正是这些有趣因素的聚合让编码成为一个非常有吸引力的市场。那份数据我没看到,但我完全相信。
Speaker 2:
And would you say that within coding, are you already seeing signs of serving people like you who are not technically engineers anymore? In other words, we're seeing that expansion of the market through these tools.
在编码领域,你们是否已经看到为像你这样不再是技术工程师的人群提供服务的迹象?换句话说,我们是否正通过这些工具看到市场的扩张?
Kevin Weil:
Oh, yeah. And I think there's going to be so much value in democratizing coding out to the world. There's like, what, 30 million developers or something worldwide, depending on how you define it. Which is great. That's a lot of people.
哦,是的。我认为将编码民主化到全世界会带来巨大的价值。全球大约有 3000 万开发者,具体取决于定义。这已经很多了。
But imagine if a billion people can write code. I was talking to somebody the other day who was just telling me they were, during COVID, they were working for their local county trying to get, you know, vaccinations and stuff out to people.
但想象一下,如果有 10 亿人能写代码。我前几天和一个人聊天,他告诉我在新冠期间,他在当地县政府工作,试图把疫苗等信息提供给大众。
And they were trying to put together a website to track so people could like sign up and just do basic stuff. And the whole world was busy and they couldn't do it. They couldn't create a website. They didn't have the skills to do it.
他们想搭建一个网站,方便人们注册并执行一些基础操作。但当时全世界都很忙,他们做不到。他们无法创建网站,因为没有相关技能。
And as a result, they were managing things less efficiently, doing a bunch of manual work at a time when everybody was slammed. And he was just saying, can you imagine if I had these tools,
结果,他们管理事情的效率更低,在大家都手忙脚乱的时候还要做一堆手工工作。他就说,如果我当时有这些工具,你能想象吗,
we would have been able to create a website overnight. It would have just worked. And they would have been able to do their work more effectively. And you have that times a million, as you look around the world.
我们可以一夜之间就搭建好网站,它会直接运行。他们也能更有效地开展工作。放眼全球,这样的场景有成百万倍。
And so, I mean, that is actually the other thing about coding that I think is super interesting. It's maybe like the ninth reason why coding is a good, it's such a general purpose technology.
所以,这其实是我认为编码超级有趣的另一个原因。或许这是编码好的第九个理由——它是一种通用技术。
If you can create code, then you can create all kinds of things. And so, there's something really powerful to the idea that a billion people might be able to write code.
如果你能写代码,你就能创造各种东西。因此,想象 10 亿人能够写代码,这个想法本身就非常强大。
Speaker 2:
But it also speaks, though, I think, to how this may fundamentally change the software industry in the way that the internet changes the software industry,
但我认为,这也说明了这一点可能会以互联网改变软件行业的方式,从根本上改变软件行业,
not just because of packaging and distribution, but the way we interacted with social technologies. My Microsoft Word, when it was on floppy disk, never allowed me to sort of exchange notes with somebody else on like Google Docs.
不仅仅是因为包装和分发方式的变化,还因为我们与社交技术互动的方式。当我的 Microsoft Word 还在软盘上时,它从未让我像在 Google Docs 那样与他人交换笔记。
One of the big questions that's out there is, you know, as a platform company that is building the most performant models out there, how much space do you leave for startups?
一个备受关注的问题是,作为一家构建最强大模型的平台公司,你们会给初创企业留下多少空间?
You know, I remember when Microsoft introduced disk compression in Windows 95 or 97 or something, and there were a whole load of third-party companies that offered disk compression that essentially went out of business there and then.
我记得微软在 Windows 95 或 97 推出磁盘压缩功能时,当时有一大批提供磁盘压缩的第三方公司就此倒闭。
And this is something that happens on X every time you release a new foundation model. It's like every time Kevin tweets, another 50 startups die. Where is their space, right?
而每当你们发布新的基础模型,在 X(指 Twitter)上都会发生类似的事。就好像每次 Kevin 发推,又有 50 家初创公司倒闭。他们的空间在哪里,对吧?
Where is their space in the software world, in the AI software world? Which is safe from the increasing capabilities of the foundation models that you and other companies are building.
在软件世界、在 AI 软件世界里,他们的空间到底在哪?有什么地方能不被你们和其他公司日益强大的基础模型所侵蚀?
Kevin Weil:
So, Steven Sinofsky told me an interesting story about this one time. Steven used to run Windows at Microsoft and Office and you know everything.
有一次,Steven Sinofsky 给我讲了个有趣的故事。Steven 曾在微软负责 Windows 和 Office,几乎一手包办一切。
And he was telling me this story about like the transition from Windows 3.1, or whatever it was called back then, to Windows 95. It was just like the beginning of the internet.
他讲的是从 Windows 3.1(或者当时叫别的什么名字)过渡到 Windows 95 的故事。那时互联网才刚刚起步。
And so most people weren't using the internet. And if you were going to actually get on the internet with Windows 3.1, you had to like go to some, you know, University of Oregon professor's website and download a TCP/IP stack, compile it yourself and like, you know, install some device drivers and then you could actually go on the internet.
那时大多数人还不上网。如果你想在 Windows 3.1 上真正上网,你得去某个俄勒冈大学教授的网站下载一个 TCP/IP 协议栈,自行编译,并安装一些设备驱动,才能真正连接互联网。
And then in Windows 95, of course, the internet was happening and they were like, okay, we need to ship this stuff with Windows. And so they did. And it was like you were saying, there was this, you know,
到了 Windows 95,互联网显然已经流行起来,他们就说:“好吧,我们得把这些东西直接随 Windows 一起发布。”于是他们就这么做了。正如你所说,当时就出现了这种情况,
there were a bunch of people who were like, hey, that University of Whatever professor, he did all this work and now you've just shipped it, come on.
有些人抱怨说:“嘿,你们把那位某某大学教授辛苦做的东西直接打包发布了,这也太过分了吧。”
And Steven's point was that you would never want to live in a world where today you still had to go to some professor's website and download a TCP/IP stack and compile it yourself to get it going. You just want to use the internet.
而 Steven 的观点是:你绝不会想生活在一个今天仍需去某位教授的网站下载 TCP/IP 协议栈并自行编译才能上网的世界。你只想直接使用互联网。
And basically the expectations of a platform, the consumer expectations of a platform, are an increasing function of time. And if the platform can provide more of the technology that, you know,
本质上,平台的期望值——消费者对平台的期望——随着时间推移不断提高。如果平台能够提供更多技术,
if you see something where in order to build the actual thing that people want to build, you've got 10 different companies having to go build the exact same piece of like foundational infrastructure, you should probably just provide that.
如果你发现,为了构建人们真正想要的应用,有 10 家不同公司不得不去造完全相同的一块基础设施,那你大概就应该把那块基础设施直接提供出来。
And then those 10 companies can go do like more interesting stuff. And I've always remembered that story.
这样那 10 家公司就能去做更有趣的事情。我一直记得这个故事。
It's really stuck with me because I think the fact that people are just going to expect more and more out of these platforms is very real. But the upside is all for third parties in this, for developers in this world, because if the platform provides more of the building blocks, then they can spend less time re-implementing the wheel on these building blocks and more time doing the thing that they actually uniquely add value in.
这个故事让我印象深刻,因为人们对这些平台期望值越来越高是确凿无疑的。但这其中的好处全部属于第三方、属于这个世界里的开发者,因为如果平台提供了更多基础构件,他们就能少花时间在这些构件上重复造轮子,把更多时间投入到真正能体现独特价值的事情上。
And AI is going to change absolutely everything in our life. Any industry, any vertical, any geography that you can imagine, AI is going to touch. And so there's so much opportunity for developers to reinvent and reimagine.
而且 AI 将彻底改变我们生活中的一切。任何行业、任何垂直领域、任何地理区域,你能想象到的,AI 都会触及。因此,开发者拥有无数机会去重新打造与再想象。
I think anything that we can do on the platform side to help accelerate that by making more of the building blocks easy, we should be doing.
我认为,只要能够通过简化更多基础构件来进一步加速这一进程,平台方能做的事情都应该去做。
Speaker 2:
So let's imagine my son, 20 years old, and he wants to build a product on top of OpenAI. Where is a good place for him to go and build it?
假设我的儿子现在 20 岁,他想在 OpenAI 之上开发一款产品,他去哪里开发比较合适?
Kevin Weil:
I mean, almost everywhere. There's so much opportunity. Sam said this one time and it stuck with me. He said, if you're building a company and you're building at the frontier of the model capabilities,
我的意思是,几乎到处都可以。机会太多了。Sam 曾经说过一句话让我印象深刻:如果你创办公司时正站在模型能力的最前沿,
if you're building something that really just barely works and you can't wait for our next model because you know it's gonna make your product sing, then you're probably building in the right place.
如果你构建的东西只是勉强能用,而你迫不及待想要我们的下一代模型,因为你知道那会让你的产品熠熠生辉,那你大概率走在正确的方向。
Because you're introducing something new to the world. You're like making something possible that wasn't possible before. And that's where you want to be.
因为你正在向世界推出新的事物,让原本不可能的事情成为可能。这正是你想待的位置。
If instead, you're building like some sort of scaffolding around that covers up the weaknesses of a current model, and you're actually afraid of our next model because it might not have those same weaknesses,
相反,如果你在搭建某种脚手架,专门用来掩盖当前模型的弱点,而且你事实上害怕我们的下一代模型,因为它可能不再有这些弱点,
that's a bad place to be building. Because on average, models are going to improve really fast and what's a weakness from one model will not be a weakness of the next.
那就是一个糟糕的建设位置。因为通常而言,模型会非常快地改进,一代模型的弱点在下一代模型里就不再是弱点。
So I think the thing to be building is like what we talked about at the beginning, reimagining use cases from first principles, building them from the ground up with AI.
所以,我认为应该做的事情是我们一开始讨论的——从第一性原理重新构想用例,用 AI 从零开始构建它们。
And if you're in a place where you're excited about the next model that comes out because that's what's going to make your product sing, that's a great place to be.
如果你因为下一代模型的发布会让你的产品大放异彩而感到兴奋,那就说明你选对了地方。
Speaker 2:
That's a fantastic heuristic, which is actually if you're a founder out there,
这是一个绝妙的启发式规则,如果你是创业者,
Think about something that you want to build that the models are not yet capable of but will be capable in a little bit of time and you can build on top of that capability.
思考一些你想要构建、但现有模型尚不能胜任、却将在不久后能够胜任的东西,然后基于那种能力去构建。
We can't talk about products without talking about your new product buddy, Jony Ive. So tell us a little bit about what the mood in the office was when that lovely black and white photo was released.
谈产品就绕不开你的新伙伴 Jony Ive。告诉我们,当那张漂亮的黑白照片发布时,办公室里的气氛如何?
Kevin Weil:
Oh, people are incredibly excited. I mean, how could you not be? I use products that Jony designed my entire day. He's been a part of building some of the most sort of cherished products and hardware that we use every day.
哦,大家都无比激动。怎么可能不激动呢?我一天到晚都在用 Jony 设计的产品。他参与打造了我们每天使用、最受珍爱的诸多产品和硬件。
How could you not want to work with him? And, you know, getting to know him through the process and other things, he's also such a lovely human being.
你怎么可能不想和他共事?而且在这个过程中了解他等其他事情后发现,他还是一个非常可爱的人。
For somebody who's accomplished so much, he is so humble, thoughtful, kind, soft-spoken, and then, you know, he'll say something sometime and you'll be like, oh my God, like that is a completely different way of looking at the world.
对于一个成就如此卓越的人来说,他却如此谦逊、周到、友善、语调温和。有时他说的一句话会让你惊呼“天哪”,那完全是一种截然不同的世界观。
And that opens my eyes to something that I just had never thought about. And so it's this combination of like genius and also wonderful human being. How could you not be excited to work with him?
那会让我看到自己从未想到的事物。因此,他兼具天才与温暖人格——你怎么可能不期待与他共事?
Speaker 2:
And of course, he's a Brit. How is he going to work, his group and your group going to work? How are you guys going to interface?
当然,他来自英国。他的团队和你们团队将如何协作?你们会怎样对接?
Kevin Weil:
Well, I mean, he's coming in focusing on these sort of consumer hardware products. And then also over time, I think, will play a very significant role in design as a whole at OpenAI. And again, I'm very excited about that.
嗯,他的工作重点是这些面向消费者的硬件产品。同时,我认为随着时间推移,他将在 OpenAI 的整体设计领域发挥非常重要的作用。我对此再次感到非常兴奋。
How could you not be? Jony Ive coming in to take charge of a lot of your design, you know?
你怎么可能不兴奋?Jony Ive 要来负责你们的大量设计工作,对吧?
Speaker 2:
You know, he'll tackle the drop down, hopefully, at some point. It's like we've got the Ive touch there. And so I think that it's interesting to talk a little bit about hardware and how that interacts, just in the last couple of minutes,
也许有一天他会搞定那个下拉菜单。我们将迎来 Ive 的魔力。在最后几分钟里,聊聊硬件以及它如何与整体愿景互动,应该很有意思。
with the overarching vision. So the overarching vision, in a way, you started with at the beginning: you talked about AI systems that will act a little bit like employees. I guess for people in their domestic lives, that's more like helpers.
放到更宏大的愿景里——一开始你提到,AI 系统在某种程度上会像员工一样行动。我想对于人们的家庭生活而言,那更像是助手。
We don't generally have many.
我们通常并没有那么多。
Kevin Weil:
We think of it as a super assistant.
我们把它视作一位超级助手。
Speaker 2:
Like a super assistant. And so what is the relationship between that and the need to have a hardware device alongside? I already have a hardware device. It's pretty good. I'm talking to it right now.
就像一个超级助手。那么,这与必须配备一款硬件设备之间有什么关系?我已经有一台硬件设备了,它挺好的,我现在就在和它交流。
Kevin Weil:
It's more the opportunity. As we've said a few times, AI is going to touch every part of our lives and every part of our days in every part of the world.
这更多是一种机遇。正如我们多次提到的,AI 将触及我们生活和日常的方方面面,遍及全球每一个角落。
And that means I think that there's an opportunity to reinvent and reimagine a lot of the services and the products that we use every day. You know, in some cases, there are a lot of products that I use every day that are great.
这意味着,我认为有机会对我们每天使用的诸多服务和产品进行重塑和再想象。有些情况下,我每天使用的很多产品本身就很出色。
They probably need to fundamentally change and they should with AI. And if they're not, they're not taking advantage of all of these amazing new capabilities that we have,
它们很可能需要从根本上改变,并且应该借助 AI 来实现。如果它们没有这么做,就无法充分利用我们现有的这些惊人的新能力,
especially where those capabilities are going to be in 12 months, 24 months, 36 months. So I think there's an opportunity to reinvent and reimagine here. And that's true both on the software side and on the hardware side.
尤其要考虑到这些能力在 12 个月、24 个月、36 个月后将达到的水平。因此,我认为这里存在重新发明与再想象的机会,这对软件和硬件两方面都适用。
So, you know, we have some thoughts about how that might occur. Obviously, Jony has thought deeply about this, and we're excited to see what we can build together. And I'm sure there will be lots of others building in the space,
所以,你知道,我们对这如何发生有一些设想。显然,Jony 对此也有深入思考,我们很期待看看我们能共同构建什么。我确信还有许多人将在这个领域中进行建设,
and that's one of the reasons that we put so much care and attention into our APIs and our developer platform. Because, you know, the world is not just OpenAI.
这也是我们如此用心打磨 API 和开发者平台的原因之一。因为,你知道,世界不仅仅是 OpenAI。
There are going to be incredible startups and incumbents and everybody else building really cool things using AI. And we'd like to power it in any case. You know,
在这个世界上,将会有令人惊叹的初创公司、现有企业以及其他各方,用 AI 打造非常酷的东西。而我们希望无论如何都能为其提供动力。你知道,
some of these will be first party products that we build and some will be other products that others build that leverage our models. And, you know, both of those things are really important to us.
其中有些会是我们打造的一方产品,而另一些则是他人利用我们的模型所构建的产品。你知道,这两类对我们都非常重要。
Speaker 2:
I mean, I hear that. I hear what you're saying as well, because I've already started to realize the limitations of the phone as the form factor for working with the models.
我明白,我也理解你的意思,因为我已经开始意识到手机这种形态在使用模型时的局限性。
You can't really put in a longish prompt to o3. I'm really reliant on talking to it, and if I'm in a noisy place, that won't work. The idea of having an ambient intelligence around me, I always have.
你无法真正向 O3 输入较长的提示词。我非常依赖与其语音交互,如果周围嘈杂就行不通。关于身边存在环境智能的想法,我一直都有。
I have an AI model listening in to my meetings and I'm talking to it regularly to do my work. So you start to see the limitations of something that's got the power draw of the phone and the size of the phone and does other things.
有一个 AI 模型在旁监听我的会议,我也经常与它对话来完成工作。于是你就会开始看到一款同时受限于手机功耗、尺寸并且还要做其他事情的设备的局限性。
So that will be a really exciting opportunity and please sign me up for the alpha test well before GA. We've got a couple of minutes. I just want to throw out some questions.
因此,这将是一个非常激动人心的机会,请务必在正式发布前把我加入内测名单。我们还有几分钟时间,我想抛出几个问题。
How far behind are the top Chinese AI firms in core foundation model capability?
中国顶尖 AI 公司在核心基础模型能力方面落后多少?
Kevin Weil:
Not as far behind as they used to be. And I think as US AI labs, we need to be very cognizant of that. I think it's really important that the leading models, the models that we all use,
没有以前那么落后了。我认为,作为美国的 AI 实验室,我们必须清醒地认识到这一点。我觉得,确保领先模型——也就是我们所有人使用的模型——
are models that are built off of democratic principles, not authoritarian ones, and we take that really seriously.
是建立在民主原则而非威权原则之上的,这一点对我们来说非常重要。
Speaker 2:
Is there an AI app out there, whether it's in China or coming from somewhere else, that's not built by OpenAI that you quite like and you like to use and play around with?
市面上是否有 OpenAI 之外的 AI 应用,无论来自中国还是其他地方,是你喜欢并常拿来玩耍的?
Kevin Weil:
I mean, I think a lot of the video apps are super fun. I also find Waymo magical. That's my go-to example of the way AI is touching our lives.
我觉得很多视频类应用都非常有趣。我也觉得 Waymo 很神奇。那是我用来举例 AI 如何触及我们生活的首选案例。
Self-driving was like two years off for 10 years and now suddenly it's here and it works and it's going to change a lot.
过去十年里自动驾驶似乎总是还差两年才实现,但现在它突然就实现了,而且效果很好,将会带来巨大变革。
Speaker 2:
It's absolutely magical. You are a keen runner and I'm curious about whether you have a Garmin or a Suunto and what you would want from your exercise tracker that AI could bring that it doesn't today.
这确实非常神奇。你是一位热衷跑步的人,我很好奇你用的是 Garmin 还是 Suunto?对于你的运动手表,你希望 AI 带来而现在尚未实现的功能是什么?
Kevin Weil:
Ooh, that's a good question. Actually, I have an Apple Watch that I mostly use. And then if I'm doing like a hundred mile race or something, this doesn't quite have the battery. So I'll use a Garmin. What do I want out of it?
哦,这是个好问题。实际上,我主要用的是 Apple Watch。不过如果我要参加百英里之类的比赛,它的电量就支撑不了,所以我会用 Garmin。我想从中得到什么?
I think actually one of the things that I would love is better coaching, just like a little bit. And I think the AI is totally capable of doing it. I think Strava has some things that they're working on around this.
我觉得我真正想要的一件事是更好的指导,就再多一点点。我认为 AI 完全可以做到这一点。我知道 Strava 正在这方面做一些尝试。
I would love to see better coaching and AI-analyzed workouts and things like that. I think it's possible to get the kind of analysis that you would get from a professional coach today from an AI for most users.
我希望能看到更好的教练功能,以及由 AI 分析的训练数据之类的东西。我认为,对于大多数用户来说,通过 AI 获得专业教练级别的分析是可行的。
And I feel like that's the kind of thing that five years from now we're going to be like, oh yeah, I can't even imagine when that didn't exist. But it's only sort of peeking through right now.
我觉得这就是那种五年后我们会说“哦天哪,当时居然还没有这个”,但眼下它才刚刚显露头角。
Speaker 2:
Just a little bit, but I do hear you. I think the possibility of having that personalized coaching would be quite sublime. So last question, when are you going to ship AGI?
稍微有一点点,但我明白你的意思。我认为实现那种个性化教练的可能性将非常棒。那么最后一个问题,你们什么时候发布 AGI?
Kevin Weil:
We're working on it. Every day we get a little closer.
我们正在努力。每天都更接近一点。
Speaker 2:
Every day. I mean, when will we know? Will we know?
每天都在进步。我是说,我们什么时候会知道?我们能知道吗?
Kevin Weil:
Look, I think it's one of those things. We talked about intelligence being multifaceted. There are already a bunch of places today where AI is way better than a human.
我想,你看,智能是多维度的,我们也讨论过。如今已经有很多领域,AI 远胜于人类。
And there are places where AI is like laughably worse than a human. But every month or so, when there are new models, the baseline creeps up.
但也有些方面 AI 和人类相比简直不值一提。不过大概每隔一个月左右,一有新模型出现,基线就会提升。
And AI is becoming superhuman at more and more things. And at some point, it's going to be superhuman at, you know, substantially everything and we're going to call it. But it doesn't happen all at once.
AI 在越来越多的事情上达到甚至超越人类水平。总有一天,它将在几乎所有领域都超越人类,我们就会把那称作 AGI。但这并不会一蹴而就。
I think it's not like we go to bed one night and there's no AGI and we wake up in the morning and there's AGI. It's an incremental process of AI, you know, getting better and better at more and more things.
我认为这不是说我们晚上睡觉时还没有 AGI,早上醒来就突然有了。它是一个渐进过程,AI 会在越来越多的领域不断变得更好。
Speaker 2:
Well, with that thought, Kevin, you keep climbing that hill. Thank you so much this morning for giving us your time. It's great to have you.
好的,就带着这个想法继续努力吧,Kevin。非常感谢你今天早上抽时间与我们交流,能请到你真是太好了。
Kevin Weil:
Thank you for having me.
感谢邀请我。