2025-06-30 Dylan Patel: GPT-4.5's Flop, Grok 4, Meta's Poaching Spree, Apple's Failure, and Super Intelligence



Transcript
Meta's AI Strategy and Organizational Challenges
(00:00:00)
The episode introduces Dylan Patel, an expert in AI, and begins by discussing Meta's AI endeavors, specifically focusing on the performance and delays of their LLMs, including LLaMA. Patel notes that Behemoth might not be released due to training problems, while Maverick is decent but surpassed by other models. One model was a rushed response to DeepSeek and suffered from sparsity issues, with tokens not routing to certain experts, indicating training flaws. The conversation shifts to the organizational challenges within Meta, suggesting that despite having talent and compute, the lack of a strong technical leader to choose the best ideas hinders progress. Patel emphasizes the importance of "taste" in AI research, highlighting that even with brilliant researchers, poor decision-making can lead to wasted effort on bad research paths.
(00:00) Matthew Berman:
 
Super intelligence: whoever's reaching it first, who are you picking and why? OpenAI. He's the guy the chip industry reads before making a move. Meet Dylan Patel. He's a quick thinker with a depth and breadth of knowledge that is unrivaled in AI. You know, for one, Scale AI is like...
(00:18) Dylan Patel:
 
It's kind of cooked.
(00:19) Matthew Berman:
 
And today, Dylan's answering the tough questions. What went wrong with GPT-4.5? In general, it's not that useful and it's too slow.
(00:28) Dylan Patel:
 
You zoom back, right? It's like, if you believe super intelligence is the only thing that matters, then you need to chase it. Otherwise, you're a loser. It's not the money. It's more the power.
(00:35) Matthew Berman:
 
What do you think is going on at Apple?
(00:37) Dylan Patel:
 
They hate NVIDIA, maybe for reasonable reasons. Models are just like pansies about like giving me data.
(00:42) Matthew Berman:
 
You're using o3 day to day, even though, you know, it takes so much time to actually get your response back. The model I go to the most is either... 50% of white-collar jobs could disappear.
(00:52) Dylan Patel:
 
Generally, people work less than ever before. The average amount of hours worked 50 years ago was way higher. And then eventually, there just won't be humans in the loop, right?
(01:00) Matthew Berman:
 
I don't believe that.
(01:01) Dylan Patel:
 
It's an art form to some extent. What is worth researching and what's not? ... GPUs that ended up breaking, it was called BumpGate. It was a very interesting thing. ... You don't or you do? No, I don't. Okay, so this is a very fun story, right?
(01:15) Matthew Berman:
 
All right, Dylan, thank you so much for joining me today. I'm really excited to talk to you. I've seen you do a number of talks. I've seen you do a number of interviews. We're going to talk about a whole bunch of things. The first thing I want to talk about is Meta. Let's start with Llama 4. I know it's been a little while in the AI world since that released, but there was a ton of anticipation. It was good, not great. It wasn't world-changing in the moment. And then they delayed Behemoth. What do you think is going on there?
(01:41) Dylan Patel:
 
Yeah, so I mean, it's funny, there's like three different models and they're all quite different. So Behemoth got delayed. I actually think they might not ever release it. I'm not sure. There's a lot of problems with it; the way they trained it, some of the decisions they made didn't pan out. And then there's Maverick and Scout, right? And so one of those models is actually decent. It's pretty good. It wasn't the best on release, but it was comparable to the best Chinese model on release. But then, you know, Alibaba came out with a new model, DeepSeek came out with a new model, so it was like, okay, it's worse. The other one was objectively just bad.
(02:15)
I know for a fact they trained it as a response to DeepSeek, trying to use more of the elements of DeepSeek's architecture, but they didn't do it properly. It was just a rush job and it really messed up, because they went really hard on the sparsity of the MoE. But funnily enough, if you actually look at the model, it oftentimes won't even route tokens to certain experts, so it was like a waste of training. Basically, in between every layer, the router can route to whatever expert it wants to, and it learns which expert to route to, and each expert learns its own independent things. That's really not something observable by people, but what you can see is which experts the tokens route to when they go through the model,
(02:53)  and it's just like, some of them just didn't get routed to. So it's like you have a bunch of empty experts that are not doing stuff. So there's clearly something wrong with the training.
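To make the failure mode concrete, here is a minimal sketch of top-k MoE routing (an illustrative toy, not Meta's code): count how many tokens each expert receives, and "dead" experts show up as persistent zeros.

```python
# Minimal top-k MoE routing sketch (illustrative toy, not Meta's code).
# Per-expert token counts reveal "dead" experts that never get traffic.
import torch
import torch.nn.functional as F

num_experts, top_k, d_model = 16, 2, 512
router = torch.nn.Linear(d_model, num_experts)   # learned routing layer

tokens = torch.randn(4096, d_model)              # a batch of token activations
probs = F.softmax(router(tokens), dim=-1)        # routing probabilities
weights, expert_ids = torch.topk(probs, k=top_k, dim=-1)

# How many tokens landed on each expert in this batch?
counts = torch.bincount(expert_ids.flatten(), minlength=num_experts)
dead = (counts == 0).nonzero().flatten().tolist()
print("tokens per expert:", counts.tolist())
print("dead experts:", dead)   # persistent zeros across batches = wasted capacity
```

With a healthy load-balancing loss the counts stay roughly even; the failure Patel describes is experts whose counts stay at zero batch after batch.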
(03:05) Matthew Berman:
 
Is it like an expertise thing internally? I mean, they have to have some of the best people in the world. And we're going to get to some of their hiring efforts as of late. But, like, why? Why haven't they been able to really do it?
(03:16) Dylan Patel:
 
I think it's a combination and confluence of things, right? Like, yes, they have tons of talent. They have tons of compute. But the organization of people is always the most challenging thing. Which ideas are actually the best? Who is the technical leader choosing the best ideas, right? It's like, if you have a bunch of great researchers, that's awesome. But then if you put product managers on top of them, and there's no technical lead who's evaluating what to choose, then you have a lot of problems, right? OpenAI, right? Yeah, Sam is a great leader and he gets all the resources, but the technical leader is Greg Brockman, right? And Greg Brockman is choosing a lot of stuff.
(03:51)
And there's a lot of other folks, right? Like Mark Chen and others, who are the technical leaders really deciding, you know, technically, which route do we go down? Because a researcher is going to have their research, and they're going to think their research is the best. Who's evaluating everyone's research and then deciding: that idea is great, let's use that one; that one sucks, let's not use that one? It's just really difficult. So when you end up with researchers not having a leader who is technical and can choose, and really choose the right things, you end up with: great, we did have all the right ideas.
(04:23)
But part of AI research is that you have all the wrong ideas, too. And you learn from them, and you have the right ideas, and you choose them, right? Now, what happens if your choosing is really bad, and actually, you choose some wrong ideas? And then you go up the branch of research, right? You've chosen this bad idea. This is something we're going to do. Let's go further down. And then now you're like, oh, branching off of this bad idea, there's a lot more research, because, you know, it's like, well, we're not going to go back and undo the decision we made, right? So everyone's like, oh, we made that decision.
(04:54)
OK, let's see what's researchable from here. And so you end up with this potential for great researchers wasting their time on bad paths, right? And there's this thing that researchers talk about, which is taste, right, which is very funny, right? You think these are nerds who won the International Math Olympiad, and that was their claim to fame as teenagers, and then they got a job at OpenAI or Meta or whatever at 19. But there's actually a lot of taste involved, right? It's an art form to some extent. What is worth researching and what's not?
(05:28)
And it's an art form of choosing what's the best, because you're testing all these ideas down here at small scale, and then all of a sudden you're like, yeah, great. Those experiments were all done with like 100 GPUs. Awesome. Now let's make a run with 100,000 GPUs with that idea. It's like, well, things don't just translate perfectly. So there's a lot of taste and intuition here. It's not that they don't have good researchers. It's that who's doing the choosing, whose taste, right, is difficult, right?
You know, it's like, you don't care about movie critic reviews, you care about Rotten Tomatoes,  you know, audience score, perhaps, right?
(05:58)
And it's like, who's the critic that you're listening to, though, right? And that's the challenge: even if you have great people, to actually have good stuff come out, because of organizational issues, because the right people aren't in the right decision-making spots. And maybe the wrong person gets to be political and have their idea and research path put into the model, when it's not necessarily a good idea.
 
Meta's Acquisition of Scale AI and the Shift Towards Super Intelligence
(00:06:23)
The discussion centers on Meta's acquisition of Scale AI, primarily for Alexandr Wang and his team, as Scale AI's core business is declining. Patel suggests that Meta's interest lies in Wang's leadership for their super intelligence efforts. This acquisition signifies a strategic shift for Zuckerberg, who previously downplayed AGI but now prioritizes super intelligence. Patel explains that the term AGI has become meaningless, prompting the rebranding to "super intelligence," a move pioneered by Ilya Sutskever. The conversation touches on Zuckerberg's failed attempts to acquire SSI, Thinking Machines, and Perplexity. Patel emphasizes that individuals joining Meta are motivated by the power to influence AI development within a massive company, allowing them to implement their AI visions across billions of users.
(06:19) Matthew Berman:
 
Okay, well, let's continue down the path of who is making decisions. Obviously, Zuck, last week, there was a lot of news about him giving $100 million offers. I mean, Sam Altman literally said it. They acquired Scale AI, seemingly for Alexandr Wang and his team. He's in founder mode. What does the Scale AI acquisition actually give Meta? First, let's start there.
(06:41) Dylan Patel:
 
Yes, I think, you know, for one, Scale AI is like...
(06:45) Matthew Berman:
 
It's kind of cooked right now as a company because everybody's canceling their contracts.
(06:51) Dylan Patel:
 
Google's backing out. I think they were going to spend, on the order of what I've heard, like $250 million this year with them, and they're backing out. Obviously, they've spent a lot of money and there's stuff they can't back out of, but that's going to go down a lot. OpenAI allegedly cut the external Slack connection, so there's no Slack between Scale and OpenAI anymore.
(07:10) Matthew Berman:
 
It's the ultimate breakup between companies.
(07:12) Dylan Patel:
 
Yeah, so obviously these companies are like, I don't want Meta to know what I'm doing with my data, because that's one of the unique aspects of models, right? What do you do with your custom data? So clearly, Meta didn't buy Scale for Scale, right? They bought Scale for the purposes of having Alex and his few best colleagues. There's a few other folks at Scale who are really awesome as well, and they bought them to bring them over, right? Now the question is, is the data that Scale has good? Is knowing all the paths of data labeling that all these other companies were doing good? Sure. But more importantly, it's like, we want to get, you know,
(07:56)  someone to help us lead this super intelligence effort, right? And Alex is, you know, he's the same age as me. He's 28 or 29-ish; I think he's around that age. Tremendously successful in every way, shape, or form, right? People can hate on him if they want, but he's obviously very successful, especially when he convinces Mark Zuckerberg, who's not an irrational person, who's very smart, to buy his company, right? And their company was doing, you know, nearly a billion of revenue. And it's like, let's chase super intelligence, right? Which is very different, right?
(08:25)
If you go look at Zuckerberg's interviews, even a handful of months ago, he wasn't chasing super intelligence, right? He was chasing, like, AI is good and great, but AGI is not a thing that is going to happen soon. This is a big shift in strategy, in that he's basically saying: yeah, super intelligence is all that matters, we're on the path there, I believe now. What can I do to catch up, because I'm behind?
(08:47) Matthew Berman:
 
It seems like the narrative throughout all of these major companies is now super intelligence,  even when it was AGI just a month ago. Why the transition, by the way?
(08:57) Dylan Patel:
 
The word AGI has no meaning anymore.
(08:59) Matthew Berman:
 
It's amorphous, yeah.
(09:01) Dylan Patel:
 
You can look at an Anthropic researcher in the face and be like, what does AGI mean? And they literally think it just means an automated software developer. And it's like, that's not artificial general intelligence, but that's what they think, right? And so do a lot of researchers across the ecosystem.
Ilya saw this. You know, Ilya saw everything first, obviously, Ilya Sutskever. And he started his company, Safe Superintelligence, right, SSI. And I think that started the rebranding, and now, many months later, it's almost a year now, I think nine months to a year later, everyone's like, oh, super intelligence is a thing.
(09:35)
So another research direction that Ilya got first. Whether it was pre-training scaling or, you know, the original vision networks, right? Pre-training scaling, reasoning, right? All these things he had the idea for, at least among the first, if not first, and worked on a lot. Ilya's got this one too, which is the rebranding. So maybe he's got marketing too.
(09:59) Matthew Berman:
 
Yeah, well, Zuck, at least rumored, tried to acquire SSI and was rebuffed by Ilya, right? I wanted to also ask you about Daniel Gross and Nat Friedman. I think it's rumors, maybe confirmed at this point, but it seems like Zuck is trying to hire them as well. What do those two folks give Zuck?
(10:17) Dylan Patel:
 
Zuck tried to buy SSI. He also tried to buy Thinking Machines; these are rumors. He also tried to buy Perplexity. These are all in some of the media, right? He tried to buy all of these companies, but specifically, some of the rumors that have been floated around are basically that Mark tried to buy SSI. Ilya obviously said no, because he is committed to super intelligence and straight-shotting it, right? Not worried about products, and he's probably not even that money-focused, right? He's mostly focused on building it, right? A true believer in all respects and regards, right? So obviously he probably was like, no.
(10:54)
I don't know what the makeup of equity is there, but Ilya's probably got strong enough voting power and ownership to be like, no. And if the rumors are true about Daniel Gross, then Daniel Gross probably was the one wanting the acquisition, right? He's like, yeah, this is awesome, right? Yeah, another founder. And he comes from, you know, not an AI research background, although he is technical to a degree. But, you know, he had his venture fund with Nat, and then he founded SSI with Ilya, and he probably wanted the acquisition. And then it's like, well, I was pushing for an acquisition and it didn't happen.
(11:31)
And, you know, I'm just guessing; I don't actually know if he's going at all. But it would make sense that that's a chasm and a split, and he's going. I think generally, when you look at a lot of people who are very successful, it's not the money. It is the money always, but it's more the power. And if you ask, you know, anyone going to Meta, a lot of them will obviously be going for money, but a lot of them are going because now they have control over the AI path for a trillion-dollar-plus company, and they're right there talking to Zuck, and they can convince one person who has full voting rights over the entire company.
(12:18)
There's a lot of power there, and they can implement across billions of users whatever AI technology they want, using the engine of Facebook, whether it be infrastructure or researchers or product, to push whatever AI product they want. That would make a lot of sense to me for an Alex Wang or a Nat Friedman or a Daniel Gross, who are much more product people, right? Like Nat doing GitHub Copilot: he's a product person, right? He's not an AI researcher, although he knows a lot about AI research; he's a product person, right? And the same applies to Alex. Obviously he's very well versed in the research,
(12:56)  but his super skill set is product and people, convincing people, and organization, probably not as much the research. That's sort of the angle there: they've got all this power to do a lot at Meta.
 
Meta's Hiring Strategies, Microsoft's Relationship with OpenAI, and the Pursuit of Super Intelligence
(00:13:10)
The discussion explores Meta's strategy of offering substantial bonuses to retain top AI researchers, questioning whether money alone can foster a strong culture. Patel argues that the pursuit of super intelligence is now a primary motivator, driving companies to acquire talent and teams. He then analyzes the complex relationship between Microsoft and OpenAI, highlighting the unique deal structure involving revenue shares, profit guarantees, and IP rights until AGI. Patel points out potential risks for OpenAI, including Microsoft's access to their IP and the ambiguity surrounding the definition of AGI. He notes that OpenAI has diversified its compute resources, partnering with Oracle and others, after initially being exclusive to Microsoft. Patel emphasizes that OpenAI's need for continuous funding makes these complex deals challenging to navigate.
(13:10) Matthew Berman:
 
Sam Altman also mentioned that Meta has been giving $100 million bonus offers to their top researchers. Apparently, none of the top researchers have left. I want to ask, is that a successful strategy, just to throw money at the problem and get the best people in? It feels like maybe the cultural element would be lacking there. You know, give OpenAI as much shit as you want, but there are a lot of true believers there in what they're doing. Is it enough to just throw money and get the best researchers, where that culture is going to be built?
(13:42) Dylan Patel:
 
You zoom back, right? It's like, if you believe super intelligence is the only thing that matters, then you need to chase it. Otherwise, you're a loser, right? And Mark Zuckerberg certainly doesn't want to be a loser. And he thinks he can build it. He can build super intelligence too, right? So then the question is like, okay, well, what do you do? Well, then you go and try and acquire the best teams out there, right? Thinking Machines, right? All these ex-OpenAI people, but also some other folks from, you know, Character AI, GDM, Meta, etc. All these great researchers and people, and same with SSI. It's Ilya and the people he's recruited.
(14:14)
Trying to recruit people from these companies or trying to buy these companies didn't work out. So now you go with Alex, who's tremendously connected and can help you build the team, and now you gotta go get the team. Now, what's the difference between that and acquiring SSI, where there are way less than 100 employees? I think there are even less than 50 employees at SSI. For $30 billion, it's like, okay, we just paid hundreds of millions of dollars per researcher, and 10 billion plus for Ilya, right? That's sort of what you just did. And it's like, well, then you're sort of doing the same thing, right?
(14:49)
As far as Sam saying that no top researchers have gone, I don't believe that's accurate. I think initially the top researchers definitely did say no. The best researchers, the best people. And you said $100 million; I've heard a number over a billion, actually, for one person at OpenAI. But anyways, it's a ridiculous amount of money, but it's the same thing as buying one of these companies, right? Thinking Machines and SSI don't have a product. You're buying them for the people.
(15:19) Matthew Berman:
 
If super intelligence is the end-all, be-all, a hundred million dollars, even a billion dollars, is really a drop in the bucket compared to, one, Meta's market cap currently, and also the total addressable market of artificial intelligence. I want to talk about Microsoft and OpenAI's relationship a little bit. We're well past the honeymoon phase, it seems. It definitely seems to be a...
(15:44) Dylan Patel:
 
This is now a therapy show.
(15:47) Matthew Berman:
 
Yeah, absolutely.
(15:48) Dylan Patel:
 
Tell me about your feelings, Sam and Satya.
(15:51) Matthew Berman:
 
Well, this is therapy, right? These are two people, and they have a relationship, and it does seem to be fraying a bit. OpenAI's ambitions seem to have no bounds. OpenAI wants to restructure the deal; Microsoft really has no reason to. What do you think is going on? What do you think about the dynamics of this relationship going forward?
(16:14) Dylan Patel:
 
OpenAI would not be where they are without Microsoft, and Microsoft signed a deal where they get tremendous power. It's a weird-ass deal, because OpenAI wanted to be a nonprofit and they cared about AGI, but at the same time they had to give up a lot to get the money. And at the same time, Microsoft didn't want to run into antitrust stuff, so they structured this deal really weird, right? There's revenue shares, and there's profit guarantees, and there's all these different things, but nowhere is it like, oh yeah, you own X percent of the company, right? I think it's like, off the top of my head,
(16:48)  but I think it's like a 20% revenue share, a 49 or 51% profit share up until some cap, and then Microsoft has the IP rights to all of OpenAI's IP until AGI. And all of these things are just nebulous as hell, right? I think the profit cap might be like 10x; again, I'm going off the top of my head, it's been a while since I looked at it. But if Microsoft gave roughly $10 billion and OpenAI has a profit cap of 10x, it's like, well, what incentive does Microsoft have to renegotiate now? They get $100 billion of profit from OpenAI, and until then, OpenAI has to give them all their profit, or half of their profit, right?
(17:30)
And they get this 20% rev share. And they have access to all of OpenAI's IP until AGI. But what is the definition of AGI? Theoretically, OpenAI's board gets to decide when OpenAI hits AGI, but then if that happens, Microsoft will just sue the shit out of them, and Microsoft has more lawyers than God. So it's this just crazy-ass deal. I think there are a few really worrisome things in there for OpenAI. One of the main things they got removed already, because Microsoft was really scared, I think, about the antitrust aspects of this, which was that OpenAI had to exclusively use Microsoft for compute. They backed off of this last year, and it got announced with the Stargate deal this year, right?
(18:15)
Which was that OpenAI's gonna go to Oracle and SoftBank and Crusoe and the Middle East to build their Stargate clusters, right? Their next-generation data centers. They're still getting a bunch from Microsoft, of course, but a bunch from Stargate, which are from Oracle primarily, but the others as well. Whereas before, OpenAI could not do that without going directly through Microsoft, right? OpenAI tried to go to CoreWeave initially, but then Microsoft inserted themselves in the relationship, like, no, you're exclusively using us. So a lot of GPUs get rented from CoreWeave to Microsoft to OpenAI. But then this exclusivity ended.
(18:55)
And now like CoreWeave has big deals signed with OpenAI and Oracle has big deals signed with OpenAI.
(19:00) Matthew Berman:
 
What did Microsoft get? In that exchange where they're going to give up the exclusivity,  did they get anything or was it reported that they got anything in exchange for that? Usually it's not just like, okay, cool, we'll give that up.
(19:10) Dylan Patel:
 
What's been reported is just that they gave up the exclusivity, and in return all they have is a right of first refusal. Anytime OpenAI goes and tries to get a contract for compute, Microsoft can provide that same compute at the same price.
(19:24) Matthew Berman:
 
Reduce risk from antitrust.
(19:27) Dylan Patel:
 
Yeah, antitrust is one of the biggest considerations there, but there are other considerations, of course; antitrust is just one of the big ones, because being the exclusive compute provider to OpenAI is a little iffy. And from OpenAI's perspective, they were just really annoyed that Microsoft was way slower than they needed them to be, right? They just couldn't get all the compute they needed. They couldn't get all the data center capacity, etc. CoreWeave, even Oracle, are moving much faster. But even they are not as fast, and so OpenAI is turning to other folks as well, right?
(19:58)
There's that butting of heads there, but nowadays the real challenging thing is this: Microsoft has the monorepo. It has the OpenAI IP. They have rights to it all. They can do whatever they want with it. Now, whether it's Microsoft playing nice and not doing stuff with it, or being somewhat incompetent and not being able to leverage it, or mostly just not looking through it, whatever the reason is, Microsoft, despite having access, hasn't done a ton. But the possibilities are endless, right, of what could be done. Then the other thing is, if you're truly AGI-pilled, or now super intelligence-pilled,
(20:36)  Microsoft has all the IP up until super intelligence is, let's just say, achieved. But that would imply that the day before super intelligence is achieved, they have all of the IP, and then it gets cut off. They have all the IP up until right there. So it's like one day of work. Maybe it's hard and it takes 10 days of work instead of one. Or maybe you achieved super intelligence, but it takes some time to get through the deliberations and agreement that you've achieved super intelligence, slash all the evidence that you've achieved it. You've claimed it at this date, but the model that is super intelligent was finished back here.
(21:09)
Microsoft has access to it, right? So that's the real big risk to the super AGI-pilled folks. The profit share and all this is very messy and difficult, and most people don't care that much when they're investing in OpenAI. It is challenging to get every investor in the world to be like: yeah, your crazy-ass structure, nonprofit, for-profit, all this sort of stuff, okay, that's fine. Oh, Microsoft has rights to all of your profit for a long time and all your IP. So theoretically, you could be worthless if they decide to just take some of your best researchers and implement everything themselves. Oh, wow, right?
(21:44)
These sorts of things scare investors, and Sam said himself, OpenAI is going to be the most capital-intensive startup in the history of humanity. The valuation is going to keep soaring because of what they're building. OpenAI has no plans to produce profit anytime soon. They've been around for so long, and they're doing like $10 billion of revenue, and they're still not going to turn a profit for another five years. And by then, their projections of revenue are well north of hundreds of billions, if not a trillion dollars, of revenue that they expect to have before they ever turn a profit.
(22:18)
And so that whole way through, they're gonna be losing money, and they need to keep raising money, and they need to be able to convince every investor in the world. And these things are dirty, right? They're not clean and easy to understand.
 
The Failure of GPT-4.5 and the Importance of Data in AI Training
(00:22:29)
The conversation shifts to GPT-4.5 (Orion), which was ultimately deprecated. Patel explains that it was a bet on full-scale pre-training but proved too slow and expensive. Despite being smarter, it wasn't as useful as other models. The model suffered from over-parameterization, memorizing data instead of generalizing effectively. A bug in the training code further complicated the process. Patel highlights the Chinchilla paper, which emphasizes the optimal ratio of parameters to tokens in a model. GPT-4.5 failed because it didn't have enough data relative to its parameters. The success of reasoning-based models, like Strawberry, demonstrates the importance of generating high-quality data, contrasting with the bad data often found in synthetic datasets.
(22:29) Matthew Berman:
 
Okay, so you talked a little bit about compute capacity, and specifically OpenAI being able to go beyond Azure to CoreWeave and elsewhere. I want to talk specifically about 4.5, GPT-4.5. It was deprecated, I believe, last week. This was a massive model.
(22:45) Dylan Patel:
 
Was it really?
(22:46) Matthew Berman:
 
It wasn't. Oh, I don't know.
(22:47) Dylan Patel:
 
I thought it was still available in chat. That's why I was just curious.
(22:50) Matthew Berman:
 
Oh, maybe they just announced the deprecation, but it was imminent.
(22:55) Dylan Patel:
 
No, it's still there. But yeah, they announced it. Okay. No, no, they've talked about how there's very little usage of it. So that makes sense.
(23:02) Matthew Berman:
 
Was the model too big? Was it too costly to run?
(23:05) Dylan Patel:
 
What went wrong with GPT-4.5? Orion, as it was internally called, was what they hoped would be GPT-5. They made that bet in early '24, right? They started training it in early '24. It was a bet on full scale, right? Full-scale pre-training. We're just going to take all the data, we're going to make this ridiculously big model, and we're going to train it. It is much smarter than 4o and 4.1, to be completely clear. I've said it's the first model to make me laugh, because it's actually funny. But in general, it's not that useful, and it's too slow. It's too expensive versus other models, right? o3 is just better. They went pure on the pre-training scaling. Data doesn't scale, right?
(23:46)
So they weren't able to get a ton of data. So without data scaling as fast, right, they have this model that's really, really big, trained on all this compute. But you have this issue called over-parameterization. Generally in machine learning, if you build a neural network and you feed it some data, it will tend to memorize first. And then it will generalize, right? I.e., it'll know that if I ever say "the quick brown fox jumped over," the next tokens are always "the lazy dog," right? It isn't until you've trained it on a lot more data that it learns what "quick brown fox" even means, what "lazy dog" is, right? Then it actually builds the world model; it generalizes.
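A toy illustration of that memorize-first dynamic (my sketch, not OpenAI's setup): an over-parameterized network fit on pure-noise targets drives training loss toward zero, which can only be memorization, while held-out loss never improves.

```python
# Toy memorize-vs-generalize demo (illustrative, not OpenAI's setup).
# An over-parameterized MLP fit on pure-noise targets: training loss
# collapses (memorization) while held-out loss never improves.
import torch

torch.manual_seed(0)
x_train, y_train = torch.randn(32, 8), torch.randn(32, 1)  # 32 noise examples
x_test, y_test = torch.randn(32, 8), torch.randn(32, 1)

model = torch.nn.Sequential(              # ~10k parameters for 32 examples
    torch.nn.Linear(8, 1024), torch.nn.ReLU(), torch.nn.Linear(1024, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = torch.nn.MSELoss()

for step in range(2001):
    opt.zero_grad()
    loss = mse(model(x_train), y_train)
    loss.backward()
    opt.step()
    if step % 500 == 0:
        with torch.no_grad():
            held_out = mse(model(x_test), y_test).item()
        print(f"step {step:4d}  train {loss.item():.4f}  held-out {held_out:.4f}")
```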
(24:25)
And to some extent, GPT-4.5, Orion, was so large and so over-parameterized that it memorized a lot. Actually, when it initially started training, I know people at OpenAI were so excited; they were like, oh my god, it's already crushing the benchmarks and we're barely into training.
(24:42) Matthew Berman:
 
Because some of the checkpoints were just so good.
(24:44) Dylan Patel:
 
Right, initially. But that's because it just memorized so much. But then it stopped improving. It was just memorizing for a long time, and it didn't generalize. It finally did generalize, right? Because it was such a big, complicated run, they actually had a bug in it for months during the training. Training is usually a handful of months or less, right? It's usually less. And they had a bug in the training code for a couple months, a very tiny bug that was messing up the training. It's funny: when they finally found it, it was a bug within PyTorch that OpenAI had found and fixed, and they submitted the patch,
(25:22)  and there's like 20 people at OpenAI who reacted to the bug fix with emojis right on GitHub. Another thing is they had to restart training from checkpoints a lot. It's so big, so complicated, so many things can go wrong, right? And so from an infrastructure perspective, just corralling that many resources, putting them together, and having it train stably was really, really difficult. But on the flip side, even if the infrastructure and code and everything like that were pristine, you'd still have this problem of data. Everyone points to the Chinchilla paper from '22, I think.
(25:59)
In 2022, Google DeepMind released a paper called Chinchilla. And what it basically said is, for a model, what's the optimal ratio of parameters to tokens? And this only applied to dense models with the exact architecture of the Chinchilla model. But it was like, oh, if I have X flops, I should have this many parameters and this many tokens, right? It's a scaling law, right? Obviously, as you make it bigger and you apply more flops, the model gets better. But how much data should I add? How many more parameters should I add, right? Now, over time, people's architectures have changed, so the exact observations of Chinchilla aren't accurate,
(26:35)  which is that, roughly, you want 20 tokens of training data per parameter in the model. There's actually a curve and everything; it's more complicated than that, and that observation doesn't hold identically anymore. But what does hold is that as you add compute, you want to add more data and parameters at a certain ratio, or along a certain curve; there's a formula, basically, in an ideal world. And they didn't go there, right? They had to go to way more parameters versus tokens. But this was all early '24 when they started training. You know, all these trials and tribulations, they finally get there.
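As a rough worked example of that ratio (using the common C ≈ 6ND approximation for training FLOPs and the ~20 tokens-per-parameter rule of thumb; the paper itself fits a curve, and real architectures deviate):

```python
# Back-of-envelope Chinchilla math (rule of thumb only; the paper fits a
# curve and modern architectures deviate). With training FLOPs C ~ 6*N*D
# and the ~20 tokens-per-parameter heuristic D = 20*N, we get C = 120*N^2.
def chinchilla_optimal(flops: float, tokens_per_param: float = 20.0):
    n_params = (flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

for c in (1e21, 1e24, 1e25):
    n, d = chinchilla_optimal(c)
    print(f"{c:.0e} FLOPs -> ~{n / 1e9:.1f}B params, ~{d / 1e12:.2f}T tokens")
```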
(27:08)
And I don't remember when they released 4.5. It was last year, right?
(27:11) Matthew Berman:
 
Yeah.
(27:11) Dylan Patel:
 
Yeah. But they finally released the model many months after they started training, after they finished pre-training, and then they tried to do RL and all this stuff. But in the meantime, different teams at OpenAI figured out something magical, which is the reasoning stuff, the Strawberry stuff.
(27:26) Matthew Berman:
 
So while they've already invested all this, while they're in the process of training this massive model, they realized: okay, for a much lower cost, we can get so much more efficiency, so much higher quality out of a model, because of reasoning.
(27:39) Dylan Patel:
 
And if you really try to boil down reasoning to first principles, you're giving the model a lot more data. Where are you getting this data from? You're generating it. And how are you generating the data? Well, you're creating these verifiable domains where the model generates data, and you throw away all the data where it doesn't get to the right answer, right? Where it doesn't verify that the math problem or the code or the unit test was good. So in a sense, looking backwards, obviously I didn't have the intuition then, but looking backwards, the intuition makes a lot of sense that, like,
 
Apple's AI Challenges and the Debate Over On-Device AI
(00:28:13)
The discussion turns to Apple's AI efforts, noting that they are behind in the AI race. Patel attributes this to Apple's conservative nature, difficulty attracting AI talent, and historical aversion to NVIDIA. He recounts the "BumpGate" incident, where faulty NVIDIA GPUs led to a strained relationship. The conversation then explores the concept of on-device AI, with Patel expressing skepticism. He argues that consumers prioritize free and convenient cloud-based AI over the security benefits of on-device AI. He also questions the latency advantages of on-device AI, noting that many valuable AI applications require cloud connectivity to access data and perform complex tasks.
(28:06) Dylan Patel:
 
well, 4.5 failed because it didn't have enough data. Also, it was just very complicated and difficult from a scaling perspective, infrastructure-wise, and there were tons of problems and challenges there. But also, they just didn't have enough data. And now this breakthrough that happened from a different team is generating more data. And that data is good, right? A lot of the synthetic data stuff is bad data, but the magic of Strawberry, of reasoning, is that the data you're generating is good. So from a first-principles basis, it really makes a lot of sense that data is the wall. You know, just adding more parameters doesn't do anything.
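A minimal sketch of the verifiable-data loop he's describing, with a hypothetical generate() standing in for the model and a toy verifier; real pipelines use math graders and unit tests, but the filtering logic is the same idea:

```python
# Rejection-sampling sketch for a verifiable domain (illustrative only;
# `generate` is a stand-in for a hypothetical model call).
import random

def generate(problem: str) -> tuple[str, int]:
    """Hypothetical model call: returns (reasoning trace, proposed answer)."""
    guess = random.randint(0, 20)
    return f"reasoning about {problem!r} -> answer {guess}", guess

def verify(answer: int, expected: int) -> bool:
    """The verifier: here a known math answer; could equally be a unit test."""
    return answer == expected

problem, expected = "7 + 6", 13
kept = []
for _ in range(1000):                # sample many candidate solutions
    trace, answer = generate(problem)
    if verify(answer, expected):     # throw away everything that fails
        kept.append(trace)

print(f"kept {len(kept)}/1000 verified traces as new training data")
```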
(28:42) Matthew Berman:
 
I want to talk about Apple for a second. I'm sure you have some thoughts on that. Apple is clearly behind. We're not getting much in the way of public models, leaks, anything about knowing what they're doing. What do you think is going on at Apple? Do you think they just made a misstep? 
They kind of were late to the game. Why aren't they acquiring companies? What is happening internally, if you had to guess?
(29:05) Dylan Patel:
 
Yes, I think Apple is a very, very conservative company. They've acquired companies in the past, but they've never done really big acquisitions.
(29:14) Matthew Berman:
 
Beats was the biggest one. Yeah, it's a headphone company.
(29:16) Dylan Patel:
 
Right, but generally their acquisitions have been really small. They do buy a lot of companies; they just buy really, really small companies early. They identify them. Maybe it's a failing startup, or whatever it is. They buy these startups that haven't achieved product-market fit and aren't super sexy. As far as Apple, they've always had problems attracting AI researchers. AI researchers like to blab. They like to post and publish their research, and Apple's always been a secretive company. They actually changed that policy so their AI researchers are allowed to publish. But at the end of the day, they're still a secretive company.
(29:53)
They're still an old, antiquated company. Meta was only able to hire a bunch of researchers and talent because they had a bunch of ML talent already. They've always been a leader in AI. They had the PyTorch team as well. And then they committed to open-sourcing a lot.
(30:09) Matthew Berman:
 
For a while now, they've been open sourcing, yeah.
(30:11) Dylan Patel:
 
Besides that, who's been able to acquire AI talent? There was the DeepMind-to-OpenAI shift, OpenAI being the rival to DeepMind and that whole thing, with a lot of great researchers coming together to form it. And then the Anthropic splinter group, and then the Thinking Machines splinter group from OpenAI, and the SSI splinter group from OpenAI, right? So what companies have actually been able to acquire talent that didn't already have AI talent? Google DeepMind is just the biggest name in the game, and they've always had the highest inflows of AI researchers and PhDs. And then there's OpenAI and Anthropic, and Thinking Machines and SSI, right? It's all OpenAI.
(30:48)
It's hard to get talent to come to you. Now, Anthropic has such a strong culture that they're able to get people; OpenAI gets them by being the leader. Meta, I sort of talked through. So how is Apple gonna attract these best researchers? They're not, right? They're gonna get, you know, not the best researchers. And so it's really challenging for them to be competitive, right? And then there's the whole thing where they have a stigma against NVIDIA; they hate NVIDIA. And maybe for reasonable reasons, you know: NVIDIA threatened to sue them over some patents at one point.
NVIDIA sold them GPUs that ended up breaking. It was called BumpGate. It was a very interesting thing.
(31:28) Matthew Berman:
 
I don't remember that.
(31:29) Dylan Patel:
 
You don't or you do?
(31:30) Matthew Berman:
 
No, I don't.
(31:31) Dylan Patel:
 
Oh, okay. So this is a very fun story, right? One generation of NVIDIA's GPUs. I'm going to butcher the exact reason because it's been a while since I read about it.
(31:37) Matthew Berman:
 
How many years ago was this?
(31:39) Dylan Patel:
 
This was probably like 2015, if not earlier. There was a generation of NVIDIA GPUs for laptops, right? And chips have solder balls on the bottom, right, that connect their IO pins to the motherboard and, you know, to the CPU, power, et cetera. Somewhere along the supply chain, all the companies, Dell, HPE, Apple, Lenovo, they blamed NVIDIA, as far as I understand, and vice versa; NVIDIA said it wasn't their fault. I'm not gonna prescribe blame, but the solder balls were not good enough, right? And so when the temperatures swung up and down, coefficient of thermal expansion, right, different materials expand and shrink at different rates.
(32:20)
The chip versus the solder balls versus the PCB would expand and shrink at different rates. And what ended up happening is, because of that different rate of expansion, the solder balls connecting the chip and the board would crack. And now the connection between the chip and the board is severed. So it's called BumpGate. And I think Apple wanted compensation from NVIDIA, and I think NVIDIA was like, no. There's this whole thing. Apple really hates NVIDIA because of that, and because of the patent threats when NVIDIA was trying to get into mobile chips. They tried to get into mobile chips for a time period, and they failed.
(32:57)
But at one point, they tried to sue everyone over GPU patents in mobile. And so between those two things, Apple really doesn't like NVIDIA. And so Apple doesn't really buy much NVIDIA hardware.
(33:10) Matthew Berman:
 
They don't really need to anymore.
(33:11) Dylan Patel:
 
Well, they don't need to in the laptops, of course, but even in data centers. It's like, well, again, if I'm a researcher, first of all, I'm going to go where the talent is, where I have my culture fit, where the money is. And even in places that have a ton of compute and good researchers, Meta still has to offer crazy money to get people to come over. Apple, one, is not going to offer that crazy money, and also they don't even have compute. And then for inference, to serve users, they run it on Mac chips in data centers. It's very bizarre. And it's like, I don't want to deal with all that stuff; I want to build the best models, right? It's challenging for Apple.
(33:43) Matthew Berman:
 
Okay, I want to ask you one last question about Apple. They are very big on on-device AI. And I actually really like that approach. Security, latency. What's your take on on-device AI pushing AI to the edge versus having it in the cloud? Is it somewhere in the middle? What do you think?
(34:01) Dylan Patel:
 
So I think there's... I'm generally an on-device AI bear. I don't like it that much. Personally, I think security is awesome. But I know human psychology: free, or free with ads, beats security. No one actually cares about security. They say they do, but the number of people who actually make decisions based on security is very small. I would like privacy and security, of course.
(34:26) Matthew Berman:
 
Wait, but you said you like free, but that's not analogous to on-device AI, right?
(34:32) Dylan Patel:
 
No, no. So like, Meta will offer in the cloud for free. And OpenAI has a free tier. You know, Google has a free tier.
(34:39) Matthew Berman:
 
And it's going to be better than free as in running it on your own device.
(34:42) Dylan Patel:
 
Right, right. And the big challenge with on-device is that you're limited by the hardware, right? How fast the model can do inference is really based on the memory bandwidth of the chip. And okay, if I want to increase the memory bandwidth of the chip, I spend, you know, 50 more dollars on hardware, I pass on the cost to the customer, and it's 100 more dollars for the iPhone. Great; with that 100 bucks, I could have like 100 million tokens, right? And I'm not consuming 100 million tokens. Or better yet, with that 100 bucks, I just save it, and Meta will give me the model for free on WhatsApp and Instagram.
And OpenAI will give it free on ChatGPT.
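Rough numbers behind both points, with assumed figures for illustration: single-stream decode speed is roughly memory bandwidth divided by the bytes the weights occupy, and a $100 hardware premium buys a lot of cloud tokens at typical API prices.

```python
# Back-of-envelope on-device math (all figures are assumptions).
# Single-stream decode reads every weight once per token, so
# tokens/sec ~ memory_bandwidth / bytes_of_weights.
params = 7e9                      # a Llama-7B-class model
bytes_per_param = 2               # fp16/bf16 weights
phone_bandwidth = 60e9            # ~60 GB/s, a plausible phone-class figure

tokens_per_sec = phone_bandwidth / (params * bytes_per_param)
print(f"on-device decode: ~{tokens_per_sec:.1f} tokens/sec")

# Versus putting the $100 hardware premium toward cloud tokens,
# assuming ~$1 per million tokens (small-model API pricing).
premium, price_per_million = 100.0, 1.0
print(f"${premium:.0f} buys ~{premium / price_per_million:.0f}M cloud tokens")
```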
(35:19)
And Google will give it free on Google, right? It's really challenging from that perspective. And then lastly, I don't agree with the latency standpoint, right? I think there are certain use cases where transformers make sense for latency: super tiny next-word prediction on your keyboard, or spelling correction. But the AI workloads that are the most valuable to you and I, from a personal standpoint, are things like: search for a restaurant at this time and go find it.
(35:46) Matthew Berman:
 
Or access to my Gmail, my calendar, that's all in the cloud anyways.
(35:49) Dylan Patel:
 
Right. Within business, there are tons of use cases, but my data's all in the cloud anyways. For personal use, for you and I, it's like: search for a restaurant, go through Google Maps, make all these calls, right? Go through my calendar, go through my email. All this data's in the cloud anyways, A. B, if it's more of an agentic workflow, in terms of like: yeah, you know, I'm really feeling Italian, and find a restaurant that's between you and I in location. And we're thinking Italian, but make sure they have gluten-free options because he's gluten-free. You know, find me a restaurant with a reservation at 7 p.m. tonight. That's a deep research query. And then you get a response.
(36:25)
It's like, well, that took minutes. Or, you know, we envision the future where the AI books flights for us. It's not just "book the flight," okay? To book the flight, it's researching, it's finding stuff, and it comes back. But it's going through the web. It's going through the cloud, right? Where is the necessity for it to be on device? And because of the hardware constraints, even if it is a streaming-tokens thing, your phone cannot run Llama 7B as fast as I can query a server, run Llama 7B, and transmit the tokens back to myself, right? And no one wants to run Llama 7B. They want to run, you know, GPT-4.5 or 4.1 or o3 or Claude Opus or whatever, right?
(37:07)
They want to use a good model, right? And those models can't possibly run on device. So it's a really difficult place for the use cases there. Integrate all my data? It's in the cloud anyways. And it's like, how much of my data does Meta have? Does Google have? Does Microsoft have? Let me plug into all those. Or the way Anthropic is doing it: they've done this MCP stuff and they're connecting in. You can connect your Google Drive to Anthropic, right? And it's like, oh, wait, even if I don't have my data with Anthropic, they're still able to connect to it if I give them the rights to. So where's the benefit of on-device AI, truly, from a use case standpoint?
 
The Future of AI: Cloud vs. On-Device, and the Competition Between NVIDIA and AMD
(00:37:42)
Patel continues his critique of on-device AI, suggesting that it will be limited to low-value applications due to hardware constraints. He envisions a future where wearables perform basic tasks locally, while more complex reasoning occurs in the cloud. The conversation transitions to the competition between NVIDIA and AMD in the AI chip market. Patel acknowledges that AMD is trying hard and their hardware has some advantages, but their software stack lags significantly behind NVIDIA's. He notes that NVIDIA's Blackwell chip is objectively superior and that NVIDIA's ecosystem, including CUDA and inference libraries, provides a better user experience.
(37:42) Dylan Patel:
 
There's certainly one from a security standpoint, but the actual use case is like...
(37:46) Matthew Berman:
 
Yeah, I think there's probably an argument for a little bit of both,  and it probably does skew in terms of the total workload towards cloud,  but I think there's an argument for doing at least a portion of the workload on device,  anything that you're interacting with the device on. You mentioned typing ahead, and that makes a lot of sense.
(38:05) Dylan Patel:
 
I do think AI will make its way on device. I think it'll just be very low value AI where the cost structure is just so low. I don't think people should design hardware on phones for AI that's going to make it more expensive. If you're going to keep the phone the same price point, add AI capabilities, great. But if you're going to increase the price point, I don't think consumers will do it. How AI on device really will make sense is like, for example, a wearable, an earpiece or smart glasses. And there you're doing small bits and pieces locally, right? Image recognition, hand tracking, but the actual reasoning and thinking is happening in the cloud, right?
(38:46)
And that's sort of the view that sort of like a lot of these wearables are pushing. I think there will be Some AI on devices, obviously everyone's going to try. It's not like Samsung and Apple and like all these companies are going to sit on their hands. They're going to try stuff. I just think the stuff that's actually going to drive user adoption and revenue and improve customers' lives is going to be skewing towards what's on the cloud,  which is why Apple has this strategy, right? Apple's building a couple massive data centers, right? They're buying hundreds of thousands of their Mac chips and putting them in data centers.
(39:19)
They hired Google's head of rack architecture for the TPU, Andy, and they're making an accelerator, right? They see cloud as where AI needs to go. They just also have to push it on device. But even Apple themselves, although they won't say it, want to run a lot of this in the cloud.
(39:35) Matthew Berman:
 
And they do have great chips to do that, too. Okay, speaking of chips, let's talk about NVIDIA versus AMD. I have read a couple of articles out of Artificial Analysis, sorry, SemiAnalysis, lately that have kind of said that these new AMD chips are actually really strong. Do you think AMD, with their new chips, is that enough to really tackle the CUDA moat? Are they going to start taking market share from NVIDIA?
(40:05) Dylan Patel:
 
So I think it's a confluence of things, right? AMD is trying really hard. Their hardware is behind in some factors, especially against Blackwell. But there are some ways their hardware is better, right? And I think the real challenge for them is, like you mentioned, software, right? The developer experience on AMD is not that great. It's getting better. You know, we've asked them to do a lot of things to change it, like specific fixes and changes on CI resources and all these other things. There's a long list of recommendations. We provided them in December, and again more recently. And they've implemented a good number of them.
(40:48)
But they're just so far behind on software, it's incredible. Now, are they going to gain some share? I think they are going to get some share, right? They had some share last year, and they're going to get some share this year. The challenge is, versus NVIDIA's Blackwell, it's just objectively worse, right? As a chip.
(41:06) Matthew Berman:
 
Oh, the chip alone, not the ecosystem.
(41:09) Dylan Patel:
 
The chip alone.
(41:10) Matthew Berman:
 
Okay.
(41:10) Dylan Patel:
 
Because of the system, right? Because NVIDIA is able to network their chips together, because of the networking hardware they've put on their chip, with NVLink, right? So the way NVIDIA can build their servers, 72 of them work together really tightly, whereas AMD currently can only have eight work together really tightly. And this is really important for inference and training. And then NVIDIA's got this software stack. It's not just CUDA, right? People talk about it like it's just CUDA, but a lot of people don't touch CUDA. Most researchers don't touch CUDA. What they do is call PyTorch, and then PyTorch calls down to CUDA,
(41:44)  and automatically it runs on the hardware, whether it's compile or eager mode, whatever you're doing. It generally just maps to NVIDIA hardware really well. In the case of AMD, it doesn't as well. And now, beyond that, so many people aren't even touching PyTorch. They're going to vLLM or SGLang, which are inference libraries. They're downloading the model weights off of Hugging Face or wherever, and they're plugging them into this inference engine, an open-source repository on GitHub, either SGLang or vLLM. And then they're just saying, go. And those things are calling, you know, torch.compile, and those things are calling CUDA or, you know, Triton.
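For a sense of how little of the stack a typical user touches, here's roughly what that vLLM path looks like (a sketch; the model ID is just an example, and arguments vary by version):

```python
# Sketch of the "just give me tokens" path (vLLM's offline API; the model
# ID is an example and arguments vary by version). PyTorch, CUDA, and
# Triton kernels are all invoked several layers below this call.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # weights pulled from Hugging Face
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain mixture-of-experts in one paragraph."], params)
print(outputs[0].outputs[0].text)
```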
(42:20)
And there's all these libraries down the stack. Really, the end user just wants to use a model, right? They want tokens. And NVIDIA is building libraries here, like Dynamo, that make this so much easier for the user. Now, obviously, there are people like the OpenAIs of the world and others who will go all the way down to the bottom, you know, the DeepSeeks and OpenAIs and Metas and so on. But a lot of users just want to call the open-source library and tell it: hey, here's my model weights, here's the hardware, run. And here, AMD is trying really hard, but it's still a worse user experience.
(42:52)
Not that it doesn't work, but it's like, hey, if I want to use this library, for NVIDIA there's 10 flags; for AMD there's 50 flags, right? And in each of these flags there's different settings, and it's like, well, what's the best performance? I don't fucking know, right? So AMD, I think, is getting there. They're getting there really fast, and they're going to get some share. The other aspect is NVIDIA's not doing themselves favors. There's this ecosystem of cloud companies, right? Of course, everyone knows about the Googles, Amazons, you know, Microsoft Azure, right?
(43:22)
Those guys have been building AI chips, and NVIDIA's got AI chips, obviously, so they've been in contention for a while. And so NVIDIA, as a response, propped up all these other cloud companies. CoreWeave and Oracle... not propped up, really prioritized them. But there are actually over 50 cloud companies out there: Nebius and Together and Lambda, you just go down the list. There are all these different cloud companies NVIDIA is really helping, right? They're taking what would have been allocations to Amazon and Google and others and saying, hey, you guys can buy them, right?
(43:54) Matthew Berman:
 
Is that to kind of hinder them? To level the playing field and make it more of a commodity? Right, right.
(43:59) Dylan Patel:
 
I mean, you go look at Amazon's margins on GPUs: they're charging like $6 an hour if you were to just rent a GPU without talking to anyone, right? Whereas the cost to buy an NVIDIA GPU and deploy it in a data center is like $1.40 an hour, right? That's the cost. So then what's a reasonable amount of profit for the cloud? Maybe $2, maybe $1.75, right? That's what NVIDIA wants. They don't want all the profit being sucked up at $6 an hour by Amazon. Now, obviously, you can negotiate with Amazon and get much lower, right? But it's just really tough. So NVIDIA is propping up all these different cloud companies, which is driving down the price.
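His arithmetic, spelled out with assumed inputs (purchase price amortized over a service life, plus power and hosting; the specific figures are illustrative):

```python
# Back-of-envelope GPU rental economics (assumed inputs for illustration).
capex = 35_000.0            # all-in cost of one GPU plus its server share
life_hours = 4 * 365 * 24   # amortized over four years
opex_per_hour = 0.40        # assumed power, cooling, and hosting

cost = capex / life_hours + opex_per_hour
print(f"owner's cost: ~${cost:.2f}/hr")        # lands near the $1.40 figure

for rental in (6.00, 2.00, 1.75):              # rates from the discussion
    print(f"rented at ${rental:.2f}/hr -> ~${rental - cost:.2f}/hr gross margin")
```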
(44:34)
But now they've made a major misstep, in my opinion. They acquired this company called Lepton, who don't own data centers themselves,  but they built all the cloud software for reliability,  for making it run easily, you know, Slurm, Kubernetes, all this kind of scheduling stuff. This is stuff the clouds do, right? The big clouds do it, and the neoclouds, the new cloud companies that NVIDIA's propping up, do it. But now NVIDIA's bought this company that does this software layer. And they're doing this thing called DGX Lepton,  which is: if anyone has a cloud with spare resources,  spare GPUs, just give them to us and we'll rent them out for you. Just give them to us bare metal, no software on them,
(45:09)  and we'll add all of this Lepton software on top and rent it out to users. Now the cloud companies are really mad at this, because it's like, you're directly competing with me, right? And in fact, I think NVIDIA is also going to put some of their own GPUs on Lepton,  potentially, that they're installing themselves. But it's like, you propped us all up, but now you're making a competing cloud. So a lot of clouds are mad. They won't say it to NVIDIA, because NVIDIA is sort of God, right? You don't mess with God, right? What Jensen giveth, Jensen taketh. But they'll tell us. The clouds are really mad, right?
(45:42)
And so there are some cloud companies that are turning to AMD,  maybe partially out of AMD paying them, partially out of them being mad at NVIDIA. Some of these cloud companies are now buying AMD GPUs. And then there's this third thing that AMD is doing, which is sort of what everyone accused, I don't know if you've seen this CoreWeave-NVIDIA fraud nonsense.
(46:06) Matthew Berman:
 
Yeah, sending revenue back and forth.
(46:09) Dylan Patel:
 
Because NVIDIA pays them. It's like, yeah, NVIDIA rented one cluster from them.
(46:12) Matthew Berman:
 
Yeah.
(46:13) Dylan Patel:
 
Cool.
(46:14) Matthew Berman:
 
Seems like business as usual.
(46:16) Dylan Patel:
 
Well, like NVIDIA needs GPUs internally, right? They have to develop their software.
(46:19) Matthew Berman:
 
I mean, there's like a little something there, but it seems like, yeah, exactly.
(46:22) Dylan Patel:
 
They invested like a tiny amount of money, right? But whatever. It's so irrelevant, right? But AMD is actually doing this and taking it to overdrive. They're getting clusters at Oracle and Amazon and Crusoe and DigitalOcean and TensorWave, and they're renting GPUs back from them. So they're selling them GPUs and renting them back. It's one thing if CoreWeave buys NVIDIA GPUs and a small portion of them go to NVIDIA, but the vast majority are going to Microsoft for OpenAI.
(46:52) Matthew Berman:
 
You're not calling this like accounting trickery though, right?
(46:55) Dylan Patel:
 
It's not accounting trickery. The accounting is perfectly legal. Obviously you can sell someone something and then rent something from them. NVIDIA's done this too.
(47:03) Matthew Berman:
 
They're almost funding the investment.
(47:05) Dylan Patel:
 
Right, right, exactly. And so, in the case of Oracle and Amazon,  it's like, hey, buy our GPUs, we'll rent them back, you'll see that it's great. And actually some of them we won't rent back; some of them you'll try and rent to your customers. So that drums up interest, and if it works out, you can buy more, right? This is their reasoning, right? Or for the neoclouds, it's like, well, you guys are only buying NVIDIA stuff. Why don't you buy our stuff? Here's a contract to get you comfortable, and here's a portion that you can rent out to other people, right? It's like, this makes sense.
(47:35)
To some extent it does, but to some extent a lot of the sales are just AMD buying them back. But it fosters really good relations, right? Now TensorWave and Crusoe, who are clouds, they're like, I love AMD, right? Because they're renting GPUs from me, and they're selling them to me, and they're renting them back, and I make a profit off of this, and now I can reinvest this in more AMD GPUs, or I have a chunk of AMD GPUs I can rent to other people. These clouds are like, well, fuck, NVIDIA is trying to compete with me anyway. What else am I going to do? So it's an interesting confluence. I think AMD will do well.
(48:08)
I don't think they'll surge in market share, but I think they'll do okay. I think they'll sell billions of dollars of chips.
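A stylized sketch of the sell-and-rent-back round trip described above; every number here is hypothetical, chosen only to show the shape of the deal, not its actual terms.

```python
# Stylized cash flows for the AMD sell-and-rent-back arrangement described
# above. Every number is hypothetical; the point is the shape of the deal.
gpus_sold         = 1_000
price_per_gpu     = 25_000     # hypothetical sale price to the cloud
fraction_leased   = 0.6        # share AMD rents back for its own use
lease_rate_per_hr = 2.00       # hypothetical rent-back rate
lease_hours       = 2 * 8760   # say, a two-year commitment

cloud_capex   = gpus_sold * price_per_gpu
amd_lease_pay = gpus_sold * fraction_leased * lease_rate_per_hr * lease_hours

print(f"cloud pays AMD for chips:    ${cloud_capex:>12,.0f}")
print(f"AMD pays cloud in rent-back: ${amd_lease_pay:>12,.0f}")
# The cloud gets a guaranteed anchor tenant on most of the fleet, plus the
# remaining share to rent to its own customers, which is why a TensorWave
# or Crusoe can say "I love AMD" while much of the money round-trips.
```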
(48:15) Matthew Berman:
 
But if you're advising a company on which chipset to invest in for the foreseeable future, you're saying NVIDIA?
(48:20) Dylan Patel:
 
Depends on the price you can get from AMD. I think there's a price where it makes sense to use AMD, and I think AMD will sometimes offer that price to people. Meta uses AMD a good bit. They also use a lot of NVIDIA. For certain workloads where AMD is actually better, when I have the software talent and AMD is giving me a ridiculous price, yeah, you should do it, and that's why Meta does it, right? But for a lot of workloads, Meta still goes to NVIDIA, because NVIDIA is the best.
 
XAI's Grok and the Future of Labor in an AI-Driven World
(00:48:47)
The discussion shifts to XAI and their Grok model, with Patel acknowledging Elon Musk's marketing prowess. He finds Grok useful for deep research and accessing unfiltered data, particularly on current events. Patel then addresses concerns about job displacement due to AI, arguing that aging populations and a trend towards reduced working hours suggest AI could simply enable people to work less. He expresses excitement about robotics, which could automate tasks that are difficult for AI but undesirable for humans.
(48:47) Matthew Berman:
 
I want to talk about XAI. I want to talk about Grok 3.5. Obviously, like at least publicly, there's not a ton of information about it. Elon Musk has said it's by far the smartest AI on the planet and it's going to operate on first principles. Is this all puffery? Have they actually discovered something new and unique? Specifically, he asked for divisive but true facts. There's a lot of things that he's doing where it just seems like either he discovered something new or it is pure puffery. What's your take on what's going on?
(49:17) Dylan Patel:
 
I think Elon is a fantastic engineer and engineering manager, but I also think he's a fantastic marketer. I don't know what the new model will look like. I've heard it's good, but you've heard it's good. Everyone's heard it's good, right? So, you know, we'll see when it comes out. When Grok 3 came out, I was pleasantly surprised, because I was expecting it to be a little bit worse,  but it was actually better than I expected.
(49:41) Matthew Berman:
 
Do you use Grok 3 day to day?
(49:43) Dylan Patel:
 
Day to day, I don't, but there's certain queries I do send to it.
(49:47) Matthew Berman:
 
What, if you don't mind me asking?
(49:49) Dylan Patel:
 
Their deep research is much faster than OpenAI's, so I use that sometimes. And then sometimes models are just pansies about giving me data that I want, right? Like, sometimes I'm just curious. I like human geography: the history of humanity, how geography, politics, history, resources interact with each other. And so I like to know about demographics and things like that as well, right? It's just interesting stuff. You know, the town I grew up in is in the Bible Belt, it's half black, half white, 10,000 population. And one of the ways I describe it to people is,
(50:24)  well, it's where the floodplains used to be when the ocean receded, so it's extremely fertile land. And in Georgia, when white settlers settled everywhere, it randomly happened to be one of the more fertile areas,  so they were able to have better harvests, and they were able to purchase slaves. And that's why it's a higher black percentage than most of the state. And it's like, that's an insane thing to say. But I like to reason about human geography like this, and Grok is okay with doing that, right? So sometimes it lets me understand. And obviously, slavery is bad.
(50:51)
But it's just to understand. Or like, hey, invasions from the Steppe into Europe were not because they just wanted to invade; it's because it was becoming more arid and they were forced off their land. These sorts of things are cool and interesting. Or economic history: why did Standard Oil win versus this other oil company before it got to monopoly levels? These sorts of things are just interesting to learn, but other models, if it's the Standard Oil thing, will start with, oh, it's a union buster,  blah, blah, blah. It's like, no, just tell me what actually happened.
(51:23)
I think Grok can sometimes get through the bullshit, but it's also not the best model.
(51:28) Matthew Berman:
 
So for my daily driver, the model I go to the most is either O3 or Claude 4.  You're using O3 day to day even though, you know, it takes so much time to actually get your response back?
(51:41) Dylan Patel:
 
It depends on the topic, but yeah, I think a lot of times I'm okay with waiting. A lot of times I'm not; that's why I use Claude. I use Gemini at work, right? So we feed a lot of permits and regulatory filings through Gemini. It's really good at long context, right? 
And document analysis and retrieval. So we feed a lot of stuff through Gemini in a workplace manner. But if I'm talking about pulling out my phone because I want to know something mid-conversation or whatever, it's a different model. So Grok: they have a lot of compute, and it's really concentrated. They have a lot of great researchers. They've got like 200,000 GPUs already up, and they've purchased a new factory in Memphis,
(52:23)  and they're building out a new data center,  and there's the craziness they did with mobile generators. Well, now they just bought a power plant from overseas and are shipping it to the US,  because they couldn't get a new one in time. So they're doing all this crazy shit to get the compute. They've got good researchers. Clearly the models are good, and Elon's hyping them up. Maybe it'll be great. Maybe it'll be good. 
Will it be OpenAI level, or will it be just slightly behind?
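On the Gemini document workflow mentioned above: a minimal sketch assuming Google's google-generativeai Python SDK; the model name, file path, and prompt are placeholders, not the actual SemiAnalysis pipeline.

```python
# Minimal sketch of a long-context document workflow like the one described
# above, assuming Google's SDK (pip install google-generativeai). The model
# name, file path, and prompt are placeholders.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-pro")  # a long-context model

# Upload a large filing once, then ask questions against it.
filing = genai.upload_file("examples/substation_permit.pdf")  # hypothetical file

response = model.generate_content([
    filing,
    "Summarize the requested interconnection capacity, the utility involved, "
    "and any construction milestones with dates.",
])
print(response.text)
```

The design point is the one he makes: for batch analysis of long regulatory documents, a long-context model you pipe files through beats a chat model you paste snippets into.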
 
(52:47) Matthew Berman:
Are they doing something fundamentally different? He specifically said rewrite the corpus of human knowledge because too much garbage is in the current foundation models. Obviously he has the X data, which is insane, but it's also really low quality,  so it's hard to get through.
(53:07) Dylan Patel:
 
Oh, that's another area where I use Grok sometimes, current events.
(53:11) Matthew Berman:
 
Summarizing or giving you the context.
(53:13) Dylan Patel:
 
When the missiles were happening in Israel and Iran and all this war stuff,  you can actually ask Grok and it tells you exactly what's happening way better than a Google search will or even a Gemini query or OpenAI query because it's got access to all this info.
(53:26) Matthew Berman:
 
So are they doing anything different?
(53:29) Dylan Patel:
 
Step-function different? I don't think anyone is. Everyone likes to think they're doing different things, but generally people are doing the same thing. They're pre-training large transformers and they're doing RL on top, mostly in verifiable domains,  although they're researching how to do unverifiable domains. They're making environments for the model to play in, but they're mostly code and math,  and now they're getting into computer use and all these other things. Everyone's doing generally the same stuff, but it's also such a challenging problem.
(54:04)
There are many directions to go with it, but I think generally everyone's taking the same approach. Even SSI. I imagine SSI is doing some different stuff, but I don't think even they're doing that much differently from what I just said.
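A schematic of "RL on verifiable domains" as described above; this is not any lab's actual pipeline, just the core loop of sampling answers and keeping the ones a programmatic verifier accepts.

```python
# Schematic sketch of "RL on verifiable domains": sample completions, score
# them with a programmatic verifier (here, exact-match math answers), and
# keep the verified ones as training signal, as in simple rejection-sampling
# or policy-gradient loops. Not any lab's actual pipeline.
import random

def verifier(problem: str, answer: str) -> float:
    # Verifiable domain: the reward is computable, no human labels needed.
    ground_truth = {"2+2": "4", "17*3": "51"}
    return 1.0 if ground_truth.get(problem) == answer.strip() else 0.0

def sample_answer(problem: str) -> str:
    # Stand-in for sampling from the policy (the LLM).
    return random.choice(["4", "5", "51", "22"])

def collect_batch(problems, k=8):
    # Keep only verified samples; these become the reinforcement signal.
    batch = []
    for p in problems:
        for _ in range(k):
            a = sample_answer(p)
            if verifier(p, a) == 1.0:
                batch.append((p, a))
    return batch

print(collect_batch(["2+2", "17*3"]))
```

Code and math fit this loop because the checker is cheap and exact; "unverifiable domains" are hard precisely because no such verifier function exists.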
(54:16) Matthew Berman:
 
I have two different topics, and I'll let you choose. Economics and labor, where I want to talk about the claim that 50% of white-collar jobs could disappear. I know you've probably read about that. Or non-verifiable rewards, which is maybe more recent, more on your mind. Do you have any preference?
(54:33) Dylan Patel:
 
Maybe the latter is more interesting for your audience, I'm not sure. But the former is really interesting, right, in that everyone's worried about massive job loss, or at least some people in the AI world are. But then the flip side is that populations are aging really rapidly, and generally people work less than ever before, right? We make fun of Europeans because they work a lot less, but the average amount of hours worked 50 years ago was way higher, right? And 100 years ago it was even higher than that. And the amount of leisure time was way less. And the size of everyone's home is way larger.
 
The Impact of AI on the Job Market, Open Source vs. Closed Source, and Predictions for Super Intelligence
(00:55:15)
Patel predicts that AI will automate a significant portion of jobs, but the deployment will take time. He observes that the junior software engineering market is already struggling. While acknowledging that AI can increase productivity and enable companies to tackle more problems, he questions where junior engineers will fit in. Patel believes that open source AI will struggle to compete with closed source, particularly if China dominates the field. He expresses hope for a more distributed AI landscape. Finally, when asked to bet on a company to achieve super intelligence first, Patel chooses OpenAI, followed by Anthropic, and then a toss-up between Google, XAI, and Meta.
(55:09) Dylan Patel:
And food security is way better. On every metric we're way better off than 50 years ago or 100 years ago. And AI should just enable us to work even less, right? Now, there are gonna be psychos like myself, and probably yourself as well, that work way too much. And then there are gonna be normal people who work way less, right? And obviously the distribution of resources is the challenge, right? I think that's the big thing. That's why I'm super excited about robotics as well, because a lot of the jobs that are hardest to automate are the physical ones that need robotics. The stuff people want to do is sit on a computer and be creative,
(55:47)  but actually one of the markets that's been nuked the hardest is freelance graphic designers. The market that's not touched is picking fruit. That's the shit that people don't want to do.
(55:59) Matthew Berman:
 
It still seems like that's pretty far in the future,  even though robotics has been progressing at an insane rate. Okay, but do you foresee, as human productivity increases like crazy, certainly a large swath of tasks will be automated. Do you think humans are going to be managing AI in the future, or are we going to be reviewing the output of AI, or some mixture in between?
(56:26) Dylan Patel:
 
Right now we're in the transition from using models on a chat basis to a longer-horizon basis. You know, I mentioned I use O3 a lot because actually there are a lot of longer-horizon tasks. Now these longer-horizon tasks are 20-30 seconds, and deep research is dozens of minutes, right? Over time, these interactions with AI will change. Obviously there will be an AI assistant that I'm just talking to all the time, or that will be telling me stuff that is noteworthy. But there will also be long-horizon tasks where AI is doing stuff for hours, maybe days, before coming back for me to review. And then eventually there just won't be humans in the loop, right?
(57:02) Matthew Berman:
 
And eventually like- I don't believe that. And what timeline are you thinking?
(57:06) Dylan Patel:
 
I think timeline questions are ridiculous. I'm generally more pessimistic on timelines. I don't think it's this decade; it's maybe the end of this decade, maybe the beginning of next, for 20% of jobs to be automated, right? There are people saying AGI in 2027.
(57:25) Matthew Berman:
 
But even reaching the tech doesn't mean the implementation is going to happen at that moment either, right? It's going to take years before we actually are able to deploy it in the field.
(57:34) Dylan Patel:
 
I think deployment will be really fast. You already see the junior software engineering market is nuked. No one can get a job. You can already see the usage of AI in software development is skyrocketing. And we're not even at automated software development yet. We're just at like.
(57:49) Matthew Berman:
 
Aren't companies going to choose to do more things? Are they going to choose to tackle more problems?
(57:53) Dylan Patel:
 
Yes.
(57:54) Matthew Berman:
 
So then how do those junior engineers get into the market to begin with then? I spoke to Aaron Levy yesterday and he was like, no, as soon as a team tells me,  look how productive we are, where do you think I'm going to invest? I'm going to invest back in that team. We're going to grow that team. Where is the place for junior engineers then?
(58:12) Dylan Patel:
I think that's nice and I agree. I myself, my company does a bunch of stuff, but due to the use of AI,  we can do a whole lot more stuff, and that makes us more productive. We're able to out-compete the old firms that don't do that in the consulting and data space. And I've still basically doubled the size of the firm in the last year, to 32 now,  33. But how many junior software developers am I going to hire? The junior software developer I have, we just cheered her on because she did 50 commits last week. That's what used to take many more people. And there's obviously a lot of software for us to build.
(58:47)
But how many people can we actually add, right? And wouldn't I rather have a senior person commanding a bunch of AIs than a junior person? So it's challenging. At the same time, there's a case for hiring young people because they can quickly adapt to the new AI tools. It's a balancing act. I don't know where the junior software developers would go, because I get people pinging me on Twitter and LinkedIn all the time,  like, do you have a job for me? And it's like, no, I don't really. Or sometimes I do, right? But it's tough.
(59:25)
And I don't see the major tech companies hiring junior software developers that much, right? It's just a fact, right? And that's why the market is really bad.
(59:33) Matthew Berman:
 
So they have to just skill up on their own, come in with better skills.
(59:38) Dylan Patel:
 
Or just try and build stuff on their own and show that they're not a junior software developer but they can actually use these tools.
(59:43) Matthew Berman:
 
That's not for everybody, though.
 
(59:44) Dylan Patel:
Yeah, it's not. A lot of people just need a job. They don't need to self-start.
(59:48) Matthew Berman:
 
They don't want to be founders, for sure. They don't want to be solo builders. Even if they're not founders, they want to have that security.
(59:55) Dylan Patel:
 
I mean, that's been a problem for me. When I started hiring people, some people need a lot of direction, and I don't have direction to give. I'm like, I need self-starters. Now there are people who can do that in the firm, but it's tough. Some people just need direction and more hand-holding, at least initially.
(1:00:10) Matthew Berman:
 
Open source versus closed source.
(1:00:13) Dylan Patel:
 
The US is going to lose in open source unless Meta gets dramatically better, which they are getting. With a lot of the talent they're hiring,  I think Sam Altman is wrong that they're not getting any top researchers; I think they are. There are some top researchers I know for sure are going there. Maybe not the first people they offered, the ones with the highest profile, but there are still some top researchers going there. China is open sourcing stuff only because they're behind. The moment they're ahead, they will stop open sourcing. And at the end of the day, closed source will win. Unfortunately, closed source will win.
(1:00:46)
My only hope is that it's not just two or three closed-source AIs, or types of models, or companies, that dominate human GDP, right? But rather that it's more distributed than that. But it might not be, right?
(1:01:00) Matthew Berman:
 
Meta, Google, OpenAI, Microsoft, Tesla, whoever else: if you had to pick one company to bet on reaching super intelligence first, who are you picking and why?
(1:01:07) Dylan Patel:
OpenAI. They're the first to every major breakthrough. Even reasoning, they were the first to it. And I don't think reasoning alone will take us to the next generation, so there's gonna be something else. Anthropic second.
(1:01:22) Matthew Berman:
 
They're so conservative though. They're so conservative at Anthropic in terms of what they release, what they publish,  what they focus on, so much safety.
(1:01:29) Dylan Patel:
 
I think their conservatism has weakened a lot. I think they're a lot less conservative than they used to be. The process for launching Claude 4, as far as I understand,  was much simpler and easier than the process for launching Claude 3.  Whether it's that they're hiring a lot more normies, which they are,  or they recognize that others are just going to release stuff anyway and they should have theirs, whatever it is, I think Anthropic is loosening up a bit. I think they just have really good people, though. And then third is actually a toss-up between Google, xAI, and Meta now.
(1:02:04)
I think Meta will get enough good people that they'll actually be competitive too.
(1:02:08) Matthew Berman:
 
Dylan, thank you so much for chatting with me.
(1:02:10) Dylan Patel:
 
Thanks for having me.
(1:02:10) Matthew Berman:
 
I appreciate it. This is awesome, man.
(1:02:12) Dylan Patel:
 
Yeah, very fun.
(1:02:13) Matthew Berman:
 
Yeah, you can talk about anything, huh?
(1:02:15) Dylan Patel:
 
Maybe, maybe.