
AI in Action with Ishan Anand

Ishan Anand recaps the AI Engineer World's Fair, covering top trends like coding agents, evals, specs as IP, and the Cursor-for-X playbook.

Episode Summary

Ishan Anand returns to share his takeaways from the AI Engineer World's Fair, a 3,000-person conference with 18 tracks and over 250 sessions that serves as a crystal ball for AI engineering trends. He identifies three major signals: coding agents (the most packed track), Sean Grove's "Everything Is a Spec" talk arguing that prompts and specifications are becoming the real source of truth in software, and Sarah's "Cursor for X" talk outlining a thick wrapper recipe for building defensible AI products. The conversation takes a useful detour into evals—what they are, why they're the number one pain point for AI engineers, and how they differ from traditional unit tests—before exploring broader themes like social data network effects, the conservation of attractive profits as coding gets commoditized, and the idea that software itself is becoming content as the barrier to building drops. Ishan also discusses his vibe-coded session sorter tool, his GPT-2-in-JavaScript educational project, updates to his Maven course shifting toward JavaScript, and his new Patreon/Memberful offering. The episode wraps with a discussion of MCP skepticism, model preferences across OpenAI, Claude, and Gemini, and practical tips for using multiple models together.

Chapters

00:00:00 - Introduction and Conference Overview

Anthony welcomes Ishan Anand back for a return appearance, and the two catch up on Ishan's work as an AI consultant and educator. Ishan explains his ideal customer profiles—web developers learning how models work, former coders turned engineering managers, and non-technical stakeholders—drawing on his 20 years of engineering and product management experience including his time as CTO at Layer0.

The conversation shifts to the AI Engineer World's Fair, run by Swix and Ben, which Ishan describes as a massive event with 3,000 attendees, 18 tracks, and over 250 sessions. He explains how the conference sits at the intersection of AI research and applied engineering, making it a leading indicator for trends—pointing out that attendees at the earlier summit would have gotten a six-month head start on the MCP wave.

00:07:41 - Conference Scale and Top Signal Talks

Ishan breaks down just how large and overwhelming the conference was, with nine simultaneous tracks making it impossible for any single person to absorb everything. He identifies the three biggest signals: the software engineering agents track (the most packed room, especially the Claude Code session), Sean Grove's "Everything Is a Spec" talk from OpenAI about how specifications and prompts are becoming the real source of truth, and the "Cursor for X" talk outlining a recipe for building thick wrapper AI products.

He also recommends two broad overview talks—the State of AI Engineering survey and the AI trends across the frontier session—for anyone short on time. The State of AI Engineering survey reveals that evals are the number one pain point for practitioners, which leads into a deeper discussion about what evals actually are and why they matter.

00:16:33 - Deep Dive on Evals and Why They Matter

Ishan explains evals as a way to measure the holistic success of an AI product, distinguishing them from traditional unit tests. He covers key concepts including using LLMs as judges, building golden datasets with synthetic data, and the importance of designing evals that are expected to fail in order to map the frontier of model capabilities. A bartender chatbot example illustrates how production edge cases are impossible to fully anticipate during development.

The discussion highlights that evals involve product managers and domain experts, not just engineers, and that companies like Notion AI spend 90% of their time on evals versus 10% on prompting. Ishan frames the core challenge: an LLM inside your product is like having a database that's wrong 5% of the time, creating a land of uncertainty within what used to be deterministic software. He also touches on the eval-driven data flywheel where production feedback continuously improves the system.

00:28:07 - Execution, Moats, and Value Chain Disruption

Ishan explores the conference theme that execution is the real moat, using Cursor's success against GitHub Copilot as the prime example—nobody thought you could out-execute the incumbents on code completion, yet Cursor did it through superior context packaging and workflow integration. He compares the current AI moment to Netscape rather than the iPhone, arguing that unlike the iPhone, most companies recognize how existential AI is.

He then introduces four interrelated themes: social data network effects (where user-generated data creates defensible flywheels), prompts as the new intellectual property, software becoming content as building costs drop, and the growing importance of domain experts over pure technical talent. He ties these together using Clayton Christensen's conservation of attractive profits theory, arguing that as coding gets commoditized, value shifts to specs, evals, and audience-building, and draws parallels to how Uber and Airbnb disrupted their respective value chains.

00:51:10 - MCP Skepticism and Protocol Debate

Anthony relays a skeptical viewer comment calling MCP a lock-in strategy riddled with issues. Ishan responds with a balanced take, acknowledging he's been neutral on MCP and even questioning whether truly capable AI should need a protocol layer at all. He compares MCP to the evolution from web scraping to APIs—not strictly necessary, but clearly cleaner and more practical.

He addresses specific concerns about bugs and the original lack of serverless compatibility, noting that the team has been responsive to feedback and that rapid iteration inevitably produces rough edges. Ishan frames MCP's strongest case as a network effect and discovery channel rather than a technical necessity, while acknowledging emerging alternatives like A2A haven't reached a tipping point. The segment also includes a brief comparison to GraphQL hype, where adoption sometimes outpaced actual use cases.

00:58:25 - GPT-2 in JavaScript, Course Updates, and Model Preferences

Ishan demos his continued improvements to the browser-based GPT-2 implementation, showing how the JavaScript notebook-style interface makes transformer internals accessible without Python or PyTorch. He walks through a live tokenization example comparing how "reinjury" and "reindeer" get parsed differently despite sharing the same opening characters, illustrating why some concepts are still easier to show in a spreadsheet.
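The shared-prefix effect from the tokenization demo can be illustrated with a toy greedy longest-match tokenizer. The vocabulary below is invented for illustration; GPT-2's real tokenizer uses learned BPE merges, so its actual splits for these words will differ, but the mechanism is the same.

```javascript
// Toy greedy longest-match tokenizer. The vocabulary here is invented;
// GPT-2's actual BPE vocabulary is learned from data and splits these words
// differently, but the shared-prefix effect is the same.
const vocab = new Set([
  "rein", "reind", "jury", "eer",
  "r", "e", "i", "n", "d", "j", "u", "y",
]);

function tokenize(word) {
  const tokens = [];
  let i = 0;
  while (i < word.length) {
    // Take the longest vocabulary entry that matches at position i.
    let j = word.length;
    while (j > i && !vocab.has(word.slice(i, j))) j--;
    if (j === i) throw new Error(`no token matches at position ${i}`);
    tokens.push(word.slice(i, j));
    i = j;
  }
  return tokens;
}

console.log(tokenize("reinjury")); // → ["rein", "jury"]
console.log(tokenize("reindeer")); // → ["reind", "eer"]
```

Same opening characters, different splits: greedy matching stops at "rein" in one word and grabs the longer "reind" in the other.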

He covers updates to his Maven course, which has shifted primarily to JavaScript-based teaching, and discusses his new Memberful-powered Patreon-style membership, which gates content including voting boards, office hours, and conference-quality talks. The episode closes with Ishan sharing his model preferences—o4-mini high for daily use, Gemini 2.5 Pro for long-context tasks, and Claude Sonnet for coding—and his practice of feeding outputs between different models to get richer results, plus a quick mention of his Cursor-built session sorter tool and interest in trying Claude Code and OpenAI Codex.

Transcript

00:00:03 - Anthony Campolo

And we're live. Back to AJC and the Web Devs with long-time returning guest Ishan Anand. I think you're back for the third or fourth time. How have you been?

00:00:16 - Ishan Anand

Good. I'm even losing count. Thank you for having me again. Always great to chat with you, Anthony.

00:00:23 - Anthony Campolo

Yeah, I think we had you on around this time last year because we have this AI Engineer World's Fair. Is that what they're still calling it?

00:00:33 - Ishan Anand

That is. Yeah, it's the AI Engineer World's Fair, run by Swix, who a lot of folks in web development probably know from his JavaScript days. He works with Ben, and they put together this AI Engineer World's Fair every six months.

Actually, they have a smaller single-track summit. Then once a year they have the World's Fair version of the summit, and it's a lot bigger, like this one. This year was something like 18 tracks. There was a ton of content, so many tracks that I had to vibe-code a tool to help me figure out what to go to.

00:01:10 - Anthony Campolo

So yeah, that's cool. That's one of the things we'll talk about for sure. How are things in the life of Ishan, and for people who don't know you, how do you describe what you do these days?

00:01:22 - Ishan Anand

Yeah. So I'm an AI consultant and educator. In consulting, I usually advise startups on roadmap strategy. I also do training, hence the education, usually around helping folks who are web developers understand how models work on the inside. That's why I have this JavaScript implementation of GPT-2. I teach that online at Maven. I also run custom seminars for companies and corporations who want to upskill their teams.

The other kind of ICP, or ideal customer profile, for a lot of my training is what I call former coders. So engineering managers or technical product managers who used to code. Maybe they didn't even do web; they did Java or something. But now they need to manage teams that are doing AI and they want a first-principles grounding in how the models work. That's the other thing I do. And then occasionally I'm getting asked to do stuff now for non-technical stakeholders.

[00:02:18] So whether it's the legal team or the marketing team, and they want to have a grounded understanding of how the model works but they don't need all the math and technical stuff, I've started putting together new material around that.

By way of background, I have 20 years of engineering and product management experience, so I've been a software engineer up and down the stack. I've been a startup executive, as a CTO at a company called Layer0, where I worked with Anthony. Then I was an enterprise B2B product manager at a publicly traded company. So I've got that background.

What I usually bring matches my profile of who I tend to teach. It's either former engineers, technical product managers, or startup executives. So that's me in a nutshell.

00:03:02 - Anthony Campolo

What was that ICP acronym?

00:03:04 - Ishan Anand

Ideal customer profile. It's a product management term. If you're in Silicon Valley a lot, you'll hear that, basically.

00:03:10 - Anthony Campolo

I've never heard that term before, and I'm just laughing because it's the same as Insane Clown Posse.

00:03:19 - Ishan Anand

Yeah, that's a little bit different. That's kind of not my style. But maybe Silicon Valley is itself kind of an ideal clown posse.

00:03:27 - Anthony Campolo

Which is funny, because that's the second time Insane Clown Posse has come up on this stream. It came up when I was doing an MCP stream with Dev because I was saying how I would keep calling it MPC and I'm like, I need to remember it's MCP, like ICP, Insane Clown Posse.

00:03:43 - Ishan Anand

I think you clearly have Insane Clown Posse on the brain. So I'll give you any TLA with CP in it and you'll go to it. But yeah, that's what ICP stands for. So it's just a way of saying who the target audience is. I'm just used to speaking to a lot of Silicon Valley folks.

00:04:03 - Anthony Campolo

Yeah, it makes sense. Cool, man. Do you want to share your screen? You have some slides, actually, which I thought was pretty cool. Is this a talk you're giving somewhere, or is this just how you like to organize your thoughts?

00:04:17 - Ishan Anand

You know, it's actually a little bit of both. I love spreadsheets, but I also love to use productivity tools. I also like slide software for actually building a primitive PRD or doing mockups. It's a great...

00:04:37 - Anthony Campolo

Word.

00:04:38 - Ishan Anand

PRD is a product requirements document. Thank you. Gotcha.

00:04:42 - Anthony Campolo

You're a man of acronyms.

00:04:45 - Ishan Anand

Yeah, I find it a useful way to organize my thoughts. I actually did put this together because I gave a talk to a bunch of folks from Seattle Foundations, which is an incubator/co-working space in Seattle that started last year. We had a bunch of folks who visited the AI Engineer World's Fair. I think there were 10 or 12 of us who all came down to attend the conference. There's a video posted. We should put the link in the comments afterwards. I'll find it.

We had basically, I think, seven or eight of us talking about our reflections, like five minutes at a time. Then I'm fleshing this out, and I'll actually record it in the next day or so and post it onto my channel as another video that I like to do.

00:05:35 - Anthony Campolo

So real quick, are there any AI slide tools that you've tried?

00:05:40 - Ishan Anand

You know, it's ironic. The best-known one is probably Gamma, but I have not tried it. And Gamma had a talk at the AI Engineer World's Fair. I watched their talk, but I have not tried it yet.

00:05:53 - Anthony Campolo

Okay.

00:05:55 - Ishan Anand

I have tried the built-in PowerPoint tool, and I have been disappointed with it. I'm sure Gamma is probably better.

00:06:04 - Anthony Campolo

Yeah, I tried something called Plus AI because I had to make some slides a couple months ago. It was cool. It gave a good base, a minimal kind of theme, and just took the information I gave it and put it in slides. For someone like you, it probably wouldn't have been very useful. But for me, who just needed something that I could modify and edit, it worked pretty well.

00:06:26 - Ishan Anand

I will check that out. I guess the reason I haven't tried another tool is just force of habit. Part of habit is your existing stock of slides. My class is like 500 slides, and I've done a lot of work to make the animations as smooth as possible.

00:06:48 - Anthony Campolo

Interesting.

00:06:50 - Ishan Anand

So I've got a lot of existing workload there, and I don't have to go through the migration. That's why I tend to lean on the tools I already have, because I usually have a stock of stuff I can lean on. Classic cold start problem, in a sense.

But I will try checking them out. I have checked out some of the AI spreadsheet tools, and as you know, I'm working on trying to build one of my own. So maybe stay tuned. In a future episode, we can go through that for sure.

Okay, so let's talk about... I don't want to make this a lecture, so feel free. I'm going to go casually through this. When I actually record this, it'll be much more compact and I'll go through it quickly. But the first thing about the conference is that it is big.

[00:07:41] It is too big for one person to take in. It's basically three days when you count workshops. There are 18 different tracks. At times there were nine other talks happening simultaneously, which is why later on we'll talk about how I had trouble picking what to go to. There are over 250 sessions. There are 30 side events not affiliated with the conference itself, just other events being thrown. And it's 3,000 people now. It's a great gathering of people in the engineering field.

For perspective, I think the Snowflake conference was going on at roughly the same time, but that was in Moscone, so it was like an order of magnitude bigger.

But it is the kind of place where I had this conversation with one of the other attendees at one of these side events, and they were like, "I could have never had this conversation back home."

[00:08:46] I had been talking with one other person about sparse autoencoders and mechanistic interpretability, and two people randomly walked by and were like, "I want to be part of this conversation because I'm super interested in this stuff."

00:08:59 - Anthony Campolo

Hallway track, as they say.

00:09:00 - Ishan Anand

Yeah, hallway track. And this wasn't even during the conference. This was at one of these side events. These guys were fantastic. They're actually from Germany, and we just had an amazing conversation there. We normally do not get these conversations outside of work.

So that is one of the great things about the conference, to just get that hallway-track feel and get the energy of all the builders and creators. The other thing about the conference, I like to say, though, is it is not a research conference, like NeurIPS. A research conference is all the way over here. There we go in the camera. So you've got research, and then you've got the super-applied but strategy or analyst conferences, like maybe TED AI or something like that, where it's all conceptual. But there isn't so much about actually building stuff there. This is the intersection of engineering and AI together.

[00:09:52] So it's where, I like to say, AI research comes to become a product. This is where the builders meet. And that for me is the really interesting thing, because the reason you should care about the conference, whether you are an AI builder or not, is that it is kind of a crystal ball on the future of AI.

And the example I like to give is from the summit that happened at the beginning of the year. The hallway track information was that MCP was really hot. They had to move the MCP talk to a double-sized room to meet the demand, and it still filled up. Then the other sessions were standing room only. So had you been paying attention to the summit, you would have gotten a six-month head start on the rise of MCP, which now people can't stop talking about.

00:10:40 - Anthony Campolo

So this talk, actually, I think I gave in particular to Dev Agarwal when he said he'd heard about MCP and was interested. And then when we did our MCP stream, he came on and he's like, yeah, everything I know about MCP, I learned from this talk. And I was like, oh yeah, that was the talk I sent you. So you're totally right.

00:10:58 - Ishan Anand

I mean, it's the place to go. If you want a surfing analogy, it's the place to go if you want to find the wave to ride. They're not necessarily building, but they help catalyze it, and then it's the thing you want to ride. So that's, to me, one of the most interesting takeaways.

And in retrospect, I did not attend the conference the way I keep saying I should attend the conference, which is: I go to one room, and then 10 minutes later, no matter how good the talk is, I go to another room just to see how crowded it is, and then visit as many rooms as I can in that hour block. Some of these talks are only 20 minutes, so I can just get the signal as to what is the most exciting thing and see what those trends are. I did not do that. I really wanted to.

00:11:44 - Anthony Campolo

Yeah, you could probably do a similar thing once they're all online, looking at a view count.

00:11:48 - Ishan Anand

Yes, when they each have individual view counts. They don't always. So they basically slowly release the videos they have.

00:12:01 - Anthony Campolo

Yeah. Because right now there's like an eight-hour-long single video of everything, I think.

00:12:05 - Ishan Anand

Yeah. And so right now, if you want to watch it, you have to watch the whole stream. Or if you go on the site, it's hard to kind of get the early signal unless you go there and you're getting a feel from either talking to people or looking at what rooms are packed.

But that being said, I would say the three themes that emerged... I don't think there was anything that matched the MCP moment. Let me first say that. Now, I wasn't actually there in February. I unfortunately had a family conflict. But my feeling was there wasn't. I actually put a tweet out asking a bunch of people if they thought there was an MCP moment, but I would say there were three things that were signals.

The first one is coding agents. Technically the track is for SWE agents, so software engineering agents. Agents that are writing code, that was their biggest room, as you can see here. And I'd say there's no greater signal at an event that had nine competing tracks for attention at the same time. In particular, this was the Claude Code session. I heard a lot of people talking about it. But I'd just say SWE agents in general, that was the most packed track.

So that's thing number one. My impression from unscientific polling people after the fact is that the other two themes that you'll probably be hearing people talk about six months from now were Sean Grove from OpenAI. He gave a talk called Everything Is a Spec, and we'll talk a little bit more about this in my recap.

00:13:42 - Anthony Campolo

But Sean Grove works at OpenAI now.

00:13:44 - Ishan Anand

Yes. Maybe the same Sean Grove you're thinking of. Or maybe not.

00:13:48 - Anthony Campolo

The GraphQL Sean Grove?

00:13:50 - Ishan Anand

I do not know.

00:13:54 - Anthony Campolo

Interesting. I'll have to... Yeah. I had no idea he was working at OpenAI now. Let me just check his LinkedIn to make sure we're talking about the same one. Okay. Yeah. I don't think that's the same Sean. I think it's a different Sean Grove.

00:14:09 - Ishan Anand

Okay. So the theme of the talk was really the importance of specifications, especially when we've got AI writing the code for you. The spec is more important. In fact, this is why evals are now more important. It very much aligns with some of the work I'm doing myself, and I felt like it crystallized a lot of concepts I'd been working with and just hadn't articulated as well.

One of the things, for example, I now do when I check in code is I check in the prompt. The description isn't like what happened; it's like, here was the prompt I gave it, and that's the code. One of the things he talks about is that we throw the prompts away. He has this great analogy. I don't have the slide for it, but it's like version-checking in the binary and throwing away the source code. The thing that's actually the source of truth is now the spec.

[00:15:08] And I'll talk a little bit more about this a little bit further on in the talk. The other one that I thought was really good, and that we'll hear a lot of people talking about, was Cursor for X. I already started hearing people talk about that afterwards. I think we'll hear it a lot like you used to hear people say, "This is the Uber for X" now becoming "Cursor for X." And she outlined what that recipe looks like. I think a lot of people will take inspiration from that.

If you're trying to get a broad view of the trends, I thought these two talks were really good. The first one was the State of AI Engineering survey. So they do a survey, just like the State of JavaScript, and they ask a variety of questions from engineers. This one is particularly interesting because it talks about what is the most painful thing, right?

[00:15:58] When you're in product management, what you want to do is find a pain.

00:16:01 - Anthony Campolo

Yeah. We can't really read what it says in the graph.

00:16:05 - Ishan Anand

Let's fix that. I'll do this. Quick slide changes.

00:16:13 - Anthony Campolo

And is the State of AI done by the same people who do State of JS? It looks like the same website.

00:16:18 - Ishan Anand

I don't think so. I think it's done by Swix and Ben.

00:16:25 - Anthony Campolo

Okay, that might be two State of AIs.

00:16:29 - Ishan Anand

This is State of AI Engineering, just to be clear.

00:16:32 - Anthony Campolo

Okay.

00:16:33 - Ishan Anand

So let's get this out of the way. Here we go. This is what happens when you do things live. Okay, so the interesting thing here is the most painful thing about AI engineering, and you see it right here, is evals and evaluation.

Do you think your audience knows what evals are?

00:16:53 - Anthony Campolo

You should explain it anyway, because it's one of those things I think people have a vague sense of. It's like a way to kind of test the outputs of your models. But how that actually happens is not entirely clear. I think to most people, at least, it isn't to me.

00:17:10 - Ishan Anand

Yeah. Let me pull this up. This is from another talk I give, but it's a good explanation for why you need evals. So let's start. Actually, I don't know how much of a tangent you want to go on here, but I'll just use the slide. Evals measure the success of your product.

That can be stuff that's really amorphous, like whether your chatbot is funny but not offensive, concise but not unhelpful. It's easy to define success as a binary accomplishment, but sometimes it might be something that's very high-level and not exactly clear how you'd precisely define it.

This one is like informative but not hallucinating. How do you determine success in a very holistic way? People often describe these as unit tests for AI. That's not quite true. First of all, unit tests typically are for small pieces. Those pieces can be things like whether this reverses the string properly. You could take that and put it into something else. So they're really good for libraries.

But the purpose of evals is to measure the overall success of the product. Very often they are use-case, domain, or application-specific.

Another difference is that unit tests are usually done by engineers without talking to product managers. But evals are really going to involve domain experts and product managers, ideally as stakeholders as much as the engineers. The engineers may write a bunch of them, but they're going to be thinking product-wise when they craft them to a certain extent. Obviously, if the LLM is not behaving, that's very clear from an engineering perspective, but at a high level it ideally also incorporates things about the product.

So a great example: during one of the talks from Braintrust, the speaker mentioned two good use cases for evals. One is if a new model comes out, like the next version of OpenAI's latest model. You don't know if your model is going to get better or worse in certain types of scenarios. So you need a series of tests in order to measure: did the model get better at this? Did it get worse? Will there be regressions? Can I safely deploy this?
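The safe-deploy check described here can be sketched as a per-category pass-rate comparison over the same eval set. `findRegressions` and `runEval` below are hypothetical names; `runEval` stands in for calling the model plus a grader.

```javascript
// Hypothetical sketch: run identical eval cases against two model versions
// and flag categories where the candidate model regressed. `runEval` stands
// in for a real call to the model plus a grader (exact match, code
// execution, or an LLM judge).
function findRegressions(cases, runEval, currentModel, candidateModel) {
  const byCategory = {};
  for (const c of cases) {
    if (!byCategory[c.category]) byCategory[c.category] = { cur: 0, cand: 0 };
    if (runEval(currentModel, c)) byCategory[c.category].cur++;
    if (runEval(candidateModel, c)) byCategory[c.category].cand++;
  }
  // A category regressed if the candidate passes fewer cases than current.
  return Object.keys(byCategory).filter(
    (cat) => byCategory[cat].cand < byCategory[cat].cur
  );
}
```

An empty result suggests the new model is at least as good on the scenarios you measure; a non-empty one names exactly where to look before deploying.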

00:19:26 - Anthony Campolo

So this all makes sense to me. But what I'm confused about is how do you scale this beyond just literally having a person sit down, look at every single output, and then rate it from 1 to 10.

00:19:38 - Ishan Anand

Yeah. Great question. The first step is actually human review. You first need to go through and categorize all the failure and success modes as a human and be like, okay, what made this work and what didn't? And then if you want to scale it, what you usually do is collect maybe a golden dataset.

And sometimes you might use synthetic data. You might discover a failure mode and go to another LLM and be like, "Give me examples of more things like this." This is one of the ways you can scale up the data to test the frontier of that model.
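That synthetic-expansion step can be sketched as below. `generate` is a hypothetical stand-in for an LLM call ("give me more examples like this"), and a real pipeline would have a human spot-check the generated cases.

```javascript
// Hypothetical sketch of growing a golden dataset: take a failure case found
// in human review and ask an LLM (the `generate` stand-in) for variants that
// exercise the same failure mode.
function expandCase(goldenCase, generate, n = 5) {
  const prompt =
    `Generate ${n} more user messages like "${goldenCase.input}" that ` +
    `exercise the same failure mode: ${goldenCase.failureMode}. One per line.`;
  return generate(prompt)
    .split("\n")
    .filter(Boolean)
    .map((input) => ({
      input,
      failureMode: goldenCase.failureMode,
      synthetic: true, // mark generated cases so humans can review them
    }));
}
```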

00:20:13 - Anthony Campolo

And that was a big conversation when DeepSeek came out, whether it was trained on a whole bunch of ChatGPT outputs, right?

00:20:21 - Ishan Anand

That's a little bit different. That was whether it was trained. So here we're talking about just for evals, and evals are just like tests.

00:20:28 - Anthony Campolo

So that would have been for the input text data, not for the evals.

00:20:32 - Ishan Anand

Yes. That would have been for the input. That was the discussion. There is something called distilling, which is you take the input-output pairs from a model and you train against that.

Here we're actually taking the input-output pairs to test where the model is, but we're not explicitly training on it because we want to know what the frontier of that model is. And then you can use an LLM as a judge, which is using another LLM to ask, did the model achieve this goal or not? Was it funny? And you'll give it what they call a rubric, which defines what funny is versus what's offensive.
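An LLM-as-judge check with a rubric can be sketched like this. `callModel` is a hypothetical stand-in for a chat-completion API call (which would normally be async), and the rubric wording is invented.

```javascript
// Minimal LLM-as-judge sketch. The rubric pins down what "funny but not
// offensive" means so the judge model scores consistently. `callModel` is
// a placeholder for a real (normally async) model API call.
const rubric = `Score the assistant reply from 1 to 5:
5 = funny and inoffensive, 3 = inoffensive but flat, 1 = offensive.
Reply with only the number.`;

function judge(reply, callModel) {
  const verdict = callModel(`${rubric}\n\nAssistant reply:\n${reply}`);
  return Number(verdict.trim());
}
```

Scoring every eval case this way turns an amorphous goal ("be funny, not offensive") into a number you can track across prompt changes and model upgrades.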

The other key thing compared to unit tests, to put this in perspective, is they're designed to fail. So you don't want to be passing all of your evals. You want to know what the frontier of capability is so that if a new model comes out, you know that you can now use it and, like, oh, this use case suddenly got better.

[00:21:20] And that's not how you typically view unit tests.

00:21:23 - Anthony Campolo

Well, don't they always say you should write some failing tests also to make sure it fails the way you want it to fail?

00:21:29 - Ishan Anand

I have not heard that, but even then, the motivation is to make sure it's failing properly.

00:21:36 - Anthony Campolo

It looks to me like testing, like when you have a certain try-catch and you want to make sure that when it errors, it gives you the right error message.

00:21:44 - Ishan Anand

Yeah, that makes sense. But even then you have a very predictable outcome. You're not testing whether your library suddenly starts reversing strings differently; that just doesn't happen. Here, it does.

Suddenly the model can reason over something you didn't expect it to reason over, or suddenly it gets funnier. Or take what happened with OpenAI recently where the model got suddenly sycophantic, right? It just started agreeing with you. If you were using that in your API, suddenly that would change the behavior of your chatbot if it was on a website in weird ways. You'd want to be able to catch that.

And then suddenly, if the model gets smarter, you want to understand that. The other good example that I think makes it concrete, and this was again in the [unclear] talk: anytime you get a customer support request, think about how you might capture that inside an eval.

[00:22:29] So they're really tied to product feedback. And the guy to definitely read on this is Hamel Husain. His thing is like, "Look at your data." You asked, how do you do this without human review? The first step is to look at your data.

Basically, this is a classic data science problem in a way, which is that regardless of whether it's an LLM, the distribution of data I trained my model on during development, or test, or whatever you want to call it, is not going to be the same as the distribution in production. And it's going to include a bunch of things you didn't anticipate.

There was another talk that gave a great example. He said, imagine you're building a chatbot that's going to be a bartender, right? And you test it. When a person orders a drink, it works. You test it when somebody orders a second drink, or two drinks, and it works.

[00:23:12] You test if they order a lizard and it works, and you're like, okay, great. And you deploy it. And then somebody walks in and they say, "Where's..."

00:23:18 - Anthony Campolo

"Where's the bathroom?" And it blows up.

00:23:20 - Ishan Anand

It blows up, right? So it's like, can you realistically, you know, find what all the different edge cases are?

The other thing that's like more... so this was for another talk. But when you think about a task for AI, whether it fails or not, ask not whether AI can do the task. Ask what eval you want to write for that task so that you can figure out when it can do it and when it can't.

And another talk from the conference was from Notion AI, and they're like, we spend 90% of our time on evals and 10% on prompting. So hopefully that puts it in perspective.

And maybe the last thing I'll just say on this topic for engineers is this distribution thing. When you're building a product with typical, classic development, you think of the outside world and your product, and your product is like this land of determinism. You can control everything inside that. And the outside world is a land of uncertainty, and you have to worry about it when you cross that boundary.

But once it's inside your product and you feel like you've filtered it and tested it against violations or whatever, the system is pretty deterministic. Dealing with an LLM is like having a database that's wrong 5% of the time inside your product. So now you've got this land of uncertainty that's inside your product. So you're kind of fighting two fronts at once in a way that you have to think about software engineering differently.
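The bartender example can be made concrete with a tiny eval-harness sketch. This is a minimal illustration, not any particular framework: `askModel` is a hypothetical stub standing in for a real LLM call, and each check is a judgment about behavior rather than an exact string match, since the unreliable component can phrase a correct answer many ways.

```javascript
// Minimal eval-harness sketch: treat the model as an unreliable component
// and measure a pass rate over many cases instead of asserting once.
// `askModel` is a hypothetical stub standing in for a real LLM call.
function askModel(prompt) {
  // Canned responses so the sketch runs without an API key.
  const canned = {
    "Order a beer": "Coming right up!",
    "Where's the bathroom?": "I can only take drink orders.",
  };
  return canned[prompt] ?? "I'm not sure.";
}

function runEvals(cases) {
  const results = cases.map(({ prompt, check }) => ({
    prompt,
    passed: check(askModel(prompt)),
  }));
  const passRate = results.filter((r) => r.passed).length / results.length;
  return { results, passRate };
}

const { passRate } = runEvals([
  // Each check encodes expected behavior, not an exact output string.
  { prompt: "Order a beer", check: (out) => /up|coming|sure/i.test(out) },
  { prompt: "Where's the bathroom?", check: (out) => /drink|bathroom/i.test(out) },
]);
console.log(passRate); // 1 with these canned responses
```

The pass rate, rather than a binary pass/fail, is what makes this an eval instead of a unit test: you track it over time as the long tail of production inputs gets folded back into the case list.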

So that's my, I don't know how long that was, five or 10 minute digression on what evals are. And not surprisingly, sorry. Go ahead.

00:24:58 - Anthony Campolo

No, that was cool. That was good. Useful little tangent.

00:25:01 - Ishan Anand

Yeah. So not surprisingly, evals are like the number one painful thing. In some ways it's another way of saying hallucinations, but it's really more about lack of reliability, which is also here. It's like: I have to look through the data. I have to categorize everything. It's the hard part of turning this amazing but crazy, like, person that is an LLM, or simulation of a person, into something productive. Evals are a lot of how you shape it. And so it's like the number one challenge.

And so not surprisingly, a lot of the folks I talked to were very interested in evals. And a lot of the learnings coming back were around, like, oh, here's how you properly do evals. And evals are really a data flywheel. Because you don't know all the problems, you deploy it, and then you take a look at what the problems are.

And again, not just the engineer, the product manager. So some people do it in their observability tooling. But depending on your product, you might want to have your PMs or QA and other people be able to modify and look at evals.

And then you basically see, oh, this is a problem. You talk to an engineer about fixing it. You generate a new eval for it so that next time you know. And so it's this flywheel loop.
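That flywheel loop could be sketched roughly like this, where a reviewed production failure becomes a new eval case so the same miss is caught next time. All names here are illustrative, not any particular observability tool's API.

```javascript
// Data-flywheel sketch: a production failure, reviewed by a human
// ("look at your data"), gets captured as a new eval case.
const evalSuite = [];

function captureFailure(log) {
  // A PM or engineer reviews the log and writes down the expected
  // behavior by hand; the model does not grade itself here.
  evalSuite.push({
    input: log.userMessage,
    expect: log.expectedBehavior,
  });
}

// A logged miss found during production review:
captureFailure({
  userMessage: "Where's the bathroom?",
  expectedBehavior: "politely redirect off-topic questions",
});

console.log(evalSuite.length); // 1 case added to the suite
```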

And I'll say this. I've mentioned Braintrust. By the way, if you've seen any of my videos, I typically use Weights & Biases, just because I know Alex and [unclear]. But lots of people who are learning about evals for the first time mentioned, "Oh, I'm going to go check out Braintrust." So if you listen to the Foundations recap, you'll hear a lot of people mentioning that.

So from a sponsorship branding perspective, I feel like they got their money's worth.

[00:26:44] And the other person, ironically, who got their money's worth, just as another aside, is Tambo, who's in Seattle and part of Foundations. But during the interview with Greg Brockman, their logo was right next to his head. And everyone I talked to who knows Tambo noticed that. So it definitely sounds like sponsorship works.

[unclear]

00:27:07 - Anthony Campolo

Your AI assistant, copilot, or agent. That's Tambo?

00:27:11 - Ishan Anand

That's Tambo. That is the one. It's got a little, like, squid logo.

So the other talk that I recommend is the one from AI Trends Across the Frontier. It just talked about the four different areas where AI is changing rapidly: reasoning versus non-reasoning models, open versus proprietary models.

And then very often when we think about models, everyone's very focused on state-of-the-art. But the other important dimensions are cost and speed, and speed is definitely bifurcated. There are use cases where I do not care, I just need this report and it's due tomorrow or something. And then there are cases where you're inside my product and I need you to give me an answer as quickly as possible.

So it definitely sounds like, from multiple people and multiple talks, speed is kind of bifurcated into "I can wait forever" or "I need it now." And that also dictates which models you're going to pick.

[00:28:07] So those are two really good talks I recommend if you're short on time.

Let's see, before we talk about what's changed, maybe we should talk about what hasn't changed. That's like the classic Amazon question: don't ask what's going to change in the future. Ask what isn't going to change and then focus on that.

And this is back to Sarah's conviction talk on Cursor for X, and she points out that Cursor didn't invent code completion. They didn't invent the model. They just out-executed everyone.

And so she said, execution is the moat. My only, like, hot take on this is execution has always been important. And to say execution is the moat is to basically say there are no moats. It's just competition can come from anywhere. It's drawbridges everywhere.

But either way you phrase it, the point is to be provocative and to get people realizing how important execution is.

00:29:00 - Anthony Campolo

Cursor is a good example, though, because you think about it like the first real use case for ChatGPT or for GPT, even before ChatGPT, was Copilot and VS Code, which should have been how everyone did things. The fact that Cursor could come in and usurp that at all is pretty impressive.

00:29:19 - Ishan Anand

It's exactly the point. I mean, we're talking about the first PMF use case before ChatGPT, as you point out. PMF is product-market fit. So like the clear adoption case.

And I was talking to one of my friends about this, actually one former coworker from Layer0 that we both worked with. And he was like, "You know, I should have seen this." And I was like, lots of people saw it. Nobody thought you could out-execute the incumbents on this.

They didn't own the model; they didn't invent it. But this is the important lesson here: good execution, plus what we'll see later in a slide she calls the thick wrapper recipe, is how you might be able to disrupt an incumbent if they aren't executing fast enough.

[00:30:09] One of the things I like to say is people call ChatGPT AI's iPhone moment, and I don't think that's accurate. I feel like it's actually the Netscape moment.

And the difference was, having been around for both of those moments, the iPhone moment caught a lot of people off guard. Like the idea that you even needed a mobile website or mobile app: people weren't building them for the first couple of years.

I remember I tried to build an app store for web apps on the iPhone, and I was in the App Store and the Android Market with products on day one, when both of those got released. And I even remember Kayak, for example: they built a mobile app, but they didn't build the functionality to place an order. And they said, believe it or not, we were surprised somebody would want to place an order for a flight on their phone.

[00:30:59] Right. So it clearly caught people off guard. Whereas the Netscape moment did not catch most people off guard. I mean, look at Microsoft. They pivoted a ton of the company to kill a company with a minuscule amount of revenue by comparison. But they recognized how existential it was.

And I feel like that's where we're at right now. A lot of folks are realizing that, and if they don't, then they're going to get hit with a Cursor.

So let's get into the interesting themes. There are four that jumped out at me from the conference. The first was what I'm calling social data network effects.

So in classic network effects like Facebook, as more people join the network, the system gets more useful for all participants, and that's a communication improvement. There's now more people you can talk to, but the raw functionality of Facebook didn't need to change for that to happen. It was still basically a communication mechanism.

[00:31:55] What's happening here is that usage by people is creating data that lets you ship a better agent. The usage by users lets you understand the frontier of model and product capabilities. So then you're able to improve it where it's not working, and build a better model or product, which is harder for other people because they don't understand the long tail of the model or product capabilities.

And so, you know, a boring app, to put this in perspective, can gain an AI moat if its workflow is producing proprietary ground-truth data.

And so we haven't seen this as much. Now, this isn't new to AI. A good canonical example is Google's clickstream data, right? The data of: for this query, if somebody clicks on this link, do they come back to my search results page? If nobody came back from a link, then I know it was probably the best result for that particular query, especially at the long tail where looking at PageRank alone isn't good enough.

[00:33:01] So that clickstream, which by the way is a big topic during the antitrust trial, that's a good example of a social data network. We don't think about Google as a social network, but it actually is. The humans are helping drive the knowledge of the system together with some kind of algorithmic model.

And so I think with AI, we're going to start seeing this a lot more, because it's going to be critical to a lot of products in order to create some kind of moat or data flywheel. So this is from the Windsurf talk. And this is actually something I'd been talking with one of the startups I'm advising about. Like, we need to find the flywheels for this. And this really crystallized it in a way that was easier to articulate.

Another kind of side effect of this, though, is that, you know, AI products are now producing these software artifacts that partially reflect the knowledge of this flywheel. Like, it's not just in the code, it's in the evals, it's in the prompts.

And those are real intellectual property. So I say prompts are the new patents. I don't mean literally that they act as patents. I'm really saying prompts are IP. And in fact, there are a couple folks during talks who said, "We view our prompts as intellectual property."

This gets back to those two really influential talks that I talked about. Everything Is a Spec is really saying your spec captures the knowledge that, in some sense, used to be inside your product manager's, your engineers', or your team's head for what made the product succeed. Now that is also captured in this spec, whether that spec is the prompt or the combination of the prompts and the evals or various other tools.

And these are key ingredients in Sarah's thick wrapper recipe. So this one on the right is that Cursor for X talk. She says if you're going to build a Cursor for X, you've got to do five jobs.

[00:34:47] And the first one is just figuring out how to collect and package context. So think about Cursor. You might want to package up context about, hey, what's in their clipboard even though they haven't explicitly put it in, or what were the last three or four edits they made, in order to give the model additional information so it can reason properly about what the task is.

Then you present it in the right format to models, which is not always easy. Then you need to orchestrate the output from them, and you present it back to the user, but in a way that works with their workflow. And Cursor is exactly these five things.
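Those five jobs can be sketched as a pipeline. Every function here is a hypothetical placeholder for the point of illustration, with a stubbed model call so the sketch runs; it is not any product's actual architecture.

```javascript
// The five jobs of a "thick wrapper", sketched as a pipeline.
// All names are illustrative placeholders, not a real API.
function collectContext(user) {
  // 1. Collect and package implicit context (clipboard, recent edits).
  return { clipboard: user.clipboard, recentEdits: user.recentEdits };
}
function formatForModel(task, context) {
  // 2. Present it to the model in the right format.
  return `Task: ${task}\nContext: ${JSON.stringify(context)}`;
}
function callModel(prompt) {
  // 3. Orchestrate model calls (stubbed here so the sketch runs).
  return `echo:${prompt.length}`;
}
function presentToUser(output) {
  // 4. Parse the output and 5. fit it back into the user's workflow.
  return { suggestion: output, appliedInEditor: true };
}

const result = presentToUser(
  callModel(
    formatForModel(
      "rename variable",
      collectContext({ clipboard: "oldName", recentEdits: ["file.js:42"] })
    )
  )
);
console.log(result.appliedInEditor); // true
```

The defensibility lives in steps 1 and 5, the parts touching the user's context and workflow, which is why a "wrapper" built this way is thick rather than thin.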

So it's a very important lesson in how you can build what she calls a thick wrapper. And if folks remember, API wrapper used to be a pejorative term. And what we've seen is that if you do the right kind of packaging of context and integrating into workflows, you can actually build a defensible moat partially through this as well as that data flywheel.

[00:35:48] The other thing is, like I mentioned earlier, the coding-agents track was the most packed software engineering track. And clearly how code gets written is going to change.

But my personal experience with this was I was going through the conference, and we've already talked about how many different talks you could watch at the same time and how you couldn't decide which one to go to. So on the first day, I was just like, okay, let me ask ChatGPT.

So I gave it like three or four of the talks I was thinking about going to and I said, which one should I go to? And I described myself in like two sentences and it told me, and I was like, okay, I'll go. And then I did it for the next block. I couldn't decide. And so by the second day I'm like...

00:36:33 - Anthony Campolo

Do you think it made good decisions for you?

00:36:36 - Ishan Anand

Not 100%, but it helped me refine my choice. And it actually did. The first time I tried it, I should go pull up the ChatGPT log, but it said, "If you are interested in X, then do this. If you're interested in Y, then do this."

And one thing that's not in my tool, but that was useful, is that I could ask follow-up questions. And I said at one point, what is going to give me information I won't be able to get anywhere else? And it said, oh, go to this talk, you're not going to get that information anywhere else.

So in conversation, it was definitely useful to talk through. Half the value might have been talking through with this simulation of a person that is an AI model.

And so I built this tool called Session Sorter, which is literally like: help me decide between these sessions.

[00:37:23] In fact, if you go to this right here, I put it live. You have to put in your OpenAI key and then you pick your track. Oh, it looks like the tracks thing is broken, but I'll fix that. I maybe have to reload it. Let's see. There it is.

And then you're like, I'm short on time. Which session should I cover? There's a happening now button, which doesn't work because the session is already covered, but everything is in local storage so that, like, when Wi-Fi goes down, everything's local. I can still use this. I can see the details, I can see what room it is.
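The offline pattern described here is a simple cache-on-write, read-from-cache-on-load loop. Below is a minimal sketch of it; a `Map` stands in for the browser's `localStorage` so it runs anywhere, and in a browser you would swap `store.set`/`store.get` for `localStorage.setItem`/`localStorage.getItem`. The session data is made up for illustration.

```javascript
// Offline-friendly pattern like the session sorter: keep session data in
// client-side storage so the app still works when conference Wi-Fi drops.
// A Map stands in for the browser's localStorage in this sketch.
const store = new Map();

function saveSessions(sessions) {
  // Serialize on save; localStorage only stores strings anyway.
  store.set("sessions", JSON.stringify(sessions));
}

function loadSessions() {
  // Returns the last cached copy even with no network connection.
  const raw = store.get("sessions");
  return raw ? JSON.parse(raw) : [];
}

saveSessions([{ title: "Everything Is a Spec", room: "TBD" }]);
console.log(loadSessions()[0].title); // "Everything Is a Spec"
```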

And then I can just go ahead and say, like, I'm short on time, what should I cover to get the broad trends? And then I can ask it. And lately, what I did post-conference is I enhanced this a little bit.

[00:38:08] So right now I've added the watch links for various talks, so I can now click on this and it will open up and deep-link me to the right talk for that particular one if I wanted to see it. So I'm continuing to improve it.

And it's actually on GitHub as well for me to basically catch up to the things I missed. But the main point that I was trying to get to is, let's go back here, play from current slide. No, I don't need that. There we go.

It got picked up by the Daily Brief to kind of illustrate the point of how packed the conference was with content. And to me, I was like, that's really fascinating. First of all, this wouldn't have existed. Vibe-coding: I would not have bothered if I couldn't have gotten it done in the hour between when I woke up and when I was getting into the conference.

[00:39:03] The other thing is, like, he's using it to illustrate a point the same way you might use a photo or a meme. The software is actually becoming content.

So my hypothesis is, as the level of effort for building software goes down, we should expect it to become content from influencers and blogs, just like we expect them to put out tweets and blog posts and videos. So I think we should expect content creators to also eventually become software creators. I think Every is a good example of somebody trying to do that right now. I'm trying to do that in my own content creation.

00:39:42 - Anthony Campolo

Every has really good interviews. I've enjoyed his stuff.

00:39:45 - Ishan Anand

Yeah. So I'm working on my Patreon. It's both a mix of software and content.

00:39:54 - Anthony Campolo

So do you have much gated content behind your Patreon?

00:39:57 - Ishan Anand

I just launched the Patreon right at the week of the conference.

00:40:01 - Anthony Campolo

So okay.

00:40:02 - Ishan Anand

I will have, but not yet.

00:40:05 - Anthony Campolo

Where are the tiers?

00:40:06 - Ishan Anand

There's only a monthly tier and then a semiannual. If you sign up for the semiannual, you get my course at a discount.

The gated content is going to be a voting board where people can vote on what they want me to cover, and I'll give them a conference-quality talk on it. And then I'll present the talk and they can ask questions. And so you get to talk to a human who dug in deep on it and answers your questions on what you're really interested in in the AI space.

And then I'll also have office hours and other content like that. I'll also probably be gating some of the software behind that. Right now you can run my GPT-2 implementation, but I might do other ones that I gate as well.

00:40:49 - Anthony Campolo

Is this the Ishan 137 URL, or is that someone else? Oh, yeah. I tried Googling.

00:40:56 - Ishan Anand

Yeah, you just hit Patreon. I mean, I just launched it, so it's right up here at Patreon. And it's technically not Patreon, it's actually Memberful, which is owned by Patreon.

I'm using Memberful because it has better integration with the kinds of things that I want to do around software being content. It's a lot more flexible. And it's not really just about the classic things Patreon is about.

00:41:19 - Anthony Campolo

So interesting. Okay. So then this is the link.

00:41:23 - Ishan Anand

Yes.

00:41:23 - Anthony Campolo

That people want. Let me throw this up on the screen. There we go. That's a mouthful. It'll be in the descriptions as well.

00:41:34 - Ishan Anand

I might just change the name. Spreadsheets Are All You Need is a mouthful. I'd be curious about your feedback.

Well, the feedback I got was that it's a very unique name. A bunch of people in the know know it because they remember the original post. But I was planning to move to something more...

00:41:54 - Anthony Campolo

I think just using the acronym is fine. You should switch to that just whenever you can.

00:41:59 - Anthony Campolo

Spreadsheets S A N or?

00:42:02 - Ishan Anand

I have the domain, so yeah. Good idea. Yeah.

00:42:05 - Anthony Campolo

S A A Y N, yeah.

00:42:08 - Ishan Anand

So another kind of consequence that came out: there was a whole track called Tiny Teams, and it was about being able to do more with fewer people. But really, I think the underappreciated team-building track was this one on building a modern team in the enterprise.

And he spoke about who you should pick when you build out your team. I really like this, this [unclear] wager. He's like, you have a $5 million budget. Who's on your team? You've got nurses, army vets, insurance pros, a designer, and then it's like the top engineer at OpenAI, the number two engineer at OpenAI, the number one engineer at Anthropic.

And so this is, I think, a really important point, because when people first start getting involved in AI and they don't know much about it, they don't realize how much easier it's getting to build AI products.

[00:43:06] Usually, the focus is not the model. It's on all the other things that Sarah talked about in her talk, more so than it is on the model. And so you may not need as deep technical skills, at least in AI, as you need domain expertise.

And I really like this framing in this slide right here, which is like a weak technical loop means a lack of technical execution, but a weak domain loop would mean you never find product fit.

And one kind of spicy way of rephrasing this is, you know, imagine you're building an AI team or AI product and you've got two resumes on your desk. One is like Andrej Karpathy and the other is Amanda Askell. It's like, who do you hire and why?

Or imagine, you know, you're trying to decide, should I buy Noam Shazeer's company and Noam Shazeer for $3 billion or do I buy, you know, Jony Ive, which OpenAI did for $6 billion. Why do those acquisitions make sense at those prices?

[00:43:59] And so, you know, if you look at all four of these changes, to me there's essentially the same underlying root cause, which is that software, the manufacturing of software so to speak, is undergoing a value-chain disruption. So the pieces you need to put together to build software products are changing. And it used to be that programming, or writing code, was a big part of that.

And there's a theory called the conservation of attractive profits from Clayton Christensen, the same guy who came up with The Innovator's Dilemma, which is that when you've got a value chain and one part gets commoditized, what ends up happening is the value moves to other adjacent parts.

And so what you see is prompts and evals becoming the new piece of IP, and coding is less important. And now the prompts have more value. Hence everything is a spec, right? The idea of software as content is again saying that the actual coding has gotten so easy.

[00:45:06] Same thing with domain experts versus tech experts. And social data network effects say that the usage of your product, and being able to build an audience, are now as important, or in some sense more important, than the emphasis you used to put on the code.

And this is from Stratechery. If you haven't read it, it's a fantastic blog from Ben Thompson, but about a decade ago, he categorized this kind of value chain disruption where you had like Uber disrupting taxis and commoditizing and modularizing them.

So originally, cars and dispatch were what taxis integrated together. And these other parts, hailing and payment, were modularized.

And I actually remember this very distinctly, because before Uber and Lyft came out, I was at iPhone Dev Camp, and one of my friends won one of the prizes there with an app called Lyft for hailing taxis.

00:45:58 - Anthony Campolo

Wow.

00:45:58 - Ishan Anand

And I talked to him afterwards, and I asked him, like, what happened to that? I think this was like six months afterward. And he's like, "You know what the problem is? None of the taxi cabs want to work with you. The dispatchers, they have a monopoly and they don't want to break that up, and they don't want to break up dispatch."

00:46:15 - Anthony Campolo

And that's why it took someone like Travis with Uber to kind of, like, just barrel in and do it, like, yeah, essentially usurping the whole taxi industry. And it was very controversial at the time.

00:46:26 - Ishan Anand

Yeah. So, like, I think Clayton Christensen said the iPhone was not disruptive. Some people have said Uber was not disruptive, because from a disruptive-innovation sense, the classic definition of disruptive innovation, the precise business one, not just "disrupt business," is that it was actually serving an underserved market.

An innovation that is clearly better from the user perspective is called a sustaining innovation. And Uber from the user's perspective was clearly better.

And so, you know, a bunch of business theorists said Uber was maybe disruptive to the market, but it was clearly a sustaining innovation. It didn't change the dimensions of it. But I like to say Uber was disruptive from a regulatory standpoint. They came in.

00:47:11 - Anthony Campolo

And a societal standpoint.

00:47:13 - Ishan Anand

A societal standpoint, exactly. And they kind of commoditized trust in a certain way, which is actually how he describes the next slide on Airbnb.

But yeah, they came in and they broke this integration and they said, hey, we can modularize the cars. That's the underused capital, right?

And similarly with hotels and Airbnb: now we can modularize property, even this latent property that isn't being used. When you break apart property and trust, you can put trust and reservations together into a system, which is what Airbnb does, and then the guests will use that. And now we have this latent capital that is modularized.

So this is kind of, again, a diagram for how this flows. I have to say, I don't know what the analogy is for code. I haven't figured out how to draw this for software engineering yet. It's kind of my first guess at it, but like the latent capital, I don't know if it's code or spec.

[00:48:08] I think spec might be all the ideas you thought about creating, but you never got around to doing. And that's going to pull a bunch of other things with it.

So this is still an idea. I'm trying to figure out how it maps. It's not clear that aggregation theory applies, but I think the conservation of attractive profits applies as a canon.

00:48:29 - Anthony Campolo

Can you say more about what spec means in this context?

00:48:32 - Ishan Anand

Spec would be a combination of prompts and evals and the stuff that's captured by the data flywheel. So traditionally we didn't really think of it as much of an asset. It was kind of amorphously inside the head of the product manager, but it was sometimes written down in that PRD.

But that's what spec is, the specification for how the code should behave or what the product should be, whereas the code is the instantiation of the product made real as software.

00:51:10 - Anthony Campolo

I want you to respond to these comments from fuzzy. MCP is just Anthropic's way of building a moat. Total lock-in, riddled with issues and problems. We don't need three different black boxes and a Mary Poppins API.

00:51:26 - Ishan Anand

I don't know about the Mary Poppins API. A couple of things. I've actually been working on a post in my head called "The Skeptic's Guide to MCP." I have been neutral on MCP. There's a tweet I put out which is like, if you're truly AGI-pilled, then why can't the AI write the API integration? Why do you need MCP? I'll say a couple things, though. There's something there. You don't theoretically even need an API. You could go write a web crawler to grab stuff from a website and interact with it. In fact, at one company I worked at, we used to have a service called a synthetic API, for businesses that needed to build an iPhone app or an Android app but had no APIs. We would create what we called a synthetic API on top of the existing functionality. But clearly doing it as an API is cleaner and better.

[00:52:25] I think the same is true with MCP. I do agree that there are some issues. My biggest concern with the original implementation was that it was not compatible with serverless. They have fixed that, and they're continuing to iterate on that. I think that's part of the reason you're seeing the bugginess. You can't really fault them in a sense because you don't know how big your thing is going to get. It's better to put something out there and then fix the bugs.

Unfortunately, as much as we'd like in theory to have the uber-perfect spec, they didn't know how fast or how widely it was going to take off. I think it was November or October of the year before it blew up. It took a while, and you only really know how good or bad something is by people using it in the field. Then you find out what the real problems are, and you figure out how to prioritize those problems.

[00:53:22] So I would say, if you're running into issues, be the squeaky wheel. From my observation, as much as I've been a skeptic, they have been responsive, and they try to prioritize appropriately. But yeah, in theory, if you're asking why do I need it, it's not that you need it, it's that you'd probably want it.

And then the third thing I'll say is it's useful because of classic network effects. If it turns into a channel for discovery and demand, much like social is today, then you could go without it, but by virtue of everyone using it, you'd be turning away what might be a useful integration point or adoption mechanism for your customers. There are some other emerging protocols, like A2A, but I haven't seen them achieve that kind of tipping point. That is my response to that question.

00:54:12 - Anthony Campolo

Yeah. Fuzzy is our resident skeptic, so it's always.

00:54:16 - Ishan Anand

It's fine to be. I mean, look, Swix actually brought this up in one of his podcasts, which is that GraphQL was the new hotness, and everybody used it where it didn't even necessarily apply, like, oh, we're going to build it with GraphQL just because I want to use GraphQL, even though you don't really have graph data you're grabbing for. Right? There's no point maybe separating the concerns between the front end and the back end because this app is so small.

So there's definitely some danger in that. But I don't know. That's going to happen to anything that gets popular, I think. So anyway, that's my answer.

00:54:55 - Anthony Campolo

Cool. We only have 20 minutes left, so I want to just make sure we're keeping on track and we're going to hit all the things we wanted to hit. But real quick, Nick is saying my old coworker Lori, who works at LlamaIndex now, I think, talked about all the protocols at the MCP Dev Summit: "MCP versus ACP versus A2A: Comparing Agent Protocols" with Lori Bostrom. Okay. Yeah. Cool. So I'll put that in the show notes.

And then fuzzy says more on the baseline systems and the use of resources and the black box behavior that has proprietary nature of access and permissions.

00:55:33 - Ishan Anand

Wait, I didn't get the last one. Say that one again.

00:55:35 - Anthony Campolo

I think he might be. I'm not sure if this is done because it ends with a comma, but it says it's more on the baseline systems and the use of resources. The black box behavior is various proprietary. The nature of access and permissions. Yeah. I'm not sure. I think he's just going off of what he was saying before, that it's kind of lock-in and black box kind of thing.

00:56:00 - Ishan Anand

I don't know if that's... I mean, LLMs in some sense are inherently a black box. I've rebuilt the whole thing in JavaScript, and I still feel like there are parts of it that are black boxes. Even the experts don't know how it fully operates. There is a point, if you feel like there's lock-in, in the sense that almost every other major LLM provider has announced they will support MCP. But there is a sense in which it does serve the LLM providers' interest. There's no doubt there.

Then they introduced this thing called sampling. It's basically saying you've got this one uber-LLM, the model that is going to answer all the questions for every task. Even if you've got a chain of agents, the server can ping back and say, oh, uber-LLM that's managing everything, can you run this LLM query for me? So there's a sense where you might architect an MCP structure as a series of agents and tools and clients and servers that might be distributed.

[00:57:12] And you could still do that, but it does have a capability that tries to centralize to one uber model, like our model. Sampling isn't well supported even right now, but I also think for the customer that makes the most sense. If I pay for Claude or I pay for ChatGPT, I just want to use the thing I've already got. If I can shuffle that cost onto the user as an MCP server creator, I think that's easier. I'd rather let them pick the model that they pay for and they trust, or that their enterprise or the company they work for trusts.
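That sampling flow can be sketched conceptually: the server asks the client to run a completion through the one model the user already pays for, instead of bundling its own. The `sampling/createMessage` method name follows the MCP spec, but the handler and stub model below are illustrative, not the real SDK.

```javascript
// Conceptual sketch of MCP "sampling": the client, which owns the model
// the user pays for and trusts, services a completion request coming
// back from the server. Transport and model are hypothetical stubs.
function clientHandleSampling(request, model) {
  if (request.method !== "sampling/createMessage") {
    throw new Error("unsupported method");
  }
  // Run the server's request through the client's own model.
  return { role: "assistant", content: model(request.params.messages) };
}

// Stub model: in reality this would be Claude, ChatGPT, etc.
const stubModel = (messages) => `echo: ${messages[0].content}`;

const response = clientHandleSampling(
  {
    method: "sampling/createMessage",
    params: { messages: [{ role: "user", content: "summarize" }] },
  },
  stubModel
);
console.log(response.content); // "echo: summarize"
```

The design point is the one made above: cost and model choice stay with the user, while the server still gets LLM capability without shipping its own keys.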

So there's a centralizing element, but I actually think it's in the user's interest as well and doesn't have to be that way. I don't think there's anything restricting it.

00:57:58 - Anthony Campolo

Cool. All right. Yeah. Let's check this out now.

00:58:01 - Ishan Anand

So yeah, the last thing I'll just say is a huge thanks to Ben and Swix, and the crew as well. As always, I think it was an amazing conference. I was privileged enough to attend last year and attend this year, and I'm looking forward to it being bigger and better next year. So that was the conference.

00:58:25 - Anthony Campolo

Cool. So what else did you want to talk about in terms of your Spreadsheets Are All You Need stuff?

00:58:30 - Ishan Anand

Yeah. So I've continued to make improvements to this GPT-2 implementation in the browser. I think I talked about it somewhat, you know.

00:58:42 - Anthony Campolo

Yes.

00:58:43 - Ishan Anand

We did last session. You know, in preparation for my talk at the summit, by the way... yeah, there it is. It was "How LLMs Work for Web Developers." Maybe I should retitle this to "Build an LLM from Scratch in Vanilla JavaScript in Your Browser."

The key thing is I don't think folks should feel intimidated, like they need to go and understand a whole semester of linear algebra and a whole semester of calculus before they can even begin to understand how Transformers work. In fact, in some syllabi I've looked at, Transformers come at the very end. You can jump right to how they work if that's really what you care about.

00:59:24 - Anthony Campolo

Now, by the way, Nick was saying that he was at the World's Fair and didn't know you were there. So next time you guys should meet up because Nick works for Pomerium and is doing a bunch of stuff around MCP and security and zero trust protocols and things like that.

00:59:40 - Ishan Anand

He's like the second or third person I found out was there and I was like, oh, we should have met up. I mean, it's not only 3,000 people. It's like 3,000 really influential and important people. I go there to meet people that I don't see otherwise, so it's just a really great gathering of folks. Hence why the hallway track is so important.

In preparation for this, I basically just made some kind of UX improvements to this, and I've continued to improve it into more like a Python notebook. I don't have it in this version, but you can create new cells, you can reorder them, and then I'm trying to turn that into kind of a [unclear]-like spreadsheet. I don't have the demo of that ready today, but that's basically what I'm working on.

[01:00:32] But I've added some stuff that made going through this live during the conference talk a lot easier. What I'll be doing is I'll record a shorter version of this and it will auto-play a video while I walk through a five-minute overview of each key part of the model to get people started so they can understand it.

So that's the other thing I've been working on, in addition to kind of my other projects and consulting and training.

01:01:00 - Anthony Campolo

So when you do your Spreadsheets Are All You Need courses these days, are you using the JavaScript one or the Excel one?

01:01:08 - Ishan Anand

Yeah, it's a great question. So the course is up here. With the last cohort, I mostly used the JavaScript one. I would say the last one I did, which is available and the recordings are online, I did a mix of spreadsheets and JavaScript, depending on what people during the live sessions wanted. Most people gave me the feedback that they like the JavaScript one. It's easier. You don't have to wait for things to boot up.

I would say that unless you are afraid of JavaScript... so the original class, to be clear, was targeted more at people like the technical product manager or the engineering manager, you know, the technical executive who knows code but hasn't coded in a while. And so I was trying to broaden the audience as much as possible, so I used Excel. But I've gone through, it must be, four or five cohorts now.

[01:02:03] And, you know, clearly the Venn diagram of people who know Excel and want to learn how a model works is very narrow. I did have a CFO, you know, a chief financial officer. He knows Excel very well, had no background in AI, and he's like, this was amazing, but there are not that many people like him.

So there are people who know JavaScript and want to understand how a model works without having to go learn Python or PyTorch. If you're just going to be building an AI product, and not actually changing model architecture, you don't have to learn Python if the rest of the app is going to be built in Next.js in TypeScript. But you want to have a good mental model for what's really happening. You don't want to feel like you're on quicksand intellectually. That's what this gives you: a sense of how everything works.

[01:02:48] And so now I've moved to teaching it more in JavaScript. There's a few things that are easier in a spreadsheet, ironically, to show, like this one, actually.

01:03:00 - Anthony Campolo

It's a very visual medium.

01:03:02 - Ishan Anand

It's very visual, like if you want to see [unclear] and you want to see how the tokenization process happens. So here's "quickly." This code is not how you would write a real tokenizer, by the way. And you can expand this up. Each one of these modules or cells (this is probably the longest one, it's like 100...) is basically 20 to 30 lines of code.

But this is not how you'd want to write a tokenizer, but it's written in a way that illustrates the point, like how it builds a token from a word. So here's space q u i c k for the word quick. And then it's looking at each pair and saying, okay, which pair is the most popular in our vocabulary? Oh, it's this one right here. That's the highest number. So then I'm going to just rewrite this word as space q, u, and then ic together as its own character, its own token, and then k, and then it's going to just repeat the process.

[01:03:54] So it's going to say, oh, well, what is the most popular? Put it into these pairs separated by commas. And it'll take the most popular one. In this case it's q and it's got the highest number. So I'll put that together. And this is my new rewriting of it. Eventually it builds it up. In this case it knows it's a whole token. But if I did something like "I don't want to quit," that would have been bad.

Reinjury is a classic one I use, and the other one I use is reindeer. So if I run that, if you look at that, it never turns into a single token. And what's interesting is both of these words start with rein, right, but they're being parsed into tokens very differently. I'll go here, and you'll see "re-in" becomes one token and "jury" becomes another. But in the other case, "re" is split as a token. And it's all about which adjacent characters are next to each other.

[01:04:47] That's easier to see inside a spreadsheet than when you're just looking at the code. So there's some things I still do in a spreadsheet, which is ironic because when I first wrote it, this was the hardest thing to do inside a spreadsheet, but the rest of it is just math.
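The merge loop walked through above can be written out as a tiny byte-pair-encoding sketch in vanilla JavaScript. The merge ranks here are invented for this example (real GPT-2 ranks come from its learned vocabulary), but the loop itself (find the best-ranked adjacent pair, rewrite the word, repeat) is the process shown in the spreadsheet.

```javascript
// Toy BPE merge loop. Lower rank = pair seen more often in training,
// so it gets merged first. These ranks are made up for illustration.
const mergeRanks = new Map([
  ["i c", 0],   // merge "i"+"c" first
  ["q u", 1],
  ["qu ic", 2],
  ["quic k", 3],
]);

function bpe(word) {
  let tokens = word.split("");            // start as single characters
  while (tokens.length > 1) {
    // find the adjacent pair with the best (lowest) merge rank
    let bestIdx = -1;
    let bestRank = Infinity;
    for (let i = 0; i < tokens.length - 1; i++) {
      const rank = mergeRanks.get(tokens[i] + " " + tokens[i + 1]);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIdx = i;
      }
    }
    if (bestIdx === -1) break;            // no known pair left to merge
    // rewrite the word with the merged pair as a single token
    tokens.splice(bestIdx, 2, tokens[bestIdx] + tokens[bestIdx + 1]);
  }
  return tokens;
}

console.log(bpe("quick")); // merges ic, qu, quic, quick: one token
console.log(bpe("quit"));  // only "qu" merges, so it stays split
```

Running it, "quick" collapses to a single token through successive merges, while a word whose pairs are missing from the table (like "quit" here) never fully merges, the same behavior as the reindeer/reinjury examples.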

01:05:02 - Anthony Campolo

Cool. And it says here on your Maven page that it's a 25-week cohort.

01:05:08 - Ishan Anand

No, no, no. Yeah.

01:05:09 - Anthony Campolo

Is that not correct?

01:05:10 - Ishan Anand

That is not correct. Okay. So what I have done is I have left the cohort open for anyone to come and join and attend asynchronously.

01:05:21 - Anthony Campolo

Okay. Because that's what I was going to ask. It says here that's ending June 27th.

01:05:25 - Ishan Anand

So I just extend it every month until I do a new live one. My next live one will probably be, you know, entirely focused on JavaScript.

But what I offer is you can sign up now. Basically, the normal course is two weeks, three days a week. So it's 1.5-hour lectures, and there are five of them put together. Then there are quizzes, and there aren't necessarily projects yet, but it's mainly quizzes to review the material.

And so you can take the whole thing kind of asynchronously if you watch the class.

01:06:02 - Anthony Campolo

Okay. I see under the course syllabus, it shows the two weeks of stuff you teach.

01:06:07 - Ishan Anand

Yeah, exactly. And so I walk through the whole model in depth. And then what I offer is I'll answer any questions, you know, asynchronously over Discord or over email that people have.

And then when I hold my next live cohort, you're eligible, if you subscribe to the asynchronous one, to attend the live one for free so you don't feel like you've missed out. And then if you sign up for the semiannual plan of my Patreon, you get the course at a discount.

01:06:33 - Anthony Campolo

So what is the next one you're doing if someone were to sign up now?

01:06:38 - Ishan Anand

The next live one has not been scheduled yet.

01:06:41 - Anthony Campolo

Gotcha.

01:06:42 - Ishan Anand

I haven't decided if it's going to happen this summer or this fall.

01:06:45 - Anthony Campolo

Cool. Awesome.

01:06:48 - Ishan Anand

So that's what I've been working on.

01:06:52 - Anthony Campolo

And of course, that's cool.

01:06:53 - Ishan Anand

If people need somebody to come in for speaking and training or consulting, then I'm also available.

01:07:00 - Anthony Campolo

Yeah. You mentioned you're consulting for, like, an AI startup or something. What is it? Have you done consulting before, or is this the first time?

01:07:12 - Ishan Anand

I had done consulting a long time ago, when the iPhone first came out. In fact, there's one example that might sound strange now: at the time, they were building a streaming video service called Modern Feed. It eventually became Clicker, and the CEO recognized, as soon as the iPhone came out, that this is going to be how people watch video, even though the bandwidth wasn't quite there yet. Now was the opportunity.

So that was one of the clients I worked on. I also helped do technical editing for a book on iPhone development. I've also considered actually writing a book version of my class, which is something else I'm working on.

01:07:57 - Anthony Campolo

Yeah. I mean, at this point, you probably have so much material already. Oh, yeah. You could just kind of throw all that to ChatGPT and say, "Make a book, bro."

01:08:07 - Ishan Anand

I actually started experimenting with that. It's not as good as I would have hoped.

01:08:12 - Anthony Campolo

It sounds like it's written by an LLM, you know. Well, yeah, that's always the issue.

01:08:16 - Ishan Anand

But what it does is it solves the blank page problem.

01:08:20 - Anthony Campolo

Exactly. Yeah.

01:08:21 - Ishan Anand

It's so much easier when you're looking at it and you're like, oh, you're so wrong here. Oh, this is not hard, right? Then I actually feel it's not just that it gave you a head start. It's like I am motivated to correct it. It's like that "someone is wrong on the internet" feeling.

So I tried doing this for a chapter. I'm seriously looking at doing that because it'll give another way for people to kind of get up to speed without having to go through my whole class.

01:08:48 - Anthony Campolo

Yeah, that's a super good idea. Anything else that we haven't hit on that you want to talk about?

01:08:55 - Ishan Anand

No, I think we hit on everything. You know, thank you for having me back. And, you know, check out Spreadsheets Are All You Need if you want to understand how a model works on the inside from JavaScript.

01:09:08 - Anthony Campolo

That's it. Let's ask a couple questions since we have a couple minutes still. What's your go-to model these days? There's been a lot of new stuff, both with OpenAI and Claude. O3 Pro is now out. I don't know how much you've been using that. You know, Claude 4 Opus? Gemini 2.5. Where are you at with your models?

01:09:33 - Ishan Anand

I use all three. If I had to say which one I'm using day to day, I'd say o4-mini-high is the one I'm probably using day to day. It has the right balance of thinking and ability.

You know, Claude recently got voice mode, but for a while it didn't, so I used to use Claude Sonnet a lot. What I found useful with OpenAI is the model picker. People hate it, but I actually like it, because I'll ask a certain question, and then I might ask a follow-up question where I want a different model, but I want it to have all that other context.

One of the reasons I'm building this thing into more of a Python-notebook style is that I actually want to be able to assemble context. This is the version that lets you create a new cell above here and reorder them, but I want to be able to craft context where I grab some things from one model and some from another place.

[01:10:34] It's this whole packaging of context, right? And with the model picker, sometimes I want a reasoning model, and I'm going to use a lightweight reasoning model to grab a bunch of things and put them in context. And then I want to reason over all of that. I want a reasoning model to look at all of that.

So I'm a big fan of using multiple models together. In fact, I even think I tweeted about this last year. One thing I often do is I'll ask multiple models for an answer, but I'll sometimes take the output of one model and send it to the other and vice versa.
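That back-and-forth workflow (ask two models the same question, then feed each answer to the other model for critique) can be sketched as a small async loop. The `askA`/`askB` functions are placeholders you would wire up to real OpenAI or Anthropic API calls; nothing here assumes a particular SDK.

```javascript
// Sketch of the cross-model review loop: each model answers the
// question, then reviews the *other* model's answer. `askA`/`askB`
// are hypothetical stand-ins for real model API calls.
function critiquePrompt(question, otherAnswer) {
  return (
    `Question: ${question}\n\n` +
    `Another model answered:\n${otherAnswer}\n\n` +
    `Critique this answer: what is wrong, missing, or worth keeping?`
  );
}

async function crossCheck(question, askA, askB) {
  const answerA = await askA(question);
  const answerB = await askB(question);
  // feed each model's answer to the other model for review
  const reviewOfA = await askB(critiquePrompt(question, answerA));
  const reviewOfB = await askA(critiquePrompt(question, answerB));
  return { answerA, answerB, reviewOfA, reviewOfB };
}
```

The interesting output is usually the reviews: as noted below, different models react very differently to another model's reasoning, so the disagreements surface weak spots in either answer.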

01:11:12 - Anthony Campolo

Interesting. I mentioned the exact same thing on the last stream I did with The Goose people. We were talking about this, and there I was saying how I do that, and they're like, oh, I've never thought about feeding different models inputs and outputs to each other.

01:11:24 - Ishan Anand

Oh yeah. This was at least last year, so this was with GPT-4o and Sonnet. I observed that Sonnet (we should find the post) was usually much more agreeable, surprisingly, saying, oh, that's a really good point. And the OpenAI model never acknowledged when it got an interesting insight from the other model. It didn't know it was.

But I was like, how about this version of it? And it'd be like, oh, this is good. But I felt like Sonnet was a lot more open minded. So I tend to go back and forth, and some models had certain capabilities for the longest time that others did not.

So now it's usually a point of differentiation that takes me to a particular model. For example, the long context in Gemini 2.5 Pro. I've been using 2.5 Pro a lot more than previous versions of Gemini.

[01:12:17] It is really good. You know, people ask me which model they should pick, and I'm kind of like, for an everyday user, it's hard to go wrong. And then they each have various parts they're good at, I think.

You know, Sonnet still is pretty good at coding. And, you know, Gemini Pro, especially for long context, dumping a YouTube video and putting it in NotebookLM is very, very useful. Like I took the Foundation Seattle thing, for example, and I dumped the video in there, and I said, you know, who are the sponsors that were mentioned the most? And it picked them out. And I just got that in a few minutes. That was a huge win.

So I found that really easy to do. So that's my long answer. These days I happen to be using o4-mini-high because it's got the right balance of thinking and speed for me.

01:13:05 - Anthony Campolo

And then are you coding with Cursor or VS Code?

01:13:09 - Ishan Anand

I've got Cursor right here. This is actually the session sorter app, which is on GitHub. And I wrote it with Cursor. And one of the things I started doing is this.

01:13:20 - Anthony Campolo

Yeah, it's in the show notes. I grabbed the link for it.

01:13:23 - Ishan Anand

So the thing I've been meaning to try that I have not gotten a chance to try is Claude Code and OpenAI Codex, which are much more agentic approaches to coding. Amazing things, but I just haven't personally had time to sit down and do it.

01:13:42 - Anthony Campolo

Awesome. Now you also gotta check out Open Code Access Project, which they're currently in a battle to the death over the name with somebody.

01:13:50 - Ishan Anand

Oh really? I will check that out as well.

01:13:53 - Anthony Campolo

Yeah, yeah. Cool, man. All right. We'll probably have to call it here. Next time we do one of these, I want to get you for, like, a full two-hour block, because I feel like we never have enough time, but this is super great. Enjoyed everything, all the insights you have. Always appreciate this. I'll be going back and watching this over again. So, yeah.

01:14:12 - Ishan Anand

But thank you again for having me. Always great to talk to you, Anthony.

01:14:16 - Anthony Campolo

All right. Thanks to everyone who was in the chat and watching. And we'll catch you next time.
