
JavaScript LLMs with Ishan Anand
A conversation with Ishan about building large language models in JavaScript, local AI experimentation, and the future of accessible machine learning
Episode Description
Ishan Anand discusses DeepSeek's viral moment, its misunderstood training costs, and his JavaScript reimplementation of GPT-2 using web components.
Episode Summary
In this conversation, Anthony Campolo welcomes back Ishan Anand, AI consultant and creator of Spreadsheets Are All You Need, to discuss two main topics: the cultural phenomenon surrounding DeepSeek's R1 release and Ishan's new JavaScript-based reimplementation of GPT-2. The DeepSeek discussion explores why the model broke through to mainstream awareness unlike other AI competitors, with Ishan sharing Google Trends data showing its unprecedented spike compared to Claude, Gemini, and even ChatGPT. They unpack the widely misunderstood $5.5 million training cost figure, explaining how the media compared only the final training run cost against competitors' total development budgets — an apples-to-oranges comparison that Ishan likens to pricing a house based solely on its lumber. The conversation then shifts to Ishan's JavaScript implementation, built entirely with web components so it can run anywhere without build steps or dependencies. He demonstrates a notebook-like environment where code cells and spreadsheet-style tables work together to walk through every step of GPT-2's inference process. Looking ahead, Ishan outlines plans to add training capabilities via an autograd system, build a prompt engineering workbench that calls external APIs, and develop a browser extension for agentic workflows — all aimed at making AI more accessible and less mystifying to everyday developers.
Chapters
00:00:00 - Introductions and Ishan's Background
Anthony welcomes Ishan Anand back to the show, and Ishan introduces himself as an AI consultant, educator, and former CTO of Layer0. He recaps his well-known project, Spreadsheets Are All You Need, which implements GPT-2 entirely in Excel spreadsheets to teach people how transformers work internally.
Ishan also reveals that he has recently reimplemented the project in JavaScript using web components, choosing that approach for maximum accessibility. He explains his reasoning: web components avoid framework lock-in, require no build steps, and can run in nearly any environment, including learning management systems and air-gapped machines where students may face installation restrictions.
00:03:45 - The DeepSeek Phenomenon and Why It Went Viral
The conversation shifts to DeepSeek's sudden mainstream visibility. While the AI community had been tracking DeepSeek's releases throughout the prior year, Ishan notes the tipping point came when his elementary-school-aged son asked him about it — something that had only previously happened with ChatGPT. Anthony theorizes the model became a cultural Rorschach test, letting people project their own views about AI competition, geopolitics, and tech industry spending.
Ishan shares Google Trends data illustrating how DeepSeek was the first model besides ChatGPT to achieve a significant spike in public search interest, surpassing Claude and Gemini. They discuss second-order effects, including a massive surge in Ollama downloads and growing public interest in running models locally, suggesting DeepSeek may have shifted awareness of local AI as a viable distribution channel.
00:09:50 - Pop Culture, Public Perception, and AI Awareness
The hosts explore how AI is perceived by the general public, noting that for most people, AI simply means ChatGPT. They reference a South Park episode about AI-assisted texting and a comedy sketch about an AI boyfriend, illustrating how mainstream culture has absorbed the concept. Ishan points out that even OpenAI's ChatGPT still has room to grow in awareness, citing a holiday feature where users could talk to a Santa voice as an example of what drives public engagement.
This leads into a broader observation about the technology bubble versus public understanding. Ishan references his AI 2025 trends writing, where he identified a growing AI backlash among people who feel the technology is something happening to them rather than something they can participate in. This sentiment, he argues, partly fueled the DeepSeek hype as people used it to challenge perceived tech industry narratives.
00:15:35 - Debunking the DeepSeek Training Cost Myth
Anthony and Ishan dissect the widely circulated claim that DeepSeek R1 cost only $5.5 million to build. Ishan explains the figure refers solely to the final training run's compute cost, excluding research, experimentation, ablation studies, staffing, and infrastructure. He compares this to pricing a house based only on its lumber while comparing it against a competitor's fully built home — a fundamentally misleading comparison.
They reference Dario Amodei's post analyzing the situation, which argued that DeepSeek did achieve genuine efficiency gains but not at the dramatic ratios the media suggested. The cost reduction largely reflects a natural Moore's law-like decline in training costs over the 6–12 months since competing models were built. Ishan emphasizes the paper itself wasn't deceptive, but the media narrative that followed misrepresented the figures by omitting critical context about total development costs.
00:23:02 - R1 Zero and Open Source Reasoning Models
Before transitioning topics, Ishan highlights what he finds most intellectually fascinating about DeepSeek: the R1 Zero model. This variant achieved reasoning capabilities through reinforcement learning alone, skipping the traditional RLHF pipeline that most reasoning models require. The implication is significant — building reasoning models has become dramatically more accessible for open source developers.
Ishan mentions his YouTube video walking through the post-training pipeline and references Will Brown's simple example of building a reasoning model that anyone can run on RunPod. He notes you can observe the "aha moment" where the model begins to reason, making this both exciting from a research perspective and an important demonstration of how quickly these capabilities are democratizing.
00:25:03 - The JavaScript GPT-2 Implementation Deep Dive
Ishan demonstrates his JavaScript reimplementation of GPT-2, which functions as a hybrid between a Python notebook and a spreadsheet running in the browser. The system uses two types of web components — code cells and sheet/table cells — that reference each other through a custom notation system. Users download the 150 CSV files containing GPT-2's model parameters, which are stored in IndexedDB for persistence.
He walks through the code, showing fundamental operations like matrix multiply written in vanilla JavaScript, and demonstrates running inference on the prompt "Mike is quick. He moves," which produces the predicted token "quickly." The implementation makes every calculation transparent and deterministic, emphasizing that unlike typical software full of conditional logic, a transformer's computation follows a fixed, predictable path with minimal branching.
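The operations described above can be sketched in plain JavaScript. The functions below are illustrative, not Ishan's actual code: a naive matrix multiply written in vanilla JavaScript, plus the greedy argmax step that makes the output deterministic when sampling is turned off.

```javascript
// Multiply an (m x n) matrix by an (n x p) matrix, both as arrays of rows.
function matMul(a, b) {
  const m = a.length, n = b.length, p = b[0].length;
  const out = Array.from({ length: m }, () => new Array(p).fill(0));
  for (let i = 0; i < m; i++) {
    for (let k = 0; k < n; k++) {
      for (let j = 0; j < p; j++) {
        out[i][j] += a[i][k] * b[k][j];
      }
    }
  }
  return out;
}

// Greedy decoding: the next token is simply the index of the largest
// logit, which is why the same prompt always yields the same token.
function argmax(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}

// Hypothetical final-layer scores over a tiny 3-token vocabulary.
const logits = matMul([[1, 2]], [[0.5, 2.0, -1.0], [1.0, 0.1, 0.3]])[0];
console.log(argmax(logits)); // index of the most likely next token
```

The real implementation multiplies much larger matrices (GPT-2's hidden dimension is 768), but the control flow is the same: loops over fixed-size arrays, with almost no branching.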
00:33:34 - Making Machine Learning Accessible Through Education
Ishan discusses his Maven course, where he teaches all five lessons of transformer internals in roughly seven and a half hours total, requiring only a basic recollection of calculus. He challenges the conventional wisdom that students need months of prerequisite math and classical ML techniques before studying large language models, arguing most learners are motivated by ChatGPT and want direct understanding of how these systems work.
The course has run four or five cohorts, and Ishan now offers asynchronous access with recordings, quizzes, and a Discord community, plus the option to attend a future live session for free. He demonstrates a spreadsheet-based training example showing backpropagation epoch by epoch on a simple neural network, which both illustrates the concept and reveals the limitations of spreadsheets for more complex training workflows.
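The epoch-by-epoch backpropagation demo described above can be sketched as gradient descent on a single linear neuron. The data, learning rate, and function names below are illustrative, not taken from the spreadsheet:

```javascript
// One linear neuron (pred = w*x + b) trained with squared-error loss.
// Each pass through the data is one epoch; in a spreadsheet, each epoch
// would occupy one visible row of cells.
function trainEpochs(data, epochs, lr) {
  let w = 0, b = 0;
  for (let e = 0; e < epochs; e++) {
    for (const [x, y] of data) {
      const pred = w * x + b;
      const err = pred - y;   // dLoss/dPred for loss = 0.5 * err^2
      w -= lr * err * x;      // chain rule through the multiply
      b -= lr * err;          // chain rule through the add
    }
  }
  return { w, b };
}

// Learn y = 2x + 1 from four points.
const { w, b } = trainEpochs([[0, 1], [1, 3], [2, 5], [3, 7]], 200, 0.05);
console.log(w.toFixed(2), b.toFixed(2)); // close to 2 and 1
```

With only two parameters this fits comfortably in a grid of cells; the limitation Ishan notes is that real training loops, with millions of parameters and many epochs, quickly outgrow what a spreadsheet can display.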
00:39:15 - Future Plans: Training, Prompt Engineering, and Browser Agents
Looking ahead, Ishan outlines three ambitious directions for his work. First, he plans to add model training capabilities using an autograd system, potentially leveraging an existing JavaScript framework called JS PyTorch, which would let users fine-tune GPT-2 weights directly in the browser. Second, he envisions the notebook environment evolving into a prompt engineering workbench where cells call external APIs instead of computing logits locally.
Third, he describes a browser extension prototype that opens tabs, navigates pages, and clicks elements on behalf of the user while keeping a human in the loop. Because it runs locally using the user's own credentials, it addresses privacy and authentication concerns that plague cloud-based agents. Ishan also notes that making the environment web-native enables embedding videos, chatbot assistants, and other rich media alongside the computational cells, creating a more complete learning and experimentation platform.
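The autograd idea mentioned above, recording how each value was computed so gradients can flow backward through the graph, can be illustrated with a micrograd-style sketch in plain JavaScript. This is a conceptual illustration, not code from JS PyTorch or Ishan's project:

```javascript
// Each Value remembers its parents and how to push gradients back to them.
class Value {
  constructor(data, parents = []) {
    this.data = data;
    this.grad = 0;
    this.parents = parents;
    this._backward = () => {};
  }
  add(other) {
    const out = new Value(this.data + other.data, [this, other]);
    out._backward = () => {
      this.grad += out.grad;   // d(a+b)/da = 1
      other.grad += out.grad;  // d(a+b)/db = 1
    };
    return out;
  }
  mul(other) {
    const out = new Value(this.data * other.data, [this, other]);
    out._backward = () => {
      this.grad += other.data * out.grad; // d(a*b)/da = b
      other.grad += this.data * out.grad; // d(a*b)/db = a
    };
    return out;
  }
  backward() {
    // Topologically order the graph, then apply the chain rule in reverse.
    const topo = [], seen = new Set();
    const build = (v) => {
      if (seen.has(v)) return;
      seen.add(v);
      v.parents.forEach(build);
      topo.push(v);
    };
    build(this);
    this.grad = 1;
    topo.reverse().forEach((v) => v._backward());
  }
}

// For y = w*x + b, dy/dw should equal x.
const w = new Value(2), x = new Value(3), b = new Value(1);
const y = w.mul(x).add(b);
y.backward();
console.log(y.data, w.grad); // 7 3
```

A full system would add more operations (exp, division, matrix forms), but this is the core mechanism that would let users fine-tune GPT-2 weights in the browser.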
00:45:29 - Wrap-Up and Where to Find Ishan
Anthony and Ishan wrap up the conversation, with Ishan summarizing his most exciting upcoming projects: the prompt engineering workbench and in-browser model training. He mentions an in-progress video about counting FLOPs and analyzing DeepSeek's actual training costs, encouraging viewers to check his YouTube channel, Spreadsheets Are All You Need.
Anthony reflects on how rapidly the AI landscape is moving, mentioning plans for an AI podcast summarization newsletter to keep up. Ishan echoes the difficulty of staying current, citing recent releases like OpenAI's Operator and deep research tools. He shares his contact information — Twitter at @IAnand, BlueSky, LinkedIn, and his websites SpreadsheetsAreAllYouNeed.ai and Ishan.com — before they close out the episode.
Transcript
00:00:03 - Anthony Campolo
All righty. Welcome back, everybody. AJC and the web devs with returning guest Ishan, and my old boss and good buddy.
00:00:14 - Ishan Anand
Am I your first returning guest or not, out of curiosity?
00:00:19 - Anthony Campolo
Nicky T was a returning guest. Monarch was doing a weekly thing with me for a while.
00:00:26 - Ishan Anand
Oh, awesome. You're okay.
00:00:27 - Anthony Campolo
You're one of the first, though.
00:00:28 - Ishan Anand
One of the first. Okay, well, if not the first, at least that's the second best. Thank you for having me. For folks who don't know me, I'm Ishan Anand. I was the CTO and co-founder at Layer0, which became part of Edgio in 2021. More recently, I've been doing a lot with AI. I'm probably best known as an AI consultant and educator. I built an implementation of GPT-2 entirely in spreadsheets. It's called Spreadsheets Are All You Need. You can go to the website, SpreadsheetsAreAllYouNeed.ai. I also teach a course based on it, where I take folks through every step and every formula of that spreadsheet so they understand how a transformer works on the inside. Most recently, I reimplemented it entirely in JavaScript for web developers. So that's me in a nutshell.
00:01:13 - Anthony Campolo
Yeah, I'm excited for the JS thing. We actually talked about this last time you were on. I was like, you know, you could just do this in React. And you were like, I've thought about that. But yeah, is it vanilla JS? Is it web components? What is it written in?
00:01:27 - Ishan Anand
Yeah. So it's entirely in web components. The reason I did that is I wanted it to be as accessible as possible for the largest audience. If I made it in React, people would have to learn React, and the basic interactive primitives are components.
00:01:43 - Anthony Campolo
They are components.
00:01:44 - Ishan Anand
Web components are fairly simple in terms of their life cycle. I wanted it to be structured in such a way that it was actually easy and flexible to run and develop and drop in. For example, you could stick this inside any CMS using these same components. You could put parts of the transformer model anywhere; I'm actually embedding it inside WordPress. I didn't want to have all the headaches of, if I come back to it a couple of weeks later, a couple months later, even years later, having to worry, like, do I have to redo a whole build step, and stuff like that, and things going out of date.
I wanted something that would be long-term stable. I wanted something that would be easy to use in almost any environment. I got to teach two weeks of a machine learning class, where a professor invited me to come speak as a guest speaker and teach through an LMS. That's actually interesting because, normally, LLMs are at the very end of the class, the advanced concept. One of my things is saying you don't need to do that. It's actually a lot more approachable than folks realize.
But inside a learning environment, it was very interesting to see all the restrictions. You had people on different types of machines, limits on what they can install. I wanted this to be as easy as possible to run locally. So you could be in an air-gapped environment. You just bring the HTML file and it should run. In fact, you don't even need a server. I wrote it so that I can double-click on it and then it just runs. That's been a breath of fresh air.
00:03:17 - Anthony Campolo
Do you have, or have you thought about doing, like, a StackBlitz or something like that for it?
00:03:21 - Ishan Anand
I have thought about that. As you remember, when we were at Layer0, we were following what they were doing and thought extremely highly of what they're doing, and it's very cool. I have thought about using that. I've thought about using Wasm in various other parts, but I don't know where you want to start. We might be going way down a rabbit hole. Maybe we should start at a higher level. But you tell me, where do you want to go first?
00:03:45 - Anthony Campolo
Let's talk some topical stuff first. Just kind of ease us in, and then we can get deep into the tech. I know you have lots of thoughts about DeepSeek, and we were just talking before we started how, for some reason, the normies know about DeepSeek. Why do you think that is?
00:04:04 - Ishan Anand
You know, the first sign for me was, well, for those who aren't aware, the AI community has been aware of DeepSeek for a long time. All of last year, we were talking about various DeepSeek releases. One of the podcasts I routinely listen to and occasionally drop in on is Thursday AI, and they had talked about DeepSeek multiple times, whether it was for V3 or earlier. I remember when R1 preview was announced last year and I was like, wow, they already caught up to o1. But it wasn't the full paper. It wasn't the full open source release.
So they didn't come out of nowhere. But to me, the tide had turned when my son, who was in elementary school, said to me, and this was the Sunday before the market reaction and DeepSeek had already been out for a week, and he's like, Dad, do you know what DeepSeek is?
[00:04:57] I was like, oh my gosh, my elementary schooler. He's never done that for anything other than ChatGPT. Claude is an amazing model as well and not enough people know about it, and I have to tell them about it.
00:05:10 - Anthony Campolo
They don't know about Gemini. They don't know about anything. It's always just been ChatGPT. But now for some reason, this is like the second one everyone heard about. My doctor asked me about DeepSeek. Like, what?
00:05:24 - Ishan Anand
I mean, among Chinese models, there's also been Qwen. Don't get me wrong, they've been doing amazing work, but for whatever reason, they seem to have broken through.
One of the interesting things I started digging through is the implications of that. Let me pull up some data that I tweeted out. Let me find that.
00:05:46 - Anthony Campolo
So real quick, I'll tell you my theory. I think that one of the reasons why it blew up is because it's a Rorschach test for anyone's opinion about either AI or geopolitical China stuff, because people could look at it as a thing where, oh, China is now ahead in AI, or oh, China just stole this, or oh, China lied about how much it cost, or China didn't lie about how much it cost. So you have all these different perspectives that anyone can bring to it. And then it's also, you know, the same thing with OpenAI, whether they're going to be overtaken or not. So I feel like everyone kind of came to it with their own perspective, and so everyone was talking about it because everyone wanted to push their own opinion about what it meant.
00:06:32 - Ishan Anand
I was going to use that same phrase. It's a Rorschach test for whatever your preconceived priors are. There are investors who might have felt certain companies were overvalued or skittish about the CapEx spend. There are people talking about the geopolitical concerns. There are AI researchers who are genuinely excited about some of the techniques we can get into that were in this open source release that are really fascinating.
There are plenty of people who, like, I wrote about this in my AI 2025 trends. One of the trends I wrote about is this growing AI backlash. I wrote that I feel like there's a segment of the public that feels like AI is a thing happening to them, and not something they can be part of. That's actually one of the motivations for why Spreadsheets Are All You Need exists, to help people understand. You don't need a whole PhD to understand how this thing works or to be part of the revolution.
[00:07:35] There's a guy who did an interesting survey. He went through a bunch of interviews of why people were talking about DeepSeek, and a lot of them were doing it to stick it to the man, whoever they thought the man was. It very much became ammunition in a variety of different arguments.
For whatever reason, they also timed it at the same time as the inauguration. I don't know if that was a coincidence. Maybe they were rushing to get this done before Chinese New Year. It hit a certain [unclear] that just kind of snowballed on itself. Like you said, it really was one of the first that catapulted. Let me pull this up. This was some data I started posting.
[00:08:28] Here's Google Trends for various AI models. So here's DeepSeek, ChatGPT, Claude, Gemini. This red one is ChatGPT. This spike right here, by the way, is, I'm pretty sure, the Santa voice release during the 12 days of Christmas. I actually noted it here. Yeah, it's very instructive.
00:08:50 - Anthony Campolo
I don't know what that is.
00:08:52 - Ishan Anand
You could call Santa and talk to him. So imagine you told your kid, hey, you want to talk to Santa? They can now talk to Santa and be like, I want this for Christmas. And it would be like, ho ho ho. I was like, Santa, what does Rudolph like to eat? And he'd be like, hey, or something like that.
00:09:12 - Anthony Campolo
Somehow I missed that.
00:09:13 - Ishan Anand
Yeah. It's an instructive lesson, I think, for those of us in the technology bubble or even the AI bubble, on how to make AI even relevant to the outside of the bubble, to most people in the general public. Even as well known as OpenAI's ChatGPT, I think for most people, ChatGPT and OpenAI is AI. When they think about AI, they're not thinking about the recommendation algorithms inside their social network. Their AI is like a chat interface, and it's hosted by OpenAI, and ChatGPT is what it's called, right?
00:09:50 - Anthony Campolo
Yeah. It's like there's a whole South Park episode about it, you know?
00:09:53 - Ishan Anand
Oh, is there really? Okay. Then that's hilarious.
00:09:59 - Anthony Campolo
It's about how Stan or Wendy is mad at Stan because he always just thumbs up her texts. And then Clyde is sending these long, amazing messages to his girlfriend. And Stan's like, Clyde, how are you doing it? And he's like, ChatGPT, bro.
00:10:17 - Ishan Anand
Oh, boy. That reminds me of the Kazam AI boyfriend video. Have you seen that?
00:10:24 - Anthony Campolo
No, I haven't.
00:10:25 - Ishan Anand
Oh.
00:10:26 - Anthony Campolo
That one. I need to check that out.
00:10:28 - Ishan Anand
Okay, I'll briefly describe it. It's basically like some decades in the future, and this girl is bringing home her AI boyfriend, and they're gonna get married. It's kind of like a sketch.
00:10:40 - Anthony Campolo
Oh, no, I didn't see this. Yeah. This was back in 2023. Yeah, yeah.
00:10:43 - Ishan Anand
This was a while ago, yeah.
00:10:46 - Anthony Campolo
This was racist towards the AI boyfriend,
00:10:49 - Ishan Anand
Exactly.
00:10:49 - Anthony Campolo
is the joke?
00:10:50 - Ishan Anand
Yeah. So, anyway, back to this. This shows, as a first lesson, that even as well known as OpenAI's ChatGPT is, there's room to grow in awareness. Look at this. This is pretty huge. They really hit the target on this one. But the more important thing that I'm trying to get to is if you look at Claude and Gemini, which are both really good models, Gemini especially trying to do a lot with video. Claude has an interesting personality. It's one of the few models I hear people talk about the way they talk about Apple products, just in terms of its personality. But DeepSeek is the first one that, just look at this, is the only one that managed to jump up even though there are plenty of worthwhile contenders.
[00:11:40] I don't know if this is going to come back up. I will tell you, when I searched as a topic, it looks different than when I search as a search term, which is not unusual. But when I search as a topic, the first few days it didn't really pick up anything. So I think they were caught off guard. We'll see how this turns out a little bit later. The second-order side effect of this is that a lot of people downloaded the DeepSeek app. Another example of this is you see it at the top of the charts, but I think it also has second-order effects.
I think more people now are aware of running models locally as a distribution channel for AI models, and that's going to be more popular, potentially in the future than it was before. DeepSeek may have helped give that a further boost. So here's an example of that. This is Ollama, which is, for those who don't know, a tool for running models locally.
[00:12:33] You can see right here, this is the 90 days preceding R1. And then all of a sudden it takes this huge, massive spike. Ollama search interest surges. If you go back to this one, you can see.
00:12:45 - Anthony Campolo
Look at the most popular models on Ollama. So before it was Llama 3.3, which says it has 1.2 million pulls. Now it's DeepSeek R1 with 13 million.
00:13:01 - Ishan Anand
I think that's right.
00:13:02 - Anthony Campolo
Actually not entirely right. This is not perfectly ranked. Actually, I'm scrolling down. It has Llama 3.1 at 22 million because that has been around longer.
00:13:12 - Ishan Anand
But this is over the same time period or different time periods here?
00:13:17 - Anthony Campolo
Just go to ollama.com/search.
00:13:20 - Ishan Anand
Ollama.com/search.
00:13:25 - Anthony Campolo
Okay. We got Scott in the chat.
00:13:30 - Ishan Anand
So here's popular, but it doesn't say over what time period. I think that's pulls total.
00:13:37 - Anthony Campolo
I think that's pulls, like total.
00:13:40 - Ishan Anand
Yeah, I think it's overall time, right? And Llama has been around a long time.
00:13:44 - Anthony Campolo
So the fact that it's only been out for, you know, a month or two, because I'm assuming that's different from R1 preview.
00:13:53 - Ishan Anand
R1 preview was before. Yeah.
00:13:56 - Anthony Campolo
Mhm.
00:13:57 - Ishan Anand
You know, somebody I think tweeted out that there aren't enough Nvidia GPUs to run all the downloads of R1 that have happened, which is also an indication of maybe how the market mispriced things. But here are a few other search terms, like GGUF, Llama, SafeTensors. You can just see all of them bounce around. Then right around when R1 hits, they all take a spike.
So I have to imagine, I wrote, the customer journey is like: DeepSeek. They hear about it, they're like, okay, I download Ollama, then what the heck is GGUF, right, or SafeTensors?
00:14:37 - Anthony Campolo
What is GGUF?
00:14:39 - Ishan Anand
It's a format for downloading model weights that you would use in one of these tools. And SafeTensors as well, same thing. So I think it's possible that there's second-order effects from this. I don't know what to call it, like a DeepSeek moment, that are really interesting.
00:15:04 - Anthony Campolo
Yeah. I saw some people calling it a Sputnik moment, which I think is a fairly good comparison.
00:15:10 - Ishan Anand
A little bit. It's not bad. It certainly captures, I think, a certain segment, like you said, the Rorschach test of how people react. But the difference with Sputnik is that the US realized it was behind here. It was like them catching up. Maybe that's a subtle difference, but it definitely resonates with people.
00:15:35 - Anthony Campolo
People are making the argument that because it was so cheap, this means they're now ahead of us. I don't really buy that argument, though, because even if we do take the number at face value, this is how things go. Scientists come up with a new breakthrough, and it costs a ton of money to research it and create it, and then people can replicate it for cheaper. That's just how things go. It's not really a breakthrough in terms of going beyond o1. It kind of reaches a similar level to o1 for much cheaper. But people are acting like this is some huge deal, that they did something amazing. It's like, well, no, because they didn't make the breakthrough.
00:16:19 - Ishan Anand
Well, to be fair, let me give you my take on it. They made a couple of really interesting breakthroughs that made things better or more efficient. The way they trained R1 was interesting. Sorry, R1 Zero, and I did a video on that. We can talk more about why R1 Zero is interesting.
They did some interesting efficiencies like multi-token prediction and latent attention. These are all things you do to make it more efficient. I think the bigger thing is we have to put the number in context. This is an apples-to-oranges comparison.
Let's pull this up. The thing that I think got everyone spooked is this phrase here, and they said it cost us something like 5.5 million.
[00:17:13] Yeah, 5.5 million. I don't know why. Oh, because I zoomed in. I have to zoom in properly for it to handle it like this. Right here. This five point still doesn't want to highlight, but you see it: 5.5 million. Yeah, so that is.
00:17:29 - Anthony Campolo
Is that an actually accurate comparison to what it would have cost? Because I've heard other people say that that's leaving out things like actually buying the data center, which other models were including in their price. Then I hear people say that it is the same thing. So I've heard various claims about whether that's an actually accurate number and an accurate comparison. I don't know who to believe.
00:17:53 - Ishan Anand
It is. So if all you care about is how much it costs them to do that last final training run, the estimates, if you take the FLOP count, are roughly in the right ballpark. I'm actually working on a video on this. However, the cost of building a model is more than just that one last training run, and they even say it here.
00:18:20 - Anthony Campolo
That's why I think it's deceptive to say that it only costs that.
00:18:24 - Ishan Anand
Well, they may or may not have intended it. I think somewhere along the line somebody read this who didn't appreciate what it takes to train a model and saw 5.5 million. I literally saw an article, and I wrote to the author, where it was like they spent 5 million compared to the 5 billion that OpenAI spent on development.
00:18:46 - Anthony Campolo
And that was the headline. That's what they ran.
00:18:48 - Ishan Anand
My analogy is it's like taking a house in, say, Texas where the costs are cheaper and pricing a house based on just the lumber, then looking at a house in California and comparing the whole house to the cost. It's pricing a house just based on the timber. That's part of it.
00:19:12 - Anthony Campolo
You're definitely falling on the line of it's deceptive to frame it as this is the only thing they spent to create this model.
00:19:19 - Ishan Anand
It's deceptive to frame it, or it's misleading to frame it. It is not, I would say, accurate.
00:19:26 - Anthony Campolo
I don't think the paper did that. To be really clear, I don't think the researchers necessarily tried to do that. I think the news cycle ran away with it, exactly.
00:19:36 - Ishan Anand
Yeah. I want to be really clear. I think the paper was not trying to be deceptive. I think it is deceptive or misleading to frame it that way.
00:19:45 - Anthony Campolo
Oh, that's a good point here, though. The actual cost per token is quite a bit cheaper.
00:19:52 - Ishan Anand
There is probably margin. I can't speak to that. It is definitely true that it's cheaper, in fact. But there are commercial models. I think Gemini, last I checked, actually has a version that's on par with DeepSeek pricing.
00:20:12 - Anthony Campolo
Also, if you want to, there are all these, like Groq and Together and Fireworks, which let you run Llama 3.3. Those are cheap as hell too. My whole issue is that the outputs are not really that great with Llama compared to the best models, whereas DeepSeek, the outputs are actually pretty good.
00:20:32 - Ishan Anand
Look, they've built a nice efficient model, and definitely I think it's something various audiences need to pay attention to. The AI community has some really exciting developments. Those who are concerned about policy need to reevaluate their priors. I don't want to undersell it.
I think Dario Amodei, if you've seen his post, did you see his post on DeepSeek? He basically said DeepSeek did build a model more efficiently, but not to the ratios that people think it was. Another key thing is people look at how much OpenAI or these other guys spend, and the number of people on this paper, the authors, is really, really large, like 100.
00:21:22 - Anthony Campolo
You're talking about this post, I'm assuming.
00:21:23 - Ishan Anand
Yes. Yeah. He makes a really good point, which is we're comparing this model to models that were trained or built 6 to 12 months ago. The cost of building a model has a Moore's law-like curve, and so it's not surprising where they landed.
He walks through the rough, high-level math to show it's not surprising where they landed. Yes, it is more efficient than we might have expected, but it's not wildly so. It's not 5 million versus 5 billion. That doesn't count the people, it doesn't count the experiments. The paper even says here: "excluding costs associated with prior research, ablation experiments on architectures, algorithms and data."
One thing that has to be remembered is that the whole field is very empirical. When you're adjusting the hyperparameters of a model, you don't know if this is really going to improve things or not. A good example is it took years for people to realize, oh, we're going to put the layer norm here instead of over here, and that improves training time. That came out of exploring the design space, and people have to do experiments like that. Oh, if we tried multi-head latent attention, is this really going to work?
What you do is you try it in a smaller model and you're like, oh, this seems to work. Try it in a slightly bigger model, and then you get a sense of scaling law for any particular modification you're making. Is this going to hold up, or is this tapping out and doesn't really improve at scale? There's a lot of work and research that goes into that. There's a bunch of safety and a lot of other things to put things out. This is literally just to get one run. It's the final run that you do to train the model.
[00:23:02] But it ignores everything else. It's like judging a house by just the timber cost. There's so many other things that go into it.
00:23:11 - Anthony Campolo
Yeah, I know we're fairly short on time, so unless you have any other big things you want to say on DeepSeek, we should transition into your JS thing.
00:23:18 - Ishan Anand
Yeah. So I'll just say one other thing. I do have a video on R1 Zero, which to me is the most intellectually fascinating part of the model. That's where they built a reasoning model without the common RLHF pipeline. It basically just used reinforcement learning on a really strong base model, and the model suddenly learned to think on its own, which is both really interesting and exciting, but also potentially scary.
00:23:52 - Anthony Campolo
Where can I find this video?
00:23:53 - Ishan Anand
This video is on YouTube. It's on my channel.
00:23:58 - Anthony Campolo
What's your YouTube channel? I can't remember.
00:24:00 - Ishan Anand
It's called Spreadsheets Are All You Need, but I'll drop it in our chat. In that video, I walked through what they call a post-training pipeline. Basically, R1 Zero skipped a lot of those steps. To be clear, R1 did not; R1 still had a semi-complex pipeline. But the really important implication here is that building reasoning models has gotten a heck of a lot easier for open source developers.
There's a bunch of experimentation I walk through. This gentleman, Will Brown, I think he's at Morgan Stanley, had a really simple example of building a reasoning model, a very simple model. You can just go and run that on RunPod and you can see the aha moment yourself, where the model gets smarter. So that's the last thing I'll leave, but check out that video and you can see more of that.
00:24:53 - Anthony Campolo
Cool. Yeah. I just subscribed to your channel.
00:24:56 - Ishan Anand
Okay. So then let's talk about the JavaScript implementation.
00:25:03 - Anthony Campolo
First off, is there a repo for this?
00:25:05 - Ishan Anand
There is no repo yet. You don't go to GitHub; you go to SpreadsheetsAreAllYouNeed.ai slash gpt2. There will be a repo, I just haven't gotten around to it, but that's where you can see it.
Let me set the stage here. To understand where this is coming from, I built this Excel spreadsheet which you can download from GitHub. So if you go to GitHub, you can download it.
00:25:34 - Anthony Campolo
Last episode was about.
00:25:36 - Ishan Anand
Exactly.
00:25:36 - Anthony Campolo
That.
00:25:37 - Ishan Anand
So it's a spreadsheet you can download. If you're on an older version of Excel, get version 0.7. If you're on a newer version of Excel, get 0.6 because the formulas are a little cleaner.
Basically, you type in some prompt here, then you hit run and go to Calculation Options. I'm on manual. If you calculate the entire thing, it will eventually spit out the same next predicted token as GPT-2. Run GPT-2 in Hugging Face Transformers at temperature zero and it will give you identical output. This thing can only handle about ten tokens. It's an Excel spreadsheet, but it implements the entire thing in pure spreadsheet functions.
One of the things I learned is that a lot of people don't have the latest version of Excel. This can be really slow, and I was very limited in the kinds of things that I could do and illustrate.
[00:26:32] So one example of that is I wanted to count the number of FLOPs that occur during a calculation, and you can actually derive it from the parameters, but I want to be able to calculate it empirically. So what I did is I built this weird kind of framework that's like a mix of a Python notebook and a spreadsheet that live in a browser. And there's really two sets of web components here.
00:26:53 - Anthony Campolo
First off, I just want to say it's really fascinating that you've created, essentially, a JavaScript version of a Python notebook.
00:27:01 - Ishan Anand
Yeah, I wanted something that was going to run everywhere. What you have are two sets of components. Everything's a cell, and it's called SANE or Spreadsheets Are All You Need Cell. There are two types of cells. There are cells that are code, so that's these ones, and then there are cells that are sheets or tables.
What you see here is this. The first thing I need to do is download the model parameters.
00:27:28 - Anthony Campolo
Yeah, you gotta get this in a repo because I want to be able to just clone this down. This is a little ridiculous.
00:27:36 - Ishan Anand
Yeah, it is. Let me pull this up, though. The first thing you have to do, if you're going to try this out, is follow the link to download all the model parameters. These are the GPT-2 model parameters as a collection of CSV files. There are 150 of them.
What you have to do the first time you come through is download it, unzip the file, and then load it in here. It takes a little bit of time, maybe less than 30 seconds. It loads all 150 files, and then they're sitting in your IndexedDB database, so if you reload the page, you're good. At this point you've got the entire model. Every part of the model is here, with no abstractions to get in the way. It's not turtles all the way down; the turtles stop at some point, and they stop at vanilla JavaScript.
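For readers who want to picture the format: each of those CSV files is just a matrix of numbers. A hedged sketch of what parsing one might look like, with the function name being illustrative rather than the actual loader (which, as described above, also caches the results in IndexedDB):

```javascript
// Hypothetical sketch of turning one downloaded CSV weight file into a
// plain JavaScript matrix (an array of rows of numbers), the shape the
// rest of the inference code would consume.
function parseCsvMatrix(csvText) {
  return csvText
    .trim()
    .split("\n")
    .map(line => line.split(",").map(Number));
}
```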
So here's the matrix multiply.
[00:28:28] Right here.
00:28:29 - Anthony Campolo
Is matrix multiply?
00:28:29 - Ishan Anand
Multiply. Yeah. Let me blow up the font. Let's do that. There you go. That might be too much. Yeah.
So here's the code for matrix addition. Here's the code for matrix multiply. One of the things I'm working on right now for the video is counting the number of FLOPs. You can just add window.flopCount++ in basically ten places, then hit run, and you've redefined the entire function. So you can make modifications to this code, and that's the base code. Everything builds on that.
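A minimal sketch of the kind of vanilla-JavaScript base code being described: plain matrix ops with a hand-placed global FLOP counter in the spirit of the window.flopCount++ trick. The function names and counter placement are illustrative, not the actual source.

```javascript
// globalThis is window in a browser, so this is the same kind of
// globally visible counter described above.
globalThis.flopCount = 0;

// Element-wise matrix addition: one add per element.
function matAdd(a, b) {
  return a.map((row, i) =>
    row.map((v, j) => {
      globalThis.flopCount += 1;
      return v + b[i][j];
    })
  );
}

// Naive matrix multiply: one multiply and one add per inner-loop step.
function matMul(a, b) {
  const rows = a.length, inner = b.length, cols = b[0].length;
  const out = Array.from({ length: rows }, () => new Array(cols).fill(0));
  for (let i = 0; i < rows; i++) {
    for (let j = 0; j < cols; j++) {
      for (let k = 0; k < inner; k++) {
        out[i][j] += a[i][k] * b[k][j];
        globalThis.flopCount += 2;
      }
    }
  }
  return out;
}
```

Multiplying two 2x2 matrices this way adds 16 to the counter: 4 output elements, each needing 2 multiplies and 2 adds.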
So here the prompt gets separated into words. Let's see, here's a sheet. This runs one of the JavaScript functions that you've defined and then shows the result in a spreadsheet-like table. It will run everything up to that sheet.
So let's see, here it is. We have the prompt, "Mike is quick. He moves." It gets separated into these words. If I keep going down, it's going to convert our prompt to tokens, and then our final list of tokens is here. So let me run this all the way. This will take a little while. There it is: "Mike is quick. He moves." These are the token IDs. This part is snappy, but if you run the whole thing, it'll take about a minute or two. So here we can just set it to go and you can see the whole thing running.
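GPT-2's actual tokenizer is byte-pair encoding, which is more involved than this, but as a toy illustration of the "prompt in, token IDs out" step, a greedy longest-match lookup against a vocabulary captures the shape of the transformation. The vocabulary and IDs below are made up.

```javascript
// Toy stand-in for the prompt-to-tokens step: greedy longest-match
// against a string-to-ID vocabulary. Not GPT-2's real BPE tokenizer;
// this just illustrates the shape of the transformation.
function toyTokenize(text, vocab, maxLen = 10) {
  const ids = [];
  let i = 0;
  while (i < text.length) {
    let len = Math.min(text.length - i, maxLen);
    // Shrink the window until we find a vocabulary entry.
    while (len > 1 && !(text.slice(i, i + len) in vocab)) len -= 1;
    ids.push(vocab[text.slice(i, i + len)]); // assumes single chars exist in vocab
    i += len;
  }
  return ids;
}
```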
00:29:45 - Anthony Campolo
By the way, does it feel good coding in a programming language again instead of Excel?
00:29:49 - Ishan Anand
It does. I have to say it really does feel good to be in a real programming language. So what this allows you to do is basically, if you're a JavaScript or web developer, you can go and you can see how this works in as much detail as you want.
I like to think of it as two levels of detail for understanding the model. The first is you can look at these sheet cells and just see the function. You can say, oh, matrix multiply or layer norm. I don't need to know what that does, but I can see the steps that are happening. And actually, if I unclick this, and you look inside, the way I'm referencing things, for example, I've created this little weird pseudo-notation. So block step four is up here. Every table can reference a previous table by putting these brackets.
That fixes one of the problems I had in the spreadsheet, where I had weird formulas that needed to address these offsets. The formula is really much simpler than you'd realize, but it's hard to see that when you've got this offset, which is how I define a matrix of a certain size and dimension, and then have to pull from a particular tab that has the weight matrices. It just got really hard to read, and that made it harder to understand what the model is doing.
But here I can just say, okay, you do a matrix add, a matrix multiply of step three, and whatever steps come after that. So that was my other goal, to make this so you could eventually, at some point, write to the formulas and then you'll see what's happening.
Let's see, I think it's running. Where are you? If I tab away, the browser's background-tab optimization will throttle the JavaScript. I think it's in the iterate-blocks step. Yeah, that's where it is.
[00:31:33] So now it's coming through. It's basically going through the entire process. The other key thing that makes this work is you get a sense of how fully deterministic this process is. A regular program, like Microsoft Word, is a bunch of control-flow statements: if the user does this, then do that. There are very few if-thens in here. I know exactly what the calculation is going to do, and I know the fixed amount of compute that will take place in the entire process.
Then let's see, we're going to get our final token out here. There's our logits. There's our predicted token. There's "quickly" again, so run the whole thing. You can see everything that's happening inside the model.
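At temperature zero, as in the demo above, going from logits to the predicted token is just an argmax over the final logit vector. A small sketch (the logit values in the test are made up for illustration):

```javascript
// The final step at temperature zero: the index of the largest logit
// is the predicted next-token ID.
function argmax(logits) {
  let best = 0;
  for (let i = 1; i < logits.length; i++) {
    if (logits[i] > logits[best]) best = i;
  }
  return best;
}
```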
My goal is to take my class. So I teach a class, I'll bring it up here on Maven, where I basically walk through this entire Excel sheet and I show you how every step of the model works. I give you all the background you need without all the formulas. You really just need to have some recollection of what calculus is. I don't bother computing the partial derivatives or the gradients. I do that for you in one example where I show how backpropagation works, but there's no questions or tests on, go ahead and calculate the derivatives. PyTorch and similar tools do that for you.
But I give you a very strong intuition for how the model works and what each component in the model is trying to do. I'm moving to making more of that class work with this tool specifically so web developers can take their JavaScript knowledge and understand how a model works.
I know if you go into any of the forums I hang out on, like the Learn Machine Learning subreddit, people ask, well, how do I get into machine learning? What do I have to do? And people answer, well, you've got to take six months of this math class, six months of another math class, and all this other stuff. Then you take a basic machine learning class, which covers logistic regression, k-nearest neighbors, and all these other things.
00:33:34 - Anthony Campolo
Yeah.
00:33:34 - Ishan Anand
Yeah. They're really good techniques to know if you want to learn all of machine learning. But I suspect most people are getting into it because of the ChatGPT moment, and they want to know as efficiently as possible how the heck does this thing work. And that's my goal.
So basically, in five lessons of an hour and a half each, we walk through every single part of the model, even if you've got no machine learning background. I don't think this idea of "oh, let's put LLMs at the very end of the class" is necessary. That makes sense if you're trying to build up.
00:34:07 - Anthony Campolo
You're trying to build up somebody, the whole deal.
00:34:09 - Ishan Anand
Yeah.
00:34:11 - Ishan Anand
It makes sense if you're trying to be a machine learning engineer and do machine learning research, because there are a lot of problems where maybe you shouldn't use an LLM to solve it. But I think a lot of the interest right now is driven by that. If you just need to know because your boss or somebody else is saying we need an AI strategy, they're probably talking about a ChatGPT or large language model strategy.
If you want to understand how they work, like concretely, what are all those settings in the UI? What is top-p? What is top-k? Or you hear about some research like DeepSeek, what were the innovations there? What did we know and what did we not? This course is really designed to get you to the point where you can even read parts of a paper and get the gist of it, even if you don't have the mathematical background.
00:34:52 - Anthony Campolo
So that's what I find is still missing.
00:34:54 - Anthony Campolo
What you're working with is you already have the pre-trained parameters, right? Are you ever going to do something where you will actually train a model from scratch?
00:35:13 - Ishan Anand
It's funny you say that. I am working on that.
00:35:16 - Anthony Campolo
So sorry about that.
00:35:18 - Ishan Anand
I would need to build an autograd system in order to do that, but that is one of the reasons why I did this. So I have this. This is one of the spreadsheets I use. Let's see where that goes.
00:35:37 - Anthony Campolo
And also how many cohorts have you done of this class so far?
00:35:41 - Ishan Anand
I'm probably on my fourth or fifth cohort right now. The last cohort was in January, and what I've done is I've left the cohort open for anybody to join and watch the recordings.
So I had a bunch of people, almost every cohort I had a couple people who would just join asynchronously. They wouldn't show up for the lectures. And I had plenty of people like, I really want to take this, but the time does not work for me, and I don't think I can do 90 minutes during the day. So I said, well, look, you can always watch the recording and I'm happy to answer questions. So now what I've done is I've left it open. You get access to the recordings, you can connect with folks in the same Discord community, you get all the quizzes, and then you have the option to attend a future live cohort for free. So in the future, if you ever want that live experience.
[00:36:29] So that's the answer to that question. I am working on adding automatic differentiation. I might use an existing framework; I found one in JavaScript that I'm testing right now, and that will give me a head start. It's called JS-PyTorch, actually.
I started looking at it, and it looks like their transformer block is slightly different from the way GPT-2 does it. But the other components they have, like the linear layers, are probably close enough. I've gotten through maybe the first part of the first block, or layer, and so far it seems to match up. So I can probably use that. That's something I'm actively looking at. I can show you this, though, which is why I bring it up.
[00:37:25] So this is a very simple neural network trained inside a spreadsheet. This is literally training it to learn a function. You've got this parabola, and you're trying to get the network to learn it. Each one of these is an epoch. So here's one epoch, and here's a second epoch. You can see, epoch by epoch, the overall loss going down.
You can actually train this by just doing spreadsheet copy and paste. So if I go here, copy this, and then I paste.
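Each spreadsheet copy-paste here corresponds to one epoch of gradient descent. A hedged sketch of the same loop in code, fitting a parabola with a single learned parameter; the model, data, and learning rate are made up for illustration and are not the spreadsheet's actual network.

```javascript
// One "copy-paste" in the spreadsheet = one epoch of gradient descent.
// Toy model: predict y = a * x^2 and learn the single parameter a
// by descending the mean squared error.
function trainEpoch(a, data, lr) {
  let grad = 0;
  for (const [x, y] of data) {
    const pred = a * x * x;
    grad += 2 * (pred - y) * x * x; // d/da of the squared error
  }
  return a - lr * (grad / data.length);
}

// Points sampled from y = x^2, so the true value of a is 1.
const data = [[-2, 4], [-1, 1], [0, 0], [1, 1], [2, 4]];
let a = 0;
for (let epoch = 0; epoch < 100; epoch++) {
  a = trainEpoch(a, data, 0.05); // the loss shrinks epoch by epoch
}
```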
00:38:07 - Anthony Campolo
This is the old school. This is backpropagation. This is like from the 80s.
00:38:13 - Ishan Anand
Well, yeah. This is the old school. You can see this is where I calculate the partial derivatives out for the class. Then we use those, together with the learning rate, to actually adjust the weights.
00:38:20 - Anthony Campolo
This would have been so useful for me when I was first trying to learn AI back in like 2016, and I just had no idea what the hell anything was because I was reading about things like backpropagation and gradient descent. I was reading these highly technical, complex descriptions of it. This would have been super useful.
00:38:37 - Ishan Anand
Oh, well, that's very validating. But the key point is, even for this simple two-node network, you can see how complex it gets in a spreadsheet to do it. So once I built the entire thing and I was like, I really want to train a model, it was another reason why I moved to JavaScript.
I can't train in a spreadsheet because I want to do things like, what's the difference between DPO and GRPO? GRPO is one of the other innovations that DeepSeek did. So it's hard to do that when you're in a spreadsheet. At that point, the spreadsheet is getting less useful, and it's time to go back to a programming language. So that's why.
00:39:15 - Anthony Campolo
Yeah.
00:39:16 - Ishan Anand
Yeah, exactly. Stay tuned for that, because I'm very excited for you to be able to fine-tune a model. I think it would take too long to train a GPT-2 class model on a single machine. What I can do in the demo I'm trying to do is start with the existing weights of GPT-2, then change what you want the outcome for a prompt to be, and you can go ahead and fine-tune.
JS-PyTorch actually has a demo of training a model to classify the MNIST dataset. That's a very tangible, visual example; you can see it getting better and better. I would present it slightly differently, but he's already done some example work there, and he's already got an optimizer. That's why I'm very excited to be going down that path. Once I can do that, I can show a lot more AI techniques.
[00:40:07] My whole goal is this radical idea that we can take simple tools to make the concepts underlying these amazing programs more accessible, because they're not as complex as you might be unintentionally misled to believe. It's very easy to look at the news headlines and be like, this is arcane magic, and I think that's misleading.
It's also dangerous because when it feels like magic, you can have irrational reactions. There are plenty of rational reasons to be concerned about AI, but I don't want to focus on the irrational ones.
00:40:42 - Anthony Campolo
Also, when it's magic, you have to defer to the magicians, which is disempowering.
00:40:47 - Ishan Anand
That's an interesting way to phrase it. I like that phrasing. I want people to feel like, oh, wow, this is interesting.
When R1 came out, there were plenty of people in the community who were like, oh, let me go try this. Like I mentioned Will Brown, it's like, wow, this works. This is really interesting. I want more people to feel that. Oh wow, I can try this.
That doesn't necessarily mean there's no reason to use a hosted model, to be honest, in terms of total cost of ownership and stuff like that. But at least you don't feel as disempowered when you understand what's happening under the hood.
00:41:22 - Anthony Campolo
Yeah, I was going to ask about the models. How far do you think you can go with just GPT-2? And do you feel like you're going to want to eventually get to the point where you're creating your own transformer, or that you're going to want to bring in a more modern model? Because it seems like everything you're doing is kind of built around GPT-2 and has been from the beginning.
00:41:41 - Ishan Anand
That was true in the past, but I don't want that to be true going forward.
This is the much more speculative area I'm working on. I think this idea of a Python-notebook-like environment that is web native and can go anywhere is actually really powerful as a prompt optimization and engineering tool. I think there's going to be a future application, call it a spreadsheet on steroids, or a spreadsheet mixed with AI and a Python notebook, that's going to be the workbench for users to evaluate data coming from models, adjust the data coming back from models, or engineer prompts around models.
And so I very much want this to eventually be a tool where some of the functions you're running aren't calculating logits; they're going off and calling an API, and you're doing higher-order work on top. So maybe you're doing the same type of prompt optimization you might otherwise do in code or another workbench, but it's right here.
[00:42:49] And you get a workbench that makes it really easy to experiment and see the results. I've seen some tools, for example, that embed AI inside a spreadsheet. The problem you have when you do that is you first start out and you're like, I want to just perfect one row. Don't run it for all the pieces of data.
And then you're in a spreadsheet environment and you're like, I really need to just do this little thing. What you really want is something that lets you iterate until you get the prompt right, and then you say, okay, now go run it on the thousand records I have. That's what I imagine the flow might look like for something like this as a prompt engineering tool.
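The "perfect one row, then run the batch" flow described above might look roughly like this, with callModel standing in for whatever model API client you plug in. The template syntax and names are hypothetical, not an actual API of the tool.

```javascript
// Hypothetical shape of a prompt-workbench step: fill a prompt template
// per row and fan it out over a dataset. callModel is whatever async
// model API client you supply (stubbed in tests).
async function runBatch(promptTemplate, rows, callModel) {
  const results = [];
  for (const row of rows) {
    const prompt = promptTemplate.replace("{input}", row);
    results.push(await callModel(prompt));
  }
  return results;
}
```

In practice you'd call this on a single row while iterating on the prompt, then on the full thousand-record set once it behaves.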
So I definitely envision a version of this where you're using it to make calls to an API for a model rather than actually running it locally in your browser. Another flavor of this that I started prototyping is one that runs as an extension.
[00:43:43] So it can go and open tabs.
00:43:46 - Ishan Anand
Extension. Yeah, as a browser extension. It would open tabs, navigate to a page, and click on things for you. Then you can pull the result into a table, look at it, and have another step.
And because it's got human in the loop, there's a couple of advantages. One is that it's going to use your own credentials, so you can log in with your existing login. You can see what it's doing as it's doing it. You've got an accelerator and a brake here by hitting play on every step it's doing, so you can walk through the processes it's going through.
And it runs entirely locally, so you don't have to worry about sending data off to some organization. You could just install it and be all set. So that's the other flavor of this that I'm working on.
I talked about this, actually. There's a video of me presenting at AI Tinkerers the second time where I briefly alluded to trying to put it inside a browser for agentic workflows.
[00:44:38] And the last thing is another reason I wanted to make it web native: I can embed videos, and I can put a chatbot assistant here that you can talk to. You can say, okay, I don't understand what the logits are doing here, and it can talk you through what's going on.
One interesting thing about making these, which are not actually sheets but really tables, is that spreadsheets and AI don't always work together. There's a paper called Spreadsheet LLM that talks through all the issues. And it's very easy, when you give an AI model the serialized version of a spreadsheet, for it to get confused as to what it's looking at. And it's a lot easier with tables to just say, okay, everything has this data type. Here's what you're looking at. So that's the other avenue that I'm looking at right now.
So I'll pause there since I've been talking for a while.
00:45:29 - Anthony Campolo
No, that's great. And we only got like five more minutes. So for people watching, all the links that we've been sharing and talking about are going to be in the video description, so you can check those out.
And yeah, I mean, you already kind of talked about some of the next things that you're going to be working on, but is there anything else exciting in the pipeline that you want to talk about before we close it out?
00:45:49 - Ishan Anand
I think we covered it. I'm really excited to do this thing that's used for prompt engineering, particularly in the browser. And then the other one is to have training where you can train a model inside using this as a workbench.
Those are, I think, the two most interesting things that might come out. I might do additional videos. I have one I'm working on right now about counting FLOPs, and specifically the DeepSeek cost to train. So look at my YouTube channel for more of those.
00:46:19 - Anthony Campolo
Awesome. Yeah. No, the things you're working on are the things that I was curious about. So that's super cool.
And we'd love to have you back on in, like, a month or two, you know, to get a better cadence going. Last time we chatted was like six months ago. But you're working on very cool stuff, and I love talking with you about AI things because you're very plugged in.
You know, I honestly wasn't even aware of the R1 preview. I've kind of fallen out of listening to a lot of AI podcasts. I'm planning on creating an AutoShow-enabled AI summarization newsletter that takes in all these podcasts and writes summaries of them, for myself and also for other people. So hopefully it can keep me in the loop a bit more. But at this point, you're more plugged in than me, so good job on that.
00:47:08 - Ishan Anand
Okay. Thank you. I will take that.
It's hard to keep up, especially with the last four weeks we've had. There's the Operator release. There's Deep Research, which I still haven't had a chance to really test drive. There's computer use. It just keeps coming. A tool like that would definitely be helpful.
00:47:27 - Anthony Campolo
All right, good stuff, man. All right, we'll call it here. Thank you so much. And for anyone out there who wants to check you out, what's your socials?
00:47:36 - Ishan Anand
I am on Twitter. I-A-N-A-N-D is my username, @IAnand. I am also on Bluesky and LinkedIn.
The best way to find me, though, is to go to my websites, which are SpreadsheetsAreAllYouNeed.ai. And then my personal website is Ishan.com, and I have some writings up there. That's where I had my predictions for 2025 as well.
00:48:03 - Anthony Campolo
Very cool. We'll call it there. We will catch you guys next time.
00:48:08 - Ishan Anand
Thank you.