
Autogenerate Show Notes with Whisper.cpp and yt-dlp
Anthony Campolo discusses his open-source project AutoShow, which automates the creation of show notes and summaries for video and audio content using AI tools.
Episode Description
Anthony Campolo demos AutoShow, an open-source tool that generates AI-powered show notes, summaries, and chapters from YouTube videos and podcasts.
Episode Summary
Anthony Campolo joins Nick Taylor to showcase AutoShow, an open-source project he built to solve a personal pain point: generating polished show notes, summaries, and timestamped chapters for audio and video content. The tool chains together several technologies — yt-dlp for downloading YouTube content, Whisper.cpp for local transcription, and LLM APIs like Claude and ChatGPT for generating structured summaries. Anthony walks through the evolution of the project, from manually copy-pasting transcripts into ChatGPT to a fully automated pipeline that takes a YouTube link and produces SEO-optimized metadata in under a minute. During the live coding session, they run the tool on one of Nick's videos using both local Whisper transcription and cloud services like Deepgram and AssemblyAI, comparing the outputs. The conversation branches into practical topics like cost tradeoffs between local and paid transcription, plans for productizing the tool as a pay-as-you-go web service, and potential workflow automations using GitHub CLI for content creators. They also touch on the broader theme of AI engineering accessibility, arguing that building useful AI-powered tools today is mostly about scripting and API integration rather than deep academic knowledge, making it an exciting time for developers to experiment.
Chapters
00:00:00 - Introduction and AutoShow Overview
Nick Taylor welcomes Anthony Campolo to the stream and introduces the topic for the session. Anthony explains the motivation behind AutoShow, a tool he built to address the tedious process of generating show notes, summaries, and chapter breakdowns for the large volume of content he produces across podcasts, streams, and videos.
Anthony describes how he realized that feeding a full timestamped transcript to an LLM with a carefully crafted prompt could automate the creation of episode descriptions, summaries, and discrete chapter segments. He outlines the core technology stack — Whisper.cpp for transcription, yt-dlp for YouTube integration, and LLM APIs for generating the final output — and traces how the project evolved from a series of manual steps into a single-command automated pipeline.
00:05:11 - Demo Prep, Cost Discussion, and Previous AI Work
Nick acknowledges the chat and they begin discussing practical concerns like the cost of running AutoShow with various API keys and transcription services. Anthony explains the spectrum of options, from completely free local processing with Whisper.cpp to paid cloud transcription and premium LLM models that might cost a few dollars per episode, with cheaper alternatives available at reduced quality.
Anthony also demonstrates a previous use case where he ran AutoShow on Nick's coworking streams and fed the summaries into a LlamaIndex chatbot. The chatbot was able to accurately summarize Nick's recent work and outstanding tasks, showcasing how the tool can serve not just content creation but also work logging and meeting summarization — a flexibility enabled by attaching an LLM to large chunks of transcribed text.
00:13:17 - Open Source Vision and Product Plans
Anthony shares his vision for keeping AutoShow open source while building a paid product on top of it. The open-source repository will remain the base logic layer, allowing anyone to run the tool locally for free, while a web-based frontend will let non-technical users input a YouTube link, select their preferred services, and receive generated show notes for a fee.
He discusses the challenges of pricing and monetization, noting he has never built a SaaS product before. The conversation covers potential approaches like pay-as-you-go versus subscription models, the complexities of calculating margins across different transcription and LLM service costs, and his wife's encouragement to move toward a subscription service. Nick suggests pay-as-you-go as simpler to implement and less risky.
00:18:25 - Live Coding: Setting Up Whisper.cpp and Running Locally
Nick and Anthony begin the hands-on portion of the stream, opening the AutoShow repository in VS Code and walking through the setup process. They install dependencies, clone Whisper.cpp inside the project, and build the base transcription model. Anthony explains the project's dependency structure, including SDKs for OpenAI, Anthropic Claude, Deepgram, and AssemblyAI, plus Commander.js for the CLI interface.
They run the tool on one of Nick's short YouTube videos, encountering an error when the default large model flag doesn't match the compiled base model. After fixing the flag, the tool successfully downloads the video, extracts audio, runs it through Whisper transcription, and generates a markdown file with the prompt and transcript. They then manually paste the output into ChatGPT to demonstrate the original workflow before automation.
00:26:16 - Reviewing Output and Automated Pipeline with Cloud Services
They examine the generated markdown output, which includes a one-sentence description, a paragraph summary, and timestamped chapters. Anthony explains how the prompt can be customized for different chapter lengths and additional outputs like key takeaways. They then move to the automated pipeline, running the tool with Deepgram and AssemblyAI transcription services feeding directly into the Claude API, eliminating the manual copy-paste step entirely.
Nick reviews the output and notes its accuracy despite minor spelling issues with proper nouns. They discuss how transcription services like Deepgram and AssemblyAI offer configurability for removing filler words, custom word banks, and punctuation handling. Anthony compares the two services and mentions he personally prefers Deepgram but acknowledges AssemblyAI's momentum and funding advantage in the market.
00:38:40 - Code Walkthrough and Architecture Discussion
Anthony guides Nick through the project's code structure, starting with the main CLI entry point built with Commander.js and moving into the core process-video logic. They examine how the tool uses yt-dlp to extract YouTube metadata, how Whisper.cpp is called locally for transcription, and how the cloud transcription services and LLM APIs are integrated as alternative pathways.
Chat participants suggest improvements like using Google's ZX or Execa instead of raw execSync calls. Anthony acknowledges these suggestions and discusses other planned improvements including interactive CLI prompts using Inquirer, better error handling, and RSS feed support via a fast XML parser. Nick mentions the Effect TypeScript library as another potential improvement for structured error handling.
00:53:51 - Content Creator Workflows and Automation Ideas
The conversation shifts to broader content creator workflows. Nick describes his existing automation setup where he uses the GitHub CLI to create pull requests from scheduled content syncs, auto-merging deploy previews for his blog. He suggests a similar workflow for AutoShow where generated show notes could be submitted as PRs for review before publishing.
They discuss the value of repurposing content, the time burden of editing for solo creators, and why live streaming is attractive compared to polished YouTube production. Anthony shares that he used to spend ten hours editing podcast episodes, reinforcing the need for automation tools. Nick mentions his experience with Descript for audio editing and how transcription services can handle filler word removal.
01:05:05 - Future Plans, AI Engineering, and Closing Thoughts
Nick and Anthony discuss upcoming improvements to the CLI, including interactive prompts and a potential TypeScript migration that Nick volunteers to lead. They plan a future stream to tackle the conversion incrementally. Anthony reflects on how AutoShow is the first project he has built entirely from scratch and open sourced, contrasting it with his previous pattern of contributing to other people's frameworks.
The conversation closes with a discussion about AI engineering accessibility. Anthony argues that building AI-powered tools today is primarily about Node scripting and API integration rather than deep academic knowledge, making it approachable for web developers. They agree that despite cynicism in the industry, it is an exciting and empowering time to build software, and encourage viewers to experiment with the tools available. Nick previews upcoming streams and they sign off.
Transcript
00:00:23 - Nick Taylor
Hey everybody. Welcome back to Nicky T Live. I'm your host Nick Taylor and today I'm hanging out with my man, Anthony Campolo. Anthony, how you doing?
00:00:32 - Anthony Campolo
Yo, yo, yo. Super stoked to be back and happy to chat about stuff. I'm working on some AI things and want to have some conversations about that.
00:00:43 - Nick Taylor
Cool. Awesome. I'm just going to drop some links for places where people can follow you if they want to. I'll bring us over to Pairing View right away and we can jump into what we're going to talk about today. We're going to do some live coding too. So you created this repository called AutoShow — why don't you break down what it's for and maybe some of the tech under there, like we've got some different models for the LLMs and stuff. I think it'd be good to talk through all that before we even jump into things.
00:01:21 - Anthony Campolo
Yeah. So this is something where I was solving an issue that I had myself. I create tons of content — written content, audio content, video content. I go on podcasts, I do streams, all this different stuff, both guest appearances and my own things. You're in a similar boat. You've done almost all the same content mediums that I've done. You can think about them in different ways. Some people, if they're just doing a stream — like when you're doing a co-working stream — you'll have a one-sentence description in your YouTube description, a generic title, and then you just go and film an hour and a half of content. If someone wants to watch it, they can watch it. That's just the whole thing. The problem I wanted to solve is being able to take a huge chunk of content and do a couple of things.
00:02:15 - Anthony Campolo
I wanted to create a good summary, a good meta description, and specifically chapters. A lot of legit podcasts break down the show into discrete five or ten minute sections that hone in on a specific topic. I realized you could use AI to do this — if you had a whole transcript with timestamps, once the context window got big enough for long enough conversations, you could feed it to ChatGPT or to Claude and basically say, hey, here's a transcript and here's what I want. I want this summary, I want these chapters, I want the summary to be this long, the chapter scripts to be this long. You can tweak all these things. You can even say, I want new title ideas, key takeaways, things like that. I started doing this and I first started using Whisper.cpp, which is a C++ version of Whisper, an open-source transcription model from OpenAI that ended up being the first base layer.
00:03:25 - Anthony Campolo
I built a whole bunch of scripts around it and also added in yt-dlp, which is a tool that lets you interface with YouTube. I was thinking, even if you have a podcast, usually your podcast will also be on YouTube. YouTube is like an uber source of content for so many people. So I built out this scripting workflow where you'd take a YouTube link, download the video, convert it to audio, run the audio through Whisper transcription, then take the transcription and stick a prompt on top that would say what you want the show notes to be. Then I'd feed that whole thing to an LLM — first ChatGPT, and now I use Claude — and copy-paste back the response on top of the prompt. So you'd have the show notes and the transcription altogether. I was doing each of these steps manually and then eventually built up workflows where you just give it a single command and it gets you all the way to having the prompt and the transcript.
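The manual pipeline Anthony describes (download the video, extract audio, transcribe locally) boils down to two external commands the script shells out to. A minimal Python sketch of how those commands could be assembled, without executing them; the `content/` output directory and the `whisper.cpp` location mirror the repo layout discussed later in the stream, and exact whisper.cpp flags vary by version, so treat the paths and flags as assumptions:

```python
def build_pipeline_commands(url: str, video_id: str, model: str = "base") -> list[list[str]]:
    """Sketch of the download -> audio -> transcription steps as argv lists.

    The yt-dlp flags shown are its documented audio-extraction options;
    the whisper.cpp invocation (-m model, -f file, -osrt for timestamped
    output) follows its README, but file layout here is an assumption.
    """
    wav = f"content/{video_id}.wav"
    return [
        # 1. Download the video and extract the audio as WAV
        #    (yt-dlp delegates the conversion to ffmpeg).
        ["yt-dlp", "--extract-audio", "--audio-format", "wav",
         "--output", f"content/{video_id}.%(ext)s", url],
        # 2. Transcribe locally with whisper.cpp using the chosen ggml model.
        ["./whisper.cpp/main", "-m", f"whisper.cpp/models/ggml-{model}.bin",
         "-f", wav, "-osrt"],
    ]

cmds = build_pipeline_commands("https://youtu.be/abc123", "abc123")
```

The resulting transcript then gets a prompt prepended and is handed to an LLM, which is the step automated later in the stream.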
00:04:33 - Anthony Campolo
Just yesterday, I got all the pieces together to where what you really want — being able to feed in a transcription service and an LLM API so there are no manual steps whatsoever — is actually fully automated. You can start from a YouTube link and get everything generated right on the spot. Within a minute you have this SEO-optimized thing for your audio or video content. So that was a very long description, but hopefully that all made sense.
00:05:11 - Nick Taylor
No, it's all good. Thanks, Nate Codes, for joining us today. I met Nate at Render ATL last year. He's on mobile at the moment, but he was curious about the project, so he's bookmarked it.
00:05:25 - Anthony Campolo
Nate, happy to see you here.
00:05:26 - Nick Taylor
It's been a while. Yeah. So no, this sounds pretty cool. This is like, I think the way a lot of projects start — this thing's annoying me, I've done these things but it's become tedious. You kind of scratch your own itch and boom. You've open sourced it for now, and I'm definitely curious to do some live coding and see this in action. Let's say this stream is typically about an hour and a half when I have a guest — what would be the cost of that if I'm using my own API keys? There are a few services in here, different transcription models. We've got an OpenAI API key and stuff. Just to kind of gauge what's a potential cost of this?
00:06:24 - Anthony Campolo
That's an extremely hard question to answer. Let me explain how it's set up right now. I started it where you could do everything locally — originally there was no cost. Technically there's a cost depending on your setup because I have a paid subscription to Claude and ChatGPT, so I was using just my monthly subscription. But there are multiple trade-offs. There are trade-offs along the transcription route. You can do the transcription totally for free on your own machine if you're okay figuring out how to download Whisper.cpp, build a base model, and work with this C++ toolchain that's not very portable. If you want to use a transcription service, there are two different trade-offs: there are varying models within the transcription services themselves — more expensive ones that are better and cheaper ones that are crappier.
00:07:29 - Anthony Campolo
You can kind of try all the cheapest options. This is where I haven't really done this yet — I'm going to have to run like a 20-matrix benchmark using different transcription services, different models those services offer, and then different LLMs, because the LLMs also have the same trade-off: cheaper LLMs can take more text for less money and go faster, but they'll give you worse outputs than the expensive ones. I had been using the best transcription I could get locally, which is basically as good as many of the paid services at this point. The open-source Whisper model is really, really good. If you can just run that on your machine, that's honestly the best thing to do. And I always used the very best model I could get my hands on, which for a while was ChatGPT. There's a case to be made for Claude 3 Opus — I think it's probably the best one to use right now. But those, if you're using them through an API key, can get pretty expensive.
00:08:32 - Anthony Campolo
You may end up spending a couple dollars per episode — not that crazy. But if you go the cheap route, you could do the transcription and the LLM part for like 5 to 10 cents for an hour-long episode. It's just a question of how good the output needs to be. Are you publishing this output, or are you going to run it on 100 episodes and stick that all in a vector database so you can cross-query it? That's also an option. You can go with a slightly degraded version if you just want the raw summaries in there.
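The cheap-versus-premium tradeoff can be made concrete with a back-of-the-envelope estimator: transcription is typically billed per audio minute and the LLM per unit of text. All rates and the tokens-per-minute figure below are hypothetical placeholders, not actual provider pricing:

```python
def estimate_cost(minutes: float, transcription_per_min: float,
                  llm_per_1k_tokens: float, tokens_per_minute: int = 200) -> float:
    """Rough per-episode cost: audio minutes times a transcription rate,
    plus the transcript's token count times an LLM rate. Every number
    passed in is a placeholder, so check each provider's current pricing."""
    transcription = minutes * transcription_per_min
    llm = (minutes * tokens_per_minute / 1000) * llm_per_1k_tokens
    return round(transcription + llm, 4)

# With made-up rates, a 60-minute episode lands in the cents range on the
# cheap path and closer to a couple dollars on the premium path.
cheap = estimate_cost(60, transcription_per_min=0.0005, llm_per_1k_tokens=0.001)
premium = estimate_cost(60, transcription_per_min=0.01, llm_per_1k_tokens=0.075)
```

A real benchmark would sweep this over each transcription service, model tier, and LLM, which is the matrix Anthony mentions wanting to run.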
00:09:07 - Nick Taylor
Okay. Yeah. And putting it in the vector database would be super helpful if you wanted to develop some kind of search over time — not necessarily embedding a chat experience in your website, but you could have something like that.
00:09:26 - Anthony Campolo
That's what I showed you last time. Our last episode, when we did the AI front end — I kind of did this in reverse, showing you that I had already used the tool to generate some summaries for your episodes. Oh, and actually I wanted to share this with you.
00:09:41 - Nick Taylor
I got it here, I think, beforehand.
00:09:44 - Anthony Campolo
I had run it on your episodes — some of your guests — but you said, "Oh, we should do this on your co-working ones." So I just sent you two screenshots in the Discord. I ran AutoShow on your last ten co-working streams over the course of May, and then gave those summaries to the LlamaIndex chatbot we created. I asked it two things: what recent work has he been doing this month, and what still needs to be done. You can read this and let me know if it makes sense for what you've been working on.
00:10:30 - Nick Taylor
Yeah, so I shared one on the screen there so people watching can catch it. "Star Search feature enhancement" — that's a new feature we built out at Open Sauced using large language models and GitHub data. And yeah, I was debugging a carousel component. That's right. Yeah, I remember this episode. And then there's the issues table. That definitely checks out. Let me copy the other one so we can take a peek. "Finalize the implementation" — I ended up getting busy with other stuff, so my coworker Ayu ended up doing it, but.
00:11:11 - Anthony Campolo
But it wasn't any tasks. That's correct.
00:11:13 - Nick Taylor
Yeah. Cool. Did some screen testing, sizes, and code cleanup. This is, I would say, pretty accurate, so that's pretty cool.
00:11:27 - Anthony Campolo
This is a whole different use case from what I was doing. I've been wanting to create summaries of content. But for you, this is like a log of the work you've done and the work that still needs to be done — like having a meeting summarized for you, which is an entirely separate use case it can just do because of the flexibility of having an LLM attached to a huge chunk of text.
00:11:50 - Nick Taylor
Yeah, no, totally. I'm just dropping our previous stream on YouTube and in chat here as well.
00:12:02 - Anthony Campolo
Cool.
00:12:03 - Nick Taylor
If folks check that last one out too, it ties into this, like Anthony was saying. All right, sorry — just checking out chat and stuff. I'm fine multitasking with the chat. The thing is, I'm using Restream, but even with StreamYard you can't post messages to Twitter or X during a live stream. You have to go over there.
00:12:32 - Anthony Campolo
Yeah. But what's funny is their messages now come in on StreamYard. I just realized this because I was doing a stream yesterday. Someone commented on Twitter and it came in through StreamYard, so I went into Twitter and responded in the chat through my own account.
00:12:50 - Nick Taylor
Yeah, same for me. I'm using Restream but I'll see LinkedIn messages or Twitter or X. You just can't respond, and I'm guessing there's no API for that yet.
00:13:06 - Anthony Campolo
I don't think StreamYard pipes in LinkedIn messages. I'm not sure. No one's watching — that's not true. My sister watched me on LinkedIn once.
00:13:17 - Nick Taylor
Cool, cool.
00:13:17 - Anthony Campolo
Also, I want to talk about the open source.
00:13:22 - Nick Taylor
Oh yeah?
00:13:24 - Anthony Campolo
My vision of where it could go — it was really important to me to build this tool in an open-source way. But I ended up having multiple people in my life at various points, as I was explaining it and showing it to them, who said, "Why aren't you charging for this? Why aren't you making this a product?" I found a lot of people who found a lot of use for it in weird, different, unique ways. So what I'm thinking right now is there will be this open-source repo that always stays open source — basically the base logic. If anyone wants to generate this stuff totally for free, even with open-source models — the next step I need to do is integrate llama.cpp so you can do the LLM step locally as well, which is the one piece that's missing. But that'll all be there. Then I'm going to build a front end for non-technical people who don't know how to clone a repo and run a CLI.
00:14:23 - Anthony Campolo
They'll be able to just input a YouTube link on a form, click a button, pay however much, and get it back right there in a UI. That's where this is eventually going. I think I can still keep it as an open-source thing I work on in public, but have a part of it that can be monetized. I've never built a product before. I've also never really built a legit open-source project — I've done a lot of open-source work, contributing to frameworks and things, and that's something I've done for a long time. But that was always me finding a cool project and glomming onto it and finding interesting people doing cool work. This is the first thing I've built totally myself from the ground up. I've open-sourced it and it's got eight stars right now, which is eight more than any of my other repos.
00:15:17 - Anthony Campolo
That's pretty cool. And it was in the Node Weekly newsletter. Peter Cooper puts out his whole slate of Cooper Press newsletters, and he posted my blog post. I wrote a blog post two months ago covering the very first implementation. You should pull it up — go to ajcwebdev.com. Okay, there we go.
00:15:40 - Nick Taylor
It's already in my history.
00:15:41 - Anthony Campolo
Just go to Blog, and then the second most recent one. It's got a similar title to this stream. It shows you everything up to the point of Whisper.cpp and then using your own model, if you just have a subscription to ChatGPT or Claude. This does not include any of the transcription service APIs or LLM APIs we're going to go through today. That's going to be a whole separate blog post. This is how you do it entirely locally, and what we're going to do today is how you do it with services you're paying for.
00:16:19 - Nick Taylor
Okay, yeah, I was curious — you want this to be a paid product, obviously, but still keeping it open source. Are you thinking it's going to be a website, or are you thinking of making a small app? A desktop app wrapped in Tauri or Electron?
00:16:42 - Anthony Campolo
The first thing would be just a website, because I've never even built a desktop app or a mobile app. I'd want to start with what I already know how to do — a static website with some Jamstack-y stuff that will hit a Stripe API, and that's going to be the whole deal. Maybe integrate a database so people can save their summaries. But I'd go real simple: website dashboard, almost single page app, with very basic login and payment mechanisms. That's what I'm currently thinking. I haven't built any of this stuff yet. Right now I'm integrating the APIs and the paid services and figuring out how much I even need to charge for this. If I'm exposing these different services to people, I need to calculate based on how much content they give me what it's going to cost, so I can cover margins and still make a profit — because I'll be paying the API costs.
00:17:40 - Anthony Campolo
There's a lot to figure out still. It already has a lot of functionality open source. I'm thinking about building it into a product — my wife's been talking to me about this. She was like, how long would it take you to make it a subscription service? I'm just like, three months. It's going to take a while.
00:18:01 - Nick Taylor
Yeah, I feel you. That's super cool. I think it's going to be useful. You've already shown me some of this before. I haven't actually dug into the code yet because I only cloned the repo today, but I could definitely see this being super useful as a content creator.
00:18:23 - Anthony Campolo
So.
00:18:25 - Nick Taylor
Yeah. Cool. So what do you want to do now? I've cloned the repo, I've set up the environment variables for the API keys.
00:18:34 - Anthony Campolo
Yeah, let's open it up in VS Code. What I'm going to have you do first — this shouldn't take too long — is clone down Whisper.cpp and build the base model, which is kind of crappy, but should only take about a minute to build. All these instructions are in the README.
00:18:52 - Nick Taylor
Okay. Let me open the README then. Good old preview. Cool. Yeah.
00:19:02 - Anthony Campolo
So let's scroll down. Do you already have those two installed on Brew? I'm sure you've got FFmpeg, but do you have yt-dlp?
00:19:11 - Nick Taylor
I'm pretty sure I do, but let me just run it just in case. And while that's going on, we can chat a bit.
00:19:21 - Anthony Campolo
npm install — that's installing. Actually, go to the package.json so people can see some of the dependencies. It includes SDKs for two LLMs — OpenAI and Anthropic's Claude — and then two transcription services, Deepgram and AssemblyAI. Then it's got node-llama-cpp in there that doesn't actually do anything yet. I haven't written that code, but eventually it's going to be able to reach out to a local LLM. Then Commander.js — you know Commander.js.
00:19:59 - Nick Taylor
Yeah. That's for building CLIs, right?
00:20:05 - Anthony Campolo
Yeah. So you technically ran two commands at once. You ran the Brew command and then you had npm install after it.
00:20:11 - Nick Taylor
Yeah. I'm just looking at the Brew error. No such [unclear — formula or folder].
00:20:20 - Anthony Campolo
Oh wait, you're not in the right place.
00:20:22 - Nick Taylor
Oh, son of a — yeah, sorry.
00:20:24 - Anthony Campolo
Yeah.
00:20:27 - Nick Taylor
Let me go up one — AutoShow. Yeah, that would make sense. Cool.
00:20:36 - Anthony Campolo
Yeah, that'll work. And then go back to the package.json — there's one other dependency I want to explain. The fast-xml-parser. That's something I just added recently. Now you can feed it a podcast RSS feed, because previously it had only been working with YouTube links. If you have a podcast RSS feed at all with audio, it will now run this whole process on that. I first created this for FSJAM, actually, and then I went through all these different steps to build something with YouTube. But now I could just run it and I'm going to be able to run this on all 95 previous FSJAM episodes.
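AutoShow does this RSS step in Node with fast-xml-parser; the same extraction, pulling each episode's title and audio enclosure URL out of the feed, can be sketched with Python's standard library. The sample feed below is made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical two-episode podcast feed, trimmed to the fields that matter.
FEED = """<rss version="2.0"><channel><title>FSJAM</title>
<item><title>Episode 95</title>
  <enclosure url="https://example.com/ep95.mp3" type="audio/mpeg"/></item>
<item><title>Episode 94</title>
  <enclosure url="https://example.com/ep94.mp3" type="audio/mpeg"/></item>
</channel></rss>"""

def audio_urls(feed_xml: str) -> list[tuple[str, str]]:
    """Return (episode title, audio URL) pairs from a podcast RSS feed."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        enclosure = item.find("enclosure")
        if enclosure is not None:
            episodes.append((title, enclosure.get("url")))
    return episodes

episodes = audio_urls(FEED)
```

Each returned audio URL can then be fed into the same download-and-transcribe pipeline used for YouTube links, which is how a back catalog like the 95 FSJAM episodes could be processed in one pass.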
00:21:24 - Nick Taylor
All right, let me clone Whisper here. I'm just gonna —
00:21:26 - Anthony Campolo
You want to clone it inside of AutoShow?
00:21:30 - Nick Taylor
Oh, okay. What's the reason for that? Just out of curiosity.
00:21:34 - Anthony Campolo
It's because it's a Node script calling out to multiple things on your machine — one of which is yt-dlp and one is Whisper. This just allows the path and the main command to all be in the right place. This is also why working with Whisper.cpp is the most complicated [unclear] way of doing this. Most people are not going to do this — they're going to use the services. You want to stay in the base directory the entire time you're running these commands.
00:22:10 - Nick Taylor
I'm just going to run them one at a time so we can see things happen.
00:22:15 - Anthony Campolo
You're already not in the right place for Whisper.cpp. Stay in AutoShow.
00:22:23 - Nick Taylor
You literally told me what to do and I'm like, yeah, well, I saw —
00:22:26 - Anthony Campolo
You'd already done that after I told you — that's one of the reasons why I brought it up.
00:22:32 - Nick Taylor
Cool. All right, so let's go ahead with that.
00:22:35 - Anthony Campolo
This built the simplest, smallest model — the base model. If you really want a good transcript, you want the large model, but the large model takes seven minutes to download. When you run it on an episode, it'll take five to ten minutes for an hour-long episode. This is going to let us run on something real. I hadn't picked a video. Let me go on your YouTube and find a video that's about ten minutes or so. There's the "npm install --save-exact explainer" — that's what I want.
00:23:07 - Nick Taylor
Okay, cool.
00:23:09 - Anthony Campolo
So I'm going to give you this link.
00:23:13 - Nick Taylor
I'll just grab that out.
00:23:14 - Anthony Campolo
This command is done. You're going to run the very first command in the section where it gives you the node command. There's a --video flag where you feed a YouTube video, a --playlist flag if you want a YouTube playlist, a --urls flag where you give it a general list of URLs in a file, and then an --rss flag if you want to run on an RSS feed.
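The real CLI is built with Commander.js, but the flag surface described here can be mirrored in a short Python `argparse` sketch. The flag names match the stream; the mutually-exclusive grouping and the defaults are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Python mirror of the CLI surface described above. The source flags
    (--video, --playlist, --urls, --rss) and the -m model flag come from
    the stream; everything else here is an illustrative assumption."""
    parser = argparse.ArgumentParser(prog="autoshow")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--video", help="single YouTube video URL")
    source.add_argument("--playlist", help="YouTube playlist URL")
    source.add_argument("--urls", help="file containing a list of URLs")
    source.add_argument("--rss", help="podcast RSS feed URL")
    parser.add_argument("-m", "--model", choices=["base", "medium", "large"],
                        default="large", help="Whisper model size")
    return parser

args = build_parser().parse_args(["--video", "https://youtu.be/abc123", "-m", "base"])
```

Defaulting `-m` to `large` here matches the behavior that trips them up a few minutes later, when the compiled base model doesn't match the default flag.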
00:23:50 - Nick Taylor
I've got the link for the video up top there with Attila. You were saying you can run multiple. Okay, let me stretch this up so we can get some more real estate here.
00:24:10 - Anthony Campolo
It's going to go through a couple of steps and log each thing as it goes. The first thing it's going to do is download a WAV file — take this YouTube video and extract the audio. Actually, sorry — first it builds a markdown file with the metadata from the YouTube video. So it takes the —
00:24:35 - Nick Taylor
Let's see what happens. Also, just want to say hey to B1 mind in the chat there. Thanks for joining us.
00:24:43 - Anthony Campolo
I know what happened. I gave it to you with the default, which is to use the large model. Bump up the command again and give it a flag, -m, and then base. That flag lets you configure the size of the Whisper model you're using.
00:25:05 - Nick Taylor
Okay, so let's run this again.
00:25:07 - Anthony Campolo
This should work this time. To be clear, base, medium, and large are the three things you can pass to the -m flag, and the command you first ran to build the model — you need to make sure you built the right one. So now we got the WAV file this time.
00:25:26 - Nick Taylor
Okay, I see what you mean. We didn't compile the large model, so it's not going to work. Okay, gotcha.
00:25:34 - Anthony Campolo
It didn't know what to do because it looked for the large model, which is 3 gigabytes — that's why it takes a while to download. The base model is about 100 megabytes. This is why Whisper.cpp needs to be inside this repo: it's calling out to a model within the Whisper.cpp repo. That looks like everything worked. It says "process completed successfully for URL, prompt concatenated, transform successfully," with a filename like content/2024-05-05 followed by the video ID. Every YouTube video has a unique video ID. You should be able to go to the content directory and find this now.
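The output filename described in the log, a date followed by the YouTube video ID under `content/`, can be sketched as a small helper. The exact separator and the `.md` extension are assumptions inferred from the demo:

```python
from datetime import date

def output_path(video_id: str, publish_date: date) -> str:
    """Reconstruct the content/<date>-<video id>.md naming seen in the demo.
    The hyphen separator and markdown extension are assumptions; the log
    only shows 'content/2024-05-05' followed by the video ID."""
    return f"content/{publish_date.isoformat()}-{video_id}.md"

path = output_path("dQw4w9WgXcQ", date(2024, 5, 5))
```

Because every YouTube video ID is unique, this scheme keeps repeated runs on different videos from overwriting each other.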
00:26:16 - Nick Taylor
Okay, so let's open this up. So this generated the markdown. We've got some info about it, and then we've got the transcript with timestamps.
00:26:33 - Anthony Campolo
This is the whole prompt. You should read out the prompt and what it's actually doing. It's creating a one-sentence summary, a one-paragraph summary, and then the chapters. This doesn't give you suggested titles or key takeaways — those are other options. I'm eventually going to have a prompt flag that lets you decide what you want included. Right now it just gives you this, and if you want to tweak it, you can go in there and change stuff. I say a chapter shouldn't be shorter than one or two minutes or longer than five or six minutes. It doesn't always follow that exactly, but if you want chapters of 15 to 20 minutes, you could do that.
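The prompt structure Anthony describes, a fixed instruction block with tunable chapter-length bounds stacked on top of the transcript, can be sketched as a template function. The wording below is a paraphrase, not AutoShow's actual prompt text:

```python
def build_prompt(transcript: str, min_chapter_min: int = 1,
                 max_chapter_min: int = 6) -> str:
    """Illustrative show-notes prompt along the lines described in the
    stream. The exact prompt AutoShow ships is not reproduced here; only
    the outputs requested and the chapter-length bounds come from the
    conversation."""
    return (
        "Here is a transcript with timestamps.\n"
        "Write a one-sentence description, a one-paragraph summary, and "
        "timestamped chapters.\n"
        f"Each chapter should be no shorter than {min_chapter_min} and no "
        f"longer than {max_chapter_min} minutes.\n\n"
        f"TRANSCRIPT:\n{transcript}"
    )

# Tweaking the bounds, e.g. for 15-to-20-minute chapters, is one parameter change.
prompt = build_prompt("00:00:00 - Welcome to the show...",
                      min_chapter_min=15, max_chapter_min=20)
```

A future `--prompt` style flag, as Anthony suggests, would just toggle which of these requested outputs (titles, takeaways, chapters) get appended to the instruction block.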
00:27:14 - Nick Taylor
Okay. Yeah.
00:27:15 - Anthony Campolo
And then you give it the actual output, and I have it create a markdown file with headers.
00:27:20 - Nick Taylor
Okay. Yeah, that's cool. I was reading about this the other day — I think it's learnprompting.org — and they were talking about this. It's like a one-shot prompt: you're putting an example in to make it very clear how you —
00:27:36 - Anthony Campolo
— want the output. "In-context learning" is the fancy term for it. Let's do this. I'm sure you have a ChatGPT subscription.
00:27:45 - Nick Taylor
Yeah.
00:27:46 - Anthony Campolo
So this is how I used to do things. I'm going to show you what I've been doing for months and months, and then we'll show how we can automate this. Just copy-paste that entire file, dump it into ChatGPT, and hit enter. Don't modify at all — copy-paste the whole thing and give it to ChatGPT 4o.
00:28:03 - Nick Taylor
Okay, let's do this. Let me load it up. Literally copy this whole file.
00:28:11 - Anthony Campolo
The entire file. Every single word. Yep.
00:28:14 - Nick Taylor
All right, let's bump this up a bit and paste it in. Boom.
00:28:21 - Anthony Campolo
What's cool is this will work up to about a 2-hour-long episode. ChatGPT used to crap out after a very small amount of text. Yeah, copy that code, go back to the markdown file, and paste it over the prompt. Leave the transcript but copy it over the prompt, and leave the front matter as well.
00:28:54 - Nick Taylor
Okay, so the prompt — which part? This whole thing here, right?
00:29:00 - Anthony Campolo
Even though it says "this transcript." The entire thing — every part of the prompt.
00:29:04 - Nick Taylor
Okay, including the example too, right?
00:29:07 - Anthony Campolo
Yep. Including "transcript attached." All of it.
00:29:10 - Nick Taylor
All right.
00:29:11 - Anthony Campolo
Yep. And "transcript attached" also.
00:29:15 - Nick Taylor
Oh yeah, there's that.
00:29:17 - Anthony Campolo
So you see how the output fits on top. It's like this itself could be a web page. Look at this in your preview mode so we can see it with the markdown.
00:29:31 - Nick Taylor
Okay. So we got summary, chapters, the episode. No, that's pretty cool, man. And obviously this is formatted, it's just a preview, so obviously you can —
00:29:46 - Anthony Campolo
Fix that if you add two spaces at the end of each line. That always bugs me — somebody needs to fix that in the scripting workflow.
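The trailing-whitespace fix Anthony mentions works because Markdown treats two spaces at the end of a line as a hard line break. A minimal post-processing helper (hypothetical, not part of AutoShow) could look like this:

```javascript
// Append two trailing spaces to each non-empty line so Markdown
// renders a hard line break instead of joining adjacent lines.
function hardBreaks(text) {
  return text
    .split("\n")
    .map((line) => (line.trim() === "" ? line : line.replace(/\s*$/, "  ")))
    .join("\n");
}
```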
00:29:56 - Nick Taylor
Yeah, you can surface this whether you pop in an Astro site or use remark or whatever you want on the front end. This is super cool.
00:30:09 - Anthony Campolo
I actually did this with Ben Holmes when I showed him this tool. I built out an Astro website with a content collection that matches the front matter, so you can just dump it directly. That repo is also public — it's called Astro Autogen, I think.
00:30:27 - Nick Taylor
Okay.
00:30:28 - Anthony Campolo
This is before I started calling it AutoShow.
00:30:32 - Nick Taylor
Okay. I think this is pretty cool and it's pretty accurate. There are some spelling mistakes — like it's "Crab Nebula" but this is like a company name.
00:30:44 - Anthony Campolo
It struggles with "nickytonline." Sometimes it'll spell it N-I-K-K-I-E or N-I-K-K-I. There are ways to mitigate that. Remember, we used the base model — so this is with the worst transcription model we could be using right now.
00:31:07 - Nick Taylor
Yeah.
00:31:07 - Anthony Campolo
The fact that it has anything readable at all is actually incredible. If I ran the large model, this would have taken five times as long — and it uses "delve."
00:31:18 - Nick Taylor
Yeah, sorry, I had to call that out.
00:31:22 - Anthony Campolo
You can put in the prompt, "Don't use the word delve," if you want.
00:31:28 - Nick Taylor
Yeah, I'm just thinking about this. Obviously this is useful for a content creator, for sure, even once you productize it. But I'm already thinking of use cases. Imagine I have a website — which most people doing content creation probably will — I could picture a workflow where, using the GitHub CLI, I create a pull request from the generated episode. You get a deploy preview with a PR, you can check it out, do some cleanup if you need to. Maybe you don't like some of the formatting, or you get the spelling mistakes like "Crab Nabila" here. But I could totally see that as a workflow. Right now I have a workflow — not on nickyt.live, but that's just pulling in YouTube content, and I have my schedule in Airtable; that's how that works. But my blog — I use dev.to as like a headless CMS, and whenever I make a change... well, there are no webhooks on dev.to.
00:32:40 - Nick Taylor
They removed them years ago. Hard to maintain, I guess. But basically I pull once a night — they have an API so I just grab all my blog posts, and anything that changes basically updates the repo. My PR just shows the differences, and as long as the deploy preview and all the checks pass, I auto-merge it. I could see maybe not necessarily auto-merging because you might want to review this. But I could totally see that as a workflow — maybe out of the scope of your future plans?
00:33:20 - Anthony Campolo
That's completely in scope and something I would want. The whole point of this is automation — automate as much as possible. When you're a solo content creator and especially if you're not making any money on it — like I did FSJAM as a labor of love, to make connections in the industry, keep myself sharp, learn — I had all these reasons for doing it, and we never made a single dollar. You've really got to do everything you can to save time. Once you can start to leverage these higher-level AI tools, the possibilities completely open up. I love what you're suggesting right now — this is right in line with the whole mission of the project.
00:34:06 - Nick Taylor
I have the code to do this already, so feel free to poach it. Basically the only thing —
00:34:12 - Anthony Campolo
— send it to our Discord chat so I don't lose it. Yeah.
00:34:15 - Nick Taylor
Yeah. So basically there are two parts here. It's not your project we're talking about now, but I can just show you kind of what I do. So I generate my dev.to posts. I probably don't need the dotenv package anymore with Node 20 — or is it 22? Essentially I'm hitting the dev.to API, and any changes update the repo at the end. This is just the Node script that runs. This isn't really relevant to you because your thing would just be hitting your own API. But there are patterns here, because I've done this over and over — this is something I had to do at Netlify. I had to sync Sanity with a repo, a JSON file. Netlify has all these partners with integrations — Sanity, Cloudinary, and stuff. Sanity is supposed to be the source of truth. If Cloudinary updates their SDK, they're going to update it
00:35:26 - Nick Taylor
in Sanity. But that needs to be propagated to the repo because we use that repo for building things at Netlify — when I worked there. So I did this whole flow of how can I auto-merge things. You have to change some of the policies on the project to allow for auto-merging. Basically I take a timestamp to generate all the PR information — I create a title and the branch name. It's going to be the same branch all the time, so I put a date stamp in it to make it unique. I switch to that branch and then run a git add. This is after my script runs because there's a GitHub action that does these things. The GitHub action runs, generates my dev.to posts in the current branch, and does a git add. If there's nothing to change, there's a check here —
00:36:28 - Nick Taylor
basically if there is a change, we commit it. Otherwise I just say there was nothing to update. If there are literally no changes, when you do git add there's going to be nothing staged. I'm just leveraging git and the GitHub CLI here. I learned about this about a year and a half ago — you can create PRs with the GitHub CLI. I'm passing the title and a body saying this is an automated PR. After that you can call the GitHub CLI to merge it — auto-merge, delete the branch automatically once it's done, and squash it. It's a workflow I use all the time now. In the event dev.to ever disappears I'll have to change things up, but right now I just write on dev.to and once a night my site runs that GitHub action.
00:37:31 - Nick Taylor
So I never have to do anything for my blog unless I'm updating other parts of it that aren't the actual content.
00:37:38 - Anthony Campolo
Anyways, I lived the dev.to life for like two years, all into dev.to. I did all my blogging through dev.to and then I took a brief detour into Hashnode. There were things dev.to had that I didn't like, and a couple things Hashnode had that dev.to didn't. I was ruined and had to build my own blog — there was no way to get the features I wanted from both any other way.
00:38:04 - Nick Taylor
Yeah, I hear you. I'm biased because I used to work at dev.to, but there are definitely compelling features in Hashnode. They've got an AI component now. I like that they generate a table of contents. I don't know why dev.to doesn't do —
00:38:19 - Anthony Campolo
That still. Yeah, that was the big one. And then the styling just looked nicer, more modern. But the blog I have now actually looks more like a dev.to post — I kind of went back to that really old-school markdown look. But we're way off track. We should actually look at some of the code.
00:38:40 - Nick Taylor
Yeah, yeah.
00:38:41 - Anthony Campolo
Okay.
00:38:43 - Nick Taylor
To be clear, it's not just magic. Anthony didn't just wave his hands and poof, we got things working. Let me close the content here and yeah, we'll look at the code.
00:38:57 - Anthony Campolo
Let's go. autogen.sh, which needs to be renamed to AutoShow because I used to call this project Auto Gen. Sorry, not that one. autogen.js — this started as a Bash script that turned into a Node script. The Bash script is going to be phased out eventually. Let's not even look at that; let's just look at the Node stuff.
00:39:16 - Nick Taylor
Okay. You could probably use — sorry, Bun.
00:39:20 - Anthony Campolo
It's a shell script, a Bash one, right now, so —
00:39:21 - Nick Taylor
Oh, yeah.
00:39:22 - Anthony Campolo
Close this file and go to autogen.js.
00:39:27 - Nick Taylor
All right, here we go.
00:39:30 - Anthony Campolo
This is Commander. Right here, this is kind of the closest thing to docs. If you want to see everything the CLI does — it's very readable. We've got a --video flag to process a single YouTube video, a --playlist flag if you have a playlist of YouTube videos, --urls if you want to pick a bunch of YouTube videos and put them in a file and run it on that, and then --rss, which takes an RSS feed. Then the model flag lets you select different size models. At the end there it says "large" — so that's the default. That's why when we first ran the command, it broke: you didn't build the large model and it tried to run that because we didn't give it a flag. Then we have two flags for LLMs, one for ChatGPT and one for Claude, and then two flags for the transcription services. We haven't done that step yet, but that's the next thing we're going to show once we explain some of this code.
00:40:28 - Anthony Campolo
Yeah.
00:40:29 - Nick Taylor
Oh, hey, Fuzzy Bear is in the chat. How you doing, Fuzzy?
00:40:33 - Anthony Campolo
Fuzzy Bear has heard me talk about this a bunch. He's been watching my weekly streams with Monarch as I've been building this out. So he knows all about this project.
00:40:42 - Nick Taylor
Cool. I was going to say — when we ran into that error, I know you knew what the error was right away, but I wonder if you could improve it so that before you run, it says, "Oh, you haven't compiled." Well, I guess maybe because you're going to productize this, it might not matter as much.
00:41:01 - Anthony Campolo
Yeah, there's a lot of error handling that could be done. There are a lot of ways to make this nicer, and this is why I'm sharing this with people and trying to get other eyes to help QA it. I've been building out this whole thing just with ChatGPT actually, because I'd never built a big Node scripting project. I've never even used Commander before. I'm learning a lot as I'm going, and error handling is something I'm always coming back to and trying to improve.
00:41:32 - Nick Taylor
Yeah.
00:41:33 - Anthony Campolo
So then —
00:41:35 - Nick Taylor
Sorry, I was going to say Fuzzy Bear is making me misty. He's like, "Great to see you guys." I was like, I'm not crying, you're crying. Anyways, yeah, sorry.
00:41:46 - Anthony Campolo
Go on.
00:41:47 - Nick Taylor
So you got all the options here.
00:41:49 - Anthony Campolo
Yeah. So that's just checking — that's just kind of logic and there's going to be a way to make this cleaner, I'm sure. Right now this is the base. Now go to the commands folder and the processVideo.js file.
00:42:08 - Nick Taylor
Cool.
00:42:08 - Anthony Campolo
So this is doing the heavy lifting — this is what's processing the video. For things like the playlist, there's also going to be a file for processPlaylist. It basically just runs this a whole bunch of times on a bunch of videos in a playlist. This is really the core logic right here. Zoom out just a bit so we can see a little more.
00:42:36 - Nick Taylor
I'm zooming something else. There we go.
00:42:39 - Anthony Campolo
That's the perfect size for me. So you've got the process video function, all the different things you can pass in — the URL, the model, whether you want to use an LLM or not. The mdContent is where it's creating the markdown. That's using yt-dlp. If I eventually want to publish this as an npm package, I'm not sure how well yt-dlp is going to play with that because it's technically a Python tool — that's why you have to install it. But anyway, that gets you the link for the episode, the name of the channel, the URL of the channel, the title of the video, the day the video was published, and then the thumbnail.
00:43:24 - Nick Taylor
Okay. Yeah, this is cool. Speaking of error handling, I hung out with Mike Arnaldi on Monday — he's the creator of the Effect TypeScript library, aka the missing standard library for TypeScript/JavaScript, as they call it.
00:43:45 - Anthony Campolo
Talking about Effect, man. Yeah, I was thinking about asking Dev to come onto my stream and explain it to me. I'm not a TypeScript person, so I have no clue.
00:43:56 - Nick Taylor
Yeah, I'm still very brand new to Effect, but it was pretty interesting. Basically there are patterns that are more like Rust — you can have errors happen, but there are exceptions you're expecting. Say you hit the API and there's a network error, so you can add retries. If you retry like five times and it's just not happening, you can call Effect.die, which basically means we can't do anything here. I think it could be an interesting project to try using it with. I might look into it — I've cloned this.
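In plain JavaScript (not actual Effect code), the retry-then-give-up pattern Nick describes looks roughly like this:

```javascript
// Retry an expected failure (e.g. a network error) a few times, then
// give up unrecoverably -- the moral equivalent of Effect.die here is
// a thrown error after the retries are exhausted.
async function withRetries(fn, { attempts = 5, delayMs = 0 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // expected failure: retry
      if (delayMs) await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  // All retries exhausted: nothing more we can do.
  throw new Error(`Giving up after ${attempts} attempts: ${lastError}`);
}
```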
00:45:01 - Nick Taylor
I might throw up an exploratory PR just to see. Because this is definitely interesting to me, and we were talking before the stream — our team is slowly becoming all AI engineers and this is definitely relevant. Anyway, so we process the video and then just write the file, the markdown, and —
00:45:29 - Anthony Campolo
Cool.
00:45:30 - Nick Taylor
Yeah.
00:45:32 - Anthony Campolo
Okay.
00:45:33 - Nick Taylor
And then Deepgram here. Talk about the transcription.
00:45:36 - Anthony Campolo
Look where it says whisper.cpp/main. That's the part that needs to reach out to your local Whisper. If you didn't clone down Whisper and didn't build a model, you could use the Deepgram or AssemblyAI flags and run the transcript through them. That's what we're going to do next — show how to do that, and also feed the transcription to an LLM directly so we don't have to do the copy-paste step. I gave you two commands in the Discord. Both are going to use Claude, but one uses Deepgram and one uses AssemblyAI. I'm curious to see how they compare.
00:46:24 - Nick Taylor
Do you want me to run them in parallel or —
00:46:26 - Anthony Campolo
Just run the first one, and then we're going to have to rename a file after we do it or we're going to get a name clash — it's not really meant to do both at once. Just pick one and get your output. So we're passing it the .env file. We were talking about how you don't need the dotenv package anymore — that's because Node can load the .env file itself with --env-file=.env. Super obnoxious syntax, I always forget it. But you're passing it the Deepgram flag and the Claude flag, and you already have your API keys for Deepgram and Claude. For people following along at home, you have to get API keys. You may have to pay a couple bucks for credits. Deepgram gives you $200 of credit right off the bat, but I think it expires at some point, which is why you probably didn't have any anymore.
00:47:14 - Nick Taylor
Yeah, because when my coworker Becca was —
00:47:18 - Anthony Campolo
Unless you ran $200 of transcription already.
00:47:21 - Nick Taylor
Yeah, I don't think I did. I think I did a stream with her when she was working there and that was more than a year ago, so it definitely expired. Okay, so it looks like it ran successfully.
00:47:34 - Anthony Campolo
Yeah, it shows it in the output there. Let's go back to the content directory.
00:47:43 - Nick Taylor
Oh yeah, it's not going to be in the project. Hold on. Content. Okay, so we got the Claude one here.
00:47:52 - Anthony Campolo
Okay. So now we see how this did everything all at once. It ran the transcript, and the transcript is the other file that was created — hopefully it overwrote the one you had. Actually, look at your other file real quick so I can see what happened here.
00:48:19 - Nick Taylor
So this didn't add the meta.
00:48:22 - Anthony Campolo
So this overwrote the one we had previously.
00:48:25 - Nick Taylor
Okay, gotcha.
00:48:26 - Anthony Campolo
Yeah. That's why I was saying we should — scroll down a little bit. I want to see the transcript on this one. Scroll down in the file.
00:48:37 - Nick Taylor
Okay.
00:48:38 - Anthony Campolo
So this is the transcript created with Deepgram, not Whisper.cpp — I'm pretty sure.
00:48:47 - Nick Taylor
Okay.
00:48:47 - Anthony Campolo
And then it took the whole thing and fed it to Claude, and that's where the show notes in the other file came from.
00:48:54 - Nick Taylor
Okay. Fuzzy in the chat is saying, can I see the execSync method — he's suggesting using Google's zx. Here it is, Fuzzy.
00:49:05 - Anthony Campolo
I have heard of zx. I've also heard of Execa. Do you know about Execa?
00:49:10 - Nick Taylor
Yeah, it's from Sindre. Another package from Sindre.
00:49:16 - Anthony Campolo
Yeah, my problem has been whether to use Execa or zx. I need to make a decision.
00:49:24 - Nick Taylor
Yeah, cool.
00:49:26 - Anthony Campolo
I appreciate the input, Fuzzy. I've been told something similar by the internet's collective unconscious as I've been building out this tool. If you know anything about Execa, I'm curious. Otherwise I might just go with zx. Cool.
00:49:46 - Nick Taylor
All right, so let's try the AssemblyAI one now.
00:49:50 - Anthony Campolo
Real quick, save those two files — drag them to the root of your project. That's one kind of dirty way to get them out of the blast radius. This is more error handling I need to do — so it doesn't overwrite files you already have. Yeah, still kind of quick and dirty right now. All right.
00:50:11 - Nick Taylor
All right, so we'll run the — okay, this has got AssemblyAI now.
00:50:16 - Anthony Campolo
Yep. That's the only thing that's different — it's still going to feed it to the Claude LLM and it's using AssemblyAI. Why don't we pull up Deepgram and AssemblyAI's homepages so people get some context.
00:50:29 - Nick Taylor
Okay, I'll let that run in the background.
00:50:35 - Anthony Campolo
So you already had a Deepgram account — have you tried it? Did you ever actually use it?
00:50:41 - Nick Taylor
I did it in that stream I was talking about. I'd have to check, but I think we were working on fixing the Deepgram browser extension, which could actually do live transcription. I think that's what we were doing.
00:51:00 - Anthony Campolo
Like live transcription.
00:51:02 - Nick Taylor
Yeah, I think that's what it was.
00:51:04 - Anthony Campolo
Or — no, we were talking about this before the stream.
00:51:08 - Nick Taylor
Yeah. Okay. And what was the other site you wanted me to load up?
00:51:12 - Anthony Campolo
AssemblyAI. This is the hot new one, the one that has all the money. Just Google AssemblyAI. Whatever you landed on is not their actual website.
00:51:26 - Nick Taylor
Oh, AssemblyAI.com — that makes sense. That's where it's at.
00:51:30 - Anthony Campolo
Yeah.
00:51:32 - Nick Taylor
Okay.
00:51:33 - Anthony Campolo
When you're looking for transcription services, it's like — you know how with vector databases there's Pinecone and then everyone else? Or the same thing in the crypto world: there's Alchemy and then everyone else. This is the thing everyone's using now, allegedly. I kind of like Deepgram a little bit more personally, but AssemblyAI seems to have the most funding and momentum behind it. Take that for what it's worth — try them both out. There was another one I tried out, Speechmatics — it wasn't bad, but it didn't have anywhere near the level of features and documentation that Deepgram had. It was pretty night and day. So I decided to stick with these two, build out those integrations, and then expand out more into the LLM world. I want to support more open-source models, not just Claude and ChatGPT — add Cohere, add Gemini, a whole bunch more models.
00:52:39 - Anthony Campolo
That's probably the direction I'll go. I'm just going to stick with these two transcription services for now because personally, I'm not going to use either of them — I'm going to keep using Whisper.cpp on my own machine. But that's just not feasible for a lot of people.
00:52:54 - Nick Taylor
Yeah. Thinking about you building the actual product with the website and stuff — I guess it wouldn't make sense to use Whisper.cpp there, because it's not like you'd fire it off and then have some background job let them know when it's done.
00:53:17 - Anthony Campolo
There is a way I can spin up a server that just runs Whisper.cpp and use that as my own transcription API endpoint. That's something I'm probably going to pursue and try out. That's one way to have it available without needing it locally, and I can still manage the cost my own way. Like, if I just have a DigitalOcean droplet running it — you just run transcriptions forever. I might end up trying that out and see how it works.
00:53:51 - Nick Taylor
Yeah, this is pretty cool and super useful. I know DevRel's kind of up and down right now, but I know B. Dougie as well — my CEO. I always find it funny to call them that.
00:54:14 - Anthony Campolo
Let's look at the Open Sauced page — you've got a lot of videos you could use this on.
00:54:20 - Nick Taylor
Yeah, I might take it for a spin there. The thing I was going to say is that B. Dougie is all about how, in DevRel, you've got to create content, and one thing to do there is what you're doing here — repurposing content. You did a podcast or a live stream like we're doing now, and it makes sense to convert this into a blog post. I was using Descript for a while — I use it occasionally now. I used it for creating podcast episodes, but with my podcast, I haven't pulled in any new episodes in about a year because it's so time-intensive to edit them. I'm almost wondering if I should just post them raw; it's not like I'm running Syntax FM or something.
00:55:22 - Anthony Campolo
This is why I really like live streaming — like stuff like this. We just go, and at the end there's a huge chunk of content. It is what it is. I got crazy with editing FSJAM. I used to spend like 10 hours editing FSJAM episodes. It was absolutely absurd.
00:55:39 - Nick Taylor
Yeah, that's the thing. Part of the reason why I stream — it's not because I'm lazy. It's just I only have so much time in the week to do something like this, and it's encouraged at work. I would like to do polished YouTube content at some point, but in my schedule right now, it's just not in the cards. Even a five-minute video on YouTube could take forever. Obviously bigger streamers like Primeagen or Theo have their own editors, so it's not them doing it. But I just don't have time right now. That's why I appreciate live streams. I tried streaming to multiple platforms about a year and a half ago and ran into something where I was using Restream and it —
00:56:43 - Nick Taylor
I was streaming with Mike from Ionic. Mike H — what's his last name? It's escaping me. Anyways, I started the stream with him and then something went wrong. When you have a stream key for a live event — yeah, Harrington, thank you. Harrington. But when you have a live stream like this, if something went wrong, I can't restart it because it's live already. I got kind of turned off from the multi-platform stuff from that because it happened a couple of times. But now it seems stable and I think I just have a better setup. I used to edit the YouTube videos before uploading, and now I just don't because I'm streaming to YouTube as well. It's also a bit better quality of life — I'm not editing anything, it's just up there. Getting back to the editing — something like Descript has this neat feature, but it doesn't work well all the time for speech: taking out the ums and ahs. They have an auto-remove feature.
00:57:55 - Nick Taylor
At first I was like, I'll just do that for a podcast episode. It's not bad, but in some cases it'll make blips and stuff, and when somebody's speaking it just sounds unnatural. What I was getting at is — for the transcription part, does something like Deepgram or AssemblyAI get rid of the ums and ahs in the transcript? Because obviously it wouldn't affect the reading, so they're conf—
00:58:27 - Anthony Campolo
They're highly configurable. One of the configurations is to remove filler words. I think both of them have it.
00:58:34 - Nick Taylor
Okay.
00:58:35 - Anthony Campolo
This is one of the reasons why I ended up throwing out Speechmatics. I always forget their name.
00:58:43 - Nick Taylor
Okay.
00:58:43 - Anthony Campolo
There's all this stuff that Deepgram and AssemblyAI offer. They can cut out filler words, you can feed them a bank of words ahead of time so it knows how to spell things it might trip over, you can configure punctuation. There's a whole bunch of stuff. Right now I haven't gone down that rabbit hole yet because for my purpose, you really just need a huge chunk of text to feed to an LLM. Even if there's no punctuation whatsoever, the LLM doesn't care — it's just going to read the thing and extract the raw data of what's happening. It can do that with very little human-readable markup.
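As an illustration, option objects for the two services might look like this. The parameter names are drawn from each service's public API as best I know it — treat them as a starting point and check the current docs before relying on them.

```javascript
// Illustrative request options for the two transcription services.
// Parameter names are assumptions based on each service's docs.
const deepgramOptions = {
  punctuate: true,      // add punctuation and capitalization
  smart_format: true,   // format numbers, dates, etc.
  filler_words: false,  // drop "um", "uh", and similar
  keywords: ["nickytonline", "AutoShow"], // boost tricky spellings
};

const assemblyOptions = {
  punctuate: true,
  disfluencies: false,  // false = filler words removed
  word_boost: ["nickytonline", "AutoShow"], // custom vocabulary
};
```

Feeding a word bank like `["nickytonline"]` is one way to mitigate the misspellings mentioned earlier.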
00:59:28 - Nick Taylor
Yeah, totally.
00:59:30 - Anthony Campolo
But if I want the transcription to be something nice and readable afterwards, that's where these services give you a lot of power.
00:59:41 - Nick Taylor
You don't need to stare at the screen share right now — I put us back to the talking view. Just to reiterate: you open sourced it, which I think is super cool. This is actually useful. I know sometimes people just create projects just to, you know, do something. I'm terrible at that. A lot of times I'm terrible at coming up with a good idea, so I tend to latch on to an interesting project and go help or contribute to it. I've cloned this obviously because we've been looking at it. I'm definitely going to mess around with it.
01:00:28 - Anthony Campolo
I wanted to show this to my content creator friends to see if they could use it. You were someone specifically I wanted to pitch this project to because I feel like it can be useful — both for your own personal stuff and your work stuff. If you play around with it, let me know and I'll be super curious to see how.
01:00:48 - Nick Taylor
Yeah. Let's talk through your ideas for productizing it again. You're saying you can spin up Whisper.cpp, no problem, so you can basically just send it a URL or whatever payload you need over there and —
01:01:14 - Anthony Campolo
Right. The simplest would be an input form where someone gives a YouTube video or playlist. They'd need to analyze it to know how long it is, because the length determines the cost — the length translates fairly well to the number of tokens, unless the person was speaking extremely fast. So you'd have different transcription services and models to pick from, each with different costs, and different LLMs with different costs. I'd need a backend calculation that lets you pick these and spits out a cost. That would be single-use. Then the next level is figuring out a subscription that gives people a monthly allocation — and that's where it gets more complicated, because people will be paying for subscriptions and maybe not using all of it, so you can make up margins there.
01:02:26 - Anthony Campolo
But if someone signs up and uses every dollar they pay for every single time, that has to be totally worked out to ensure a profit. That's why there's a lot to figure out in terms of the monetization aspect. Single-use videos are probably the first thing I'll implement because I can calculate the cost and know I'm going to make a profit.
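A back-of-the-envelope version of that single-use calculation might look like this; every rate below is a placeholder, not real pricing:

```javascript
// Rough single-use cost estimator: transcription is billed per hour
// of audio, LLM usage per token. All rates are placeholders.
function estimateCost(durationMinutes, {
  transcriptionPerHour = 0.12, // placeholder per-hour transcription rate
  wordsPerMinute = 150,        // rough speaking rate
  tokensPerWord = 1.3,         // rough tokens-per-word ratio
  llmPerMillionTokens = 3.0,   // placeholder LLM input rate
} = {}) {
  const transcription = (durationMinutes / 60) * transcriptionPerHour;
  const tokens = durationMinutes * wordsPerMinute * tokensPerWord;
  const llm = (tokens / 1_000_000) * llmPerMillionTokens;
  return { transcription, llm, total: transcription + llm };
}
```

The interesting takeaway is the shape: duration maps nearly linearly to both costs, which is why a per-video quote is easy to compute up front while a subscription requires modeling usage patterns.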
01:02:47 - Nick Taylor
I almost wonder if pay-as-you-go makes more sense. I was looking at AssemblyAI and I think they said it's like 12 cents an hour, for example — that's just for transcription, obviously.
01:03:01 - Anthony Campolo
Yeah, that seems simpler to calculate.
01:03:04 - Nick Taylor
I feel like you potentially complicate your life if you say, okay, it's 15 bucks a month. Oh, you didn't use it this month — I mean, I guess you could do that.
01:03:16 - Anthony Campolo
It would be a lot more complicated.
01:03:17 - Nick Taylor
Yeah. And if you set some fixed rate per month, what if somebody goes over it? One or two people — not a big deal, because the others even it out. But if everybody starts using it like crazy, you're definitely going to lose money. I feel like pay-as-you-go with some margin in there for you to make money would make sense. I've never built a paid app either, but I'm sure there's prior art. Like my buddy Vic — he works over at Twilio or it's called Segment now — he's built a few SaaS products. He could maybe speak to pricing stuff, because you definitely want to get the pricing right, otherwise you just screw yourself over.
01:04:18 - Anthony Campolo
Yeah, definitely. That's why I'm slow-rolling that part. I'm getting a sense of the landscape, the tooling, the stuff I'm building out, the costs associated with it — gathering a lot of data right now, building stuff out, sharing with other people, and trying to get a sense for where to go next. This has all been super useful. What I built for this stream — actually getting the transcription and LLM APIs integrated — that's a really important step. I'm really glad you gave me the impetus to finally ship that part.
01:04:57 - Nick Taylor
Yeah, sometimes it's good pressure — you're like, "I want to show this on stream so I better just do it."
01:05:03 - Anthony Campolo
Yeah, exactly.
01:05:05 - Nick Taylor
I do well with motivation like that too. People talk about not liking pressure, but I think there's definitely positive pressure. There's obviously negative pressure too — if someone's like, "Just get this done, we've got a deadline" — you could word it differently, I guess.
01:05:27 - Anthony Campolo
Yeah.
01:05:28 - Nick Taylor
And if things are always pressing like that, that could be a management issue as well.
01:05:36 - Anthony Campolo
What I like about having streams is that I know I just need to have it built the day before. If I'm trying to build it the day of the stream, I know something has gone terribly wrong.
01:05:53 - Nick Taylor
Yeah. I know you're using Commander in the project. When I was working at Netlify on the Remix adapter, they used a project called Inquirer — you can build interactive CLI stuff there, and I think it's Promise-based. The Netlify CLI uses Inquirer as well. It might have better ergonomics. I haven't really looked at Commander because I'm typically not building CLI stuff. But yeah, there are also other options.
01:06:29 - Anthony Campolo
I have looked at this. This is what's really nice about Inquirer — it's an interactive CLI.
01:06:35 - Nick Taylor
Yeah, that was it.
01:06:36 - Anthony Campolo
Yeah. This is actually one of the next things I was probably going to look at — making the CLI nicer. You can have it be an interactive prompt: "Hey, do you want to give me a YouTube video, playlist, or RSS feed? Which model do you want? Do you want Claude or ChatGPT?" That's the next step. You're 100% on the money. And this is where I'm going to end up in an interesting space where I can work more on the open-source stuff or the paid product stuff. I can make the CLI way nicer, but that's not really going to make me any money. But I am going to enjoy doing it because I use this every single day now. Having a nice interactive CLI prompt — hell yeah, I want that for me.
01:07:32 - Anthony Campolo
So I'm going to build that for me, and that's going to be nice for everyone else.
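The interactive flow Anthony describes could look something like this sketch. The flag names (`--video`, `--playlist`, `--rss`) are hypothetical stand-ins for whatever the project's Commander setup actually defines; a library like Inquirer would render a proper menu, but Node's built-in `readline` shows the same shape.

```typescript
import * as readline from "node:readline/promises";
import { stdin, stdout } from "node:process";

// Hypothetical flags standing in for the real Commander options.
const FLAGS: Record<string, string> = {
  "YouTube video": "--video",
  "YouTube playlist": "--playlist",
  "RSS feed": "--rss",
};

// Pure mapping from a menu selection to a CLI flag, kept separate so the
// interactive layer stays thin and testable.
function selectionToFlag(selection: string): string {
  const flag = FLAGS[selection.trim()];
  if (!flag) throw new Error(`Unknown selection: ${selection}`);
  return flag;
}

// The interactive layer itself: ask once, then hand back the flag.
async function promptForSource(): Promise<string> {
  const rl = readline.createInterface({ input: stdin, output: stdout });
  const answer = await rl.question(
    "YouTube video, YouTube playlist, or RSS feed? "
  );
  rl.close();
  return selectionToFlag(answer);
}
```

Keeping the selection-to-flag mapping pure means the interactive layer can later be swapped for Inquirer prompts without touching the rest of the pipeline.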
01:07:36 - Nick Taylor
Yeah. And Nate in the chat is asking, where is it?
01:07:44 - Anthony Campolo
So deep. I'm so deep. Thank you, Nate — I appreciate that. I've been thinking about a lot of this stuff a lot. This project has been very all-consuming for me, in a good way. It's allowed me to go very deep on a lot of stuff and learn a lot in the process. I feel like I have skin in the game in a way I haven't had on previous projects, where I just found a cool framework and contributed to it — which is nice, but it's still not yours. I had the same skin-in-the-game feeling with ajcwebdev.com, which is a super cool Astro site I've gone very deep in. But that's something I can't really share with someone — my website is not useful to anyone else. This is something I can give to other people. Ben Holmes was super into this, he thought it was very cool. Nate, you might actually get some use out of this too.
01:08:43 - Anthony Campolo
I know you create a lot of video and streaming content.
01:08:47 - Nick Taylor
Yeah, Nate streams quite a bit.
01:08:49 - Anthony Campolo
You should try this out as well. And anyone else out there watching — if you think this is cool, you want to learn more, you want to contribute, hit me up. Someone's going to have to convert to TypeScript at some point because I'm not going to.
01:09:03 - Nick Taylor
I can do it. I don't mind. That'll give me something to do.
01:09:07 - Anthony Campolo
All right.
01:09:07 - Nick Taylor
Well, hell yeah. The thing with TypeScript — I know you haven't done a ton of it. I've been doing TypeScript since the early days, since like fall 2015.
01:09:26 - Anthony Campolo
And I've done a decent amount of TypeScript at this point. I just haven't enjoyed it.
01:09:30 - Nick Taylor
The nice thing now is that when it first came out, there was no inference. You had to explicitly type stuff all the time.
01:09:39 - Anthony Campolo
And now, like, imagine.
01:09:42 - Nick Taylor
But nowadays I'm team infer as much as you can, you know? So I would say typically if you're in library code land, you're probably going to have more explicit types, or more complex types, versus your actual application that consumes a library. Like, you'll still have some types, obviously.
01:10:00 - Anthony Campolo
But yeah, I would love to actually plan a stream with you in a month or two where we sit down and figure out how the hell we would type this, because I'm going to need help with that. Yeah, I'm fine to admit it.
01:10:14 - Nick Taylor
Well, the thing is, because I've migrated large code bases to TypeScript before, you basically want to do an incremental approach. Typically what you can do is install TypeScript, obviously. You add a TS configuration and you can set it to allow JS. You remove the strictness. Normally I'm kind of team strict; I think it makes sense. But when you're migrating to TypeScript, it doesn't make sense, because if you have it in strict mode and then all of a sudden you just rename all your files to .ts, you're gonna have a terrible time. So by allowing JavaScript as well, you can just incrementally update things. Typically what I do is, in the context of, say, a React application or some front end application, you kind of want to go from the outside in. Because if you go from the inside out, first you're going to say, okay, I converted this thing to TypeScript, and oh, it's importing this, this, and this — and none of those things are typed yet, so there's all kinds of type errors.
01:11:21 - Nick Taylor
So, for example, think of the page of a website as the outside part, and maybe it's got a few components it uses. Convert those components first, and then once all those are converted, then you can do the page. That's probably how I'd approach it. Obviously it's different with — I mean, same concept, you know. Your page in this case would be the main program and then utility functions. I would probably work on the utility functions that I'm using first and stuff. It'd be a fun thing to do because we could probably do it over several streams, or at least a few streams.
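The incremental setup Nick describes could start with a `tsconfig.json` along these lines. This is a sketch, not the project's actual config — the option values are illustrative, but `allowJs` is the key one: it lets `.js` and `.ts` files coexist while files are renamed one at a time.

```json
{
  "compilerOptions": {
    "allowJs": true,
    "checkJs": false,
    "strict": false,
    "target": "es2022",
    "module": "node16",
    "moduleResolution": "node16",
    "outDir": "dist"
  },
  "include": ["src/**/*"]
}
```

Once the last file is converted, flipping `strict` to `true` (and cleaning up whatever errors that surfaces) finishes the migration.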
01:12:00 - Anthony Campolo
Yeah, I mean, the first thing I would do is I would feed my entire code base to ChatGPT and say, write my types. Write the types. Yeah, start there and then compare that to what you would actually do and see how close it got.
01:12:12 - Nick Taylor
Yeah, well, no, I would definitely use ChatGPT and Copilot and like, even Claude. You introduced me to Claude and it's actually pretty solid for code. Yeah, yeah, Claude. Sorry, I was doing the French naming. Sorry.
01:12:29 - Anthony Campolo
No, Claude. Yeah, Claude's the best. I'm a big fan. Gibby and Claudy is what I call them.
01:12:35 - Nick Taylor
Yeah, that's hilarious. But yeah, I find it super useful for creating types too, or even generating data. For example, I was writing some Storybook stories. Typically I'm doing most of the front end at OpenSauced, along with Zee, my coworker. But I'll just be like, this is the shape I need. Create an array of like 12 items with that shape, and then set the variable to users and give me the array. That's typically how I do code-gen stuff. Oh, take care, B1 Mind. Later, my man. Yeah, but also converting types — you just grab a snippet of JSON and say, generate the type for this for me, because a lot of times it'll convert stuff to string. But sometimes it might know, like, oh, I need to create a union type because these are the only two possible things and stuff.
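The union-type point can be sketched like this, assuming a hypothetical JSON shape (the field names and role values here are made up for illustration): a naive JSON-to-type conversion would give `role: string`, but when only two values ever appear in the data, a union is tighter.

```typescript
// Hypothetical JSON: { "login": "user-0", "role": "admin" }
// A union type rejects typos like "amdin" at compile time.
type Role = "admin" | "member";

interface User {
  login: string;
  role: Role;
}

// A small factory like this is handy for the kind of Storybook mock data
// Nick describes: "create an array of 12 items with that shape".
function makeUsers(count: number): User[] {
  return Array.from({ length: count }, (_, i) => ({
    login: `user-${i}`,
    role: i % 2 === 0 ? "admin" : "member",
  }));
}
```

This is exactly the sort of boilerplate where an LLM does well: paste in a JSON snippet, ask for the type, then tighten any `string` fields into unions by hand where you know the full set of values.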
01:13:40 - Nick Taylor
But I'm definitely using AI in my daily workflow, like, all the time. But yeah, no, this is super cool, my man. And I think the neat thing about this too is obviously you had a problem you wanted to solve, but in your experience, obviously you're a little more comfortable with these things. But how approachable was it to actually build this out? Because I think people get intimidated by AI and, oh no, it's this big scary thing, and I don't really think it is. It's like anything else, you gotta learn it a bit.
01:14:22 - Anthony Campolo
The AI stuff is kind of trivial in a certain sense. Almost all this is just Node scripting. That's really what it came down to: understanding how to use Node to execute commands, to write to the right places, and to pass flags, to give different options. It's like all this is just Node stuff, you know, and then every now and then you hit a command that spits out a whole bunch of text for you, and that's the AI part. But you don't have to build any AI stuff. You're just building a Node project and then integrating it with APIs or building in specific tooling that can run transcription and stuff like that. But all that stuff, that's the simplest part of the project, actually.
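The "it's all just Node scripting" pattern Anthony describes can be sketched like this. The exact flags AutoShow uses may differ, but `-x`, `--audio-format`, and `-o` are real yt-dlp options, and Whisper.cpp expects WAV audio as input.

```typescript
import { execSync } from "node:child_process";

// Assemble the shell command: download the URL and extract audio as WAV.
// Kept as a pure function so the command string itself can be inspected.
function buildDownloadCommand(url: string, outPath: string): string {
  return `yt-dlp -x --audio-format wav -o "${outPath}" "${url}"`;
}

// Run it, streaming yt-dlp's progress output straight to the terminal.
function downloadAudio(url: string, outPath: string): void {
  execSync(buildDownloadCommand(url, outPath), { stdio: "inherit" });
}
```

From there the pipeline is just more of the same: shell out to Whisper.cpp with the WAV path, read the transcript file it writes, and pass that text to an LLM API.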
01:15:07 - Nick Taylor
And I think this kind of ties into how, at work, we're all moving towards becoming AI engineers. Obviously I'm still doing front end stuff, and there's my other coworker focusing more on infra. But because I listen to the Latent Space podcast a lot from Swyx and I forget the
01:15:28 - Anthony Campolo
other person who does it with him, Alessio.
01:15:31 - Nick Taylor
Thank you. But, you know, there's the distinction — I don't know if this is the best way to frame it — but I think of AI engineers as being the blue-collar workers of AI and machine learning. Because honestly, I don't care about the academics. I've never been an academic. I just want to build things.
01:15:54 - Anthony Campolo
Well, that's what's interesting is that to build a project like this five years ago would have required a PhD. To build it two years ago would have required some hyper-specific knowledge about OpenAI's APIs. To build it today, right now, you have a whole bunch of APIs to pick from and it's just like, which API is going to be the simplest for you to work with? And most of them, it's just like you're throwing text to a thing and getting text back.
01:16:20 - Nick Taylor
Yeah, yeah, yeah. And also, to be clear, no disrespect to blue-collar workers, because honestly my plumber is probably making more money than me. But yeah. And I mean, jobs that will be alive will —
01:16:35 - Anthony Campolo
be electricians and plumbers. They'll have jobs past us.
01:16:39 - Nick Taylor
Yeah, totally. Because, trust me, it doesn't matter what you're doing in tech, if your toilet doesn't flush, you got issues. But yeah, I think — I don't know if that's exactly how Swyx frames it — but yeah, there's still the academics that are really building the large language models, going deep into NLP and the neural networks and stuff. I think it's probably good to have some understanding of that. But I just don't want to get into the academics of it. I want to have a good enough working knowledge that I know what I'm talking about to some degree and I'm building stuff.
01:17:24 - Anthony Campolo
Yeah, it's really about the APIs because the APIs will expose some of the more academic stuff, like temperature. You can adjust the temperature of a model, and most people, if they've just worked with a ChatGPT interface, don't even know what the frick temperature is. They've never been able to configure it before. So the APIs will expose some underlying things to let you mess with some of the more academic things. But you can kind of get into that as you need to start tweaking the output of your LLMs, or you need to increase how much they can take in, how much they put out. You're optimizing for cost. There's all these ways to configure them and to work with them, and that is the AI engineering stuff for sure. It's deep and there's a lot to do there, but you can take that one step at a time. You can start with just like, how do I throw a hunk of text to this thing, get a hunk of text back? Just start with that.
01:18:17 - Anthony Campolo
And that's going to get you really far, actually.
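The "throw a hunk of text to this thing, get a hunk of text back" starting point is only a few lines. This sketch follows the shape of OpenAI's chat completions API; the model name and default settings are illustrative, and `temperature` and `max_tokens` are the knobs Anthony mentions for tweaking determinism and output length.

```typescript
interface ChatOptions {
  model?: string;
  temperature?: number; // lower = more deterministic output
  maxTokens?: number; // caps how much the model puts out
}

// Build the request body; separated out so it can be inspected without
// actually hitting the API.
function buildChatRequest(text: string, options: ChatOptions = {}) {
  return {
    model: options.model ?? "gpt-4o-mini",
    temperature: options.temperature ?? 0.2,
    max_tokens: options.maxTokens ?? 1024,
    messages: [{ role: "user", content: text }],
  };
}

// Sending it is one HTTP POST (Node 18+ global fetch; needs an API key
// in the OPENAI_API_KEY environment variable).
async function complete(text: string, options: ChatOptions = {}): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildChatRequest(text, options)),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Swapping providers (Claude, Deepgram's LLM features, etc.) mostly means changing the URL, auth header, and field names — the text-in, text-out shape stays the same.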
01:18:22 - Nick Taylor
Yeah, no, totally. I say this to people all the time. I know the job market is kind of in the toilet a bit. It's kind of resurfacing, I think. But I still think it's an amazing time to be a web developer, a software developer.
01:18:49 - Anthony Campolo
It's like, I'm having a blast. I'm having the most fun coding I've ever had with all this stuff. Absolutely, 100%. There's a lot of cynicism around it and a lot of skeptics, I think. I think the skepticism is warranted. I think the cynicism isn't, because I think this stuff is exciting and empowering and you just gotta not buy into the hype and understand what they can do and can't do and get your hands on it. But you can build really cool stuff now that you could not build two years ago. And I noticed — I tried to build stuff like this two years ago.
01:19:20 - Nick Taylor
Yeah, no, totally. I'm just dropping links where people can give you a follow again.
01:19:26 - Anthony Campolo
Yeah, check me out. AJC Web Dev on the internet. We can probably start closing it out here because I gotta use the bathroom, actually.
01:19:32 - Nick Taylor
Yeah, no, no, all good. All good. This is super great, man.
01:19:36 - Anthony Campolo
I super enjoy these streams and I would love to do another one in a month or so.
01:19:40 - Nick Taylor
Yeah, yeah, no, definitely. Hit me up.
01:19:41 - Anthony Campolo
We'll —
01:19:42 - Nick Taylor
We'll do it on your stream if you want. Or, I mean, I'm happy to do it here too, but we can hop on yours.
01:19:46 - Anthony Campolo
Yeah.
01:19:46 - Nick Taylor
And yeah, I don't know, I might attempt to put Effect into your project and see.
01:19:54 - Anthony Campolo
I mean, listen, can we start with TypeScript first before we go there?
01:19:57 - Nick Taylor
Yeah, yeah, yeah.
01:20:01 - Anthony Campolo
Yeah, no, that'll be fun. I would learn a bunch. And you're the man to guide me. You will be my TypeScript shaman.
01:20:08 - Nick Taylor
Cool. Cool. I'll just say to folks that are still in the chat, I'll probably be live streaming some work tomorrow, but Friday I'm gonna be hanging out with Josh — I'm not sure how you say his last name, Sierra? He's the DevRel over at Laravel, and we're gonna be digging into Laravel. So if you're looking to purchase a Lambo in the next month, I encourage you to join the stream on Friday. And yeah, it'll be exciting, because I have personally used PHP, but the last time I used it was PHP 5 — they had literally just gotten classes, I think. I'd definitely done WordPress, but the last thing I built was kind of like a single-page app for a doctor in Africa, to check for sickle cell disease and stuff. Anyways, it was this Frankenstein of an app: jQuery Mobile with PHP — not even WordPress or anything, just straight up like that.
01:21:16 - Nick Taylor
But that was kind of my last foray into PHP, aside from occasionally doing some minor WordPress stuff for some people. So I'm excited because I've heard only good things about Laravel. You know, it's batteries included and you can be super productive right away.
01:21:35 - Anthony Campolo
It's like the Redwood for PHP, am I right?
01:21:38 - Nick Taylor
Yeah, yeah, exactly, exactly. Cool. Cool. Yeah. Speaking of which, I'm hanging with Amy Dutton at some point once the RSCs are no longer experimental, I think, because we're still doing some work on there.
01:21:50 - Anthony Campolo
But yeah, I will watch the crap out of that stream. I will be there.
01:21:54 - Nick Taylor
Cool. Awesome. All right, well, take care, everybody. Anthony, if you don't mind staying on for a sec — and we'll probably see you all tomorrow.