
AI transcript generator with ajcwebdev
Ben Holmes and Anthony Campolo demo using AI to automatically generate show notes and transcripts from YouTube videos and podcasts
Episode Description
Ben Holmes and Anthony Campolo build an automatic show notes generator using local Whisper transcription and LLMs to create searchable video metadata.
Episode Summary
Ben Holmes and Anthony Campolo explore a workflow for automatically generating show notes, chapters, and summaries from video content using open-source tools. After catching up on freelancing, content creation, and the state of developer live streaming, Anthony walks through his "autogen" project, which chains together yt-dlp for audio extraction, Whisper CPP for local transcription, and an LLM like Claude or ChatGPT for generating structured markdown with front matter, timestamps, and episode descriptions. They run the tool live, first against a sample playlist and then against one of Ben's unreleased videos, watching Whisper accurately transcribe developer-specific terms like "SQLite" and "Vercel" that YouTube's built-in transcription often misses. Ben connects the output to his Astro-based video site, noting that richer transcripts could power deep-linked search into specific moments of his content. The conversation highlights that the entire transcription step is free and local, requiring no API keys or subscriptions, while the LLM summarization step can use either a paid service or an open-source model like Mistral. They close by discussing practical applications including YouTube description generation, accessible captions, and the possibility of feeding transcripts into a vector database for question-answering over a creator's entire content library.
Chapters
00:00:00 - Catching Up on Streaming and Freelancing
Ben and Anthony open with casual conversation about their streaming setups, including Ben's borrowed countdown screen from an old CodeSandbox project. Anthony shares that he recently left Edgio and shifted to freelance content work, writing for EverFund and building applications for Dash, following the path of other developers who moved into independent contracting. He describes the appeal of controlling his own schedule and rebuilding his content creation flywheel.
The discussion turns to the broader state of developer live streaming, with both noting that many creators who started during the 2021 boom have since dropped off. Ben explains that he treats streams as raw material for packaged videos rather than a revenue source on their own, and they agree that the return on investment for high-production Twitch-style streams has declined significantly since the pandemic era.
00:04:31 - Learning Styles and the Limits of Video Tutorials
Ben and Anthony discuss how they each learn new technologies, with Ben admitting he abandons most tutorials after about ten minutes to go build something on his own. Anthony describes how writing documentation alongside building projects was more effective for him than passively following video courses, noting that the Redwood tutorial was a rare exception because it mirrored well-structured docs. They both agree that written documentation tends to age better and is easier to navigate than video content.
This segue sets up the main topic of the stream as Anthony introduces the idea of generating structured written content from video. The conversation naturally bridges from the limitations of video as a learning medium to the value of having searchable, text-based representations of video content, which is exactly what the autogen tool aims to produce.
00:07:15 - Introducing the Autogen Show Notes Pipeline
Anthony explains the high-level concept: starting from a YouTube URL, the pipeline extracts audio, transcribes it locally with Whisper CPP, and combines the transcript with a prompt to generate markdown show notes through an LLM. He contrasts this with paid transcription services and asks Ben about his experience with transcription tools. Ben shares that he uses CapCut and TikTok for short-form transcription because they handle tech terminology better than Descript, but has never attempted longer-form transcription.
Anthony details Whisper CPP's advantages, noting it produces roughly 99% accurate transcripts and runs entirely locally with no API costs. He explains that while it lacks speaker identification and requires some manual cleanup, the transcripts are accurate enough to feed into ChatGPT or Claude for generating high-quality chapters, summaries, and timestamps. The key insight is that even if the raw transcript isn't published, it serves as excellent input for structured metadata generation.
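As a rough sketch of the two local stages described above, a small Node helper could assemble the shell commands for each step. The flags, binary path, and model file below are illustrative assumptions, not the autogen script's exact invocation:

```javascript
// Stage 1: extract audio from a YouTube URL as a WAV file with yt-dlp.
// Stage 2: transcribe the WAV locally with whisper.cpp's main binary.
// Flags and paths are illustrative; the real autogen script may differ.
function buildExtractCommand(url, outBase = "audio") {
  return `yt-dlp --extract-audio --audio-format wav -o "${outBase}.%(ext)s" "${url}"`;
}

function buildWhisperCommand(wavPath, model = "models/ggml-large-v2.bin") {
  return `./whisper.cpp/main -m ${model} -f "${wavPath}"`;
}
```

Each command can then be handed to a shell (or `child_process.execSync`) in sequence, which is essentially what the bash script does in one pass.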
00:10:36 - Running the Tool Live and Exploring the Script
Ben runs the example command against a sample playlist of two short videos, and they watch the pipeline execute in real time: yt-dlp extracts audio, Whisper processes the wav file, and the transcript plus prompt are generated. Anthony walks through the bash script's structure, explaining how it grabs metadata like titles and dates, runs the transcription, and then applies a Node-based transform to clean up the output by merging lines and removing millisecond timestamps to keep token counts manageable.
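The Node-based cleanup transform might look something like the following sketch, which simplifies whisper.cpp's bracketed timestamp lines down to second precision. The line-merging logic of the real script is omitted here, and the exact input format it expects is an assumption based on whisper.cpp's default output:

```javascript
// Simplify raw whisper.cpp output lines like
//   [00:00:00.000 --> 00:00:07.000]  Hello there.
// into "[00:00:00] Hello there." to keep LLM token counts down.
function cleanTranscript(raw) {
  const out = [];
  for (const line of raw.split("\n")) {
    const m = line.match(/^\[(\d{2}:\d{2}:\d{2})\.\d{3} --> [^\]]+\]\s*(.*)$/);
    if (!m) continue; // drop lines that aren't transcript entries
    const [, start, text] = m;
    if (!text) continue; // drop empty segments
    out.push(`[${start}] ${text}`);
  }
  return out.join("\n");
}
```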
They examine the prompt template, which instructs the LLM to produce a one-sentence description, a one-paragraph summary, and timestamped chapters. Anthony notes the prompt is customizable — you can request suggested titles, key takeaways, or multi-paragraph summaries. Ben copies the generated output and pastes it into Claude's free tier to test the summarization step, demonstrating the slightly manual but functional workflow of bridging local transcription with cloud-based LLM processing.
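Assembling the LLM input described above can be sketched as simple string concatenation. The prompt wording here is paraphrased from the stream, not copied from the repo's actual template:

```javascript
// Build the text that gets pasted into ChatGPT or Claude: a fixed
// instruction block followed by the cleaned transcript.
const PROMPT = [
  "Based on the transcript below, generate:",
  "1. A one-sentence episode description.",
  "2. A one-paragraph episode summary.",
  "3. Timestamped chapters with a short heading for each section.",
].join("\n");

function buildLlmInput(transcript) {
  return `${PROMPT}\n\nTRANSCRIPT:\n\n${transcript}`;
}
```

Swapping in a different prompt (suggested titles, key takeaways, a longer summary) only means editing the instruction block.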
00:18:34 - YouTube Transcription vs. Whisper and Practical Applications
Ben questions why one would use Whisper over YouTube's built-in transcription, and Anthony argues for higher quality and flexibility — the same pipeline works for podcasts, MP3s, and unreleased local files that have no YouTube transcript at all. They look at a real-world example on the Dash Incubator Weekly YouTube channel where Anthony has applied the tool, showing how generated timestamps in the description automatically create YouTube chapters.
Anthony also shares a repo where he's built an Astro site populated with transcribed content from Ben's own YouTube channel, including the three-hour Dan Abramov interview. Ben gets excited about the possibilities for making his video content more searchable, imagining a system where transcripts enable deep linking to the exact second a topic is discussed. They briefly discuss the idea of feeding all transcribed content into a vector database for AI-powered Q&A, though Anthony notes he hasn't found a compelling use case for that yet.
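The YouTube-chapters trick works because YouTube parses timestamp lines in a video description (starting at 00:00) into clickable chapters. A small formatter for the generated chapter list might look like this sketch; the chapter object shape is an assumption:

```javascript
// Format generated chapters as description lines that YouTube turns
// into clickable chapters (YouTube expects the first stamp at 00:00).
function toSeconds(stamp /* "HH:MM:SS" */) {
  const [h, m, s] = stamp.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

function formatChapters(chapters) {
  return chapters
    .slice() // avoid mutating the caller's array
    .sort((a, b) => toSeconds(a.start) - toSeconds(b.start))
    .map((c) => `${c.start} - ${c.title}`)
    .join("\n");
}
```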
00:30:24 - Integrating with Astro and Database Workflows
Ben considers how this pipeline might integrate with his Astro-based video website, exploring whether the output could feed into Astro DB rather than generating markdown files. He talks through a potential db seed workflow where the autogen script would write database entries instead of flat files, making the content queryable with SQL. Anthony explains the tradeoff: the tool currently depends on having the Whisper model running locally, so it's designed as a build-time command rather than a runtime service.
They discuss alternatives like OpenAI's transcription API endpoint and self-hosting Whisper CPP as a server, but agree these add complexity and cost. The conversation highlights the philosophical advantage of the local-first approach — zero ongoing costs and full control — while acknowledging the friction of manual steps that could eventually be smoothed out with API integration or a JavaScript-callable wrapper.
00:37:37 - Developer-Friendly Transcription and Whisper Customization
Ben identifies his killer feature request: a transcription model that understands developer terminology out of the box, noting that YouTube's auto-transcription consistently mangles terms like "SQLite" and "Tailwind CSS." Anthony mentions that Whisper supports prompt-based guidance and parameter tuning for specific vocabulary, though he hasn't explored those features deeply. He notes that proper nouns, especially people's names, are Whisper's biggest weakness.
Ben appreciates that Whisper is a local, cloneable model rather than a subscription API, and Anthony breaks down the cost structure: transcription is completely free locally, while the LLM summarization step can use either a paid subscription, a pay-per-token API, or potentially a local open-source model like Mistral. They compare Claude and ChatGPT, with Anthony noting Claude handles larger context windows while ChatGPT offers more multimodal features, and recommend that most users start with one paid subscription.
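A hedged sketch of the prompt-based guidance Anthony mentions: whisper.cpp accepts an initial prompt, and seeding it with developer terms is one way to nudge the model toward the right spellings, though treating the prompt as a glossary is an experiment rather than a documented feature, and the paths here are assumptions:

```javascript
// Build a whisper.cpp invocation that hints the model toward developer
// vocabulary via an initial prompt. The binary and model paths are
// illustrative assumptions.
function buildGuidedCommand(wavPath, terms, model = "models/ggml-large-v2.bin") {
  const hint = `Terms used: ${terms.join(", ")}.`;
  return `./whisper.cpp/main -m ${model} --prompt "${hint}" -f "${wavPath}"`;
}
```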
00:45:45 - Transcribing Local Files and Live Experimentation
Ben attempts to run Whisper against an unreleased local video file, leading to an entertaining debugging session. After discovering that yt-dlp's audio extraction command can accept local files with the right flags, he successfully converts an MP4 to a wav file and feeds it to Whisper. The live transcription impresses them both — it correctly identifies "SQLite," "Vercel," and "GitHub" as proper terms, validating the model's developer vocabulary accuracy.
They watch the output stream in real time, with Ben narrating his upcoming Cloudflare video's content as it appears in the transcript. Anthony explains the various output formats available including VTT, SRT, JSON, CSV, and even a karaoke mode, noting that several of these are standard caption formats useful for creating accessible subtitled videos outside of YouTube's ecosystem.
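whisper.cpp can write the caption formats Anthony lists directly, so this sketch only illustrates what a minimal WebVTT file looks like when built from transcript segments (the segment shape is an assumption):

```javascript
// Emit a minimal WebVTT caption file from transcript segments.
// Each segment: { start: "HH:MM:SS.mmm", end: "HH:MM:SS.mmm", text }.
function toVtt(segments) {
  const cues = segments.map((s) => `${s.start} --> ${s.end}\n${s.text}`);
  return `WEBVTT\n\n${cues.join("\n\n")}\n`;
}
```

A file like this can be attached to an HTML `<video>` element with a `<track>` tag, which is what makes captions work outside YouTube's ecosystem.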
01:02:37 - Wrapping Up and Sharing Resources
Ben and Anthony begin landing the stream, with Anthony dropping links to both the GitHub repository and his step-by-step blog post that walks through building the pipeline from scratch. The blog post explains each command and flag individually before assembling them into the full script. Ben expresses genuine enthusiasm for incorporating the tool into his workflow, particularly since the local-first approach means zero cost for the transcription component.
They close with Ben promising to explore the tool further and potentially use it for his upcoming videos, noting that better transcripts mean more accessible, shareable, and searchable content. Anthony signs off and they attempt a Twitch raid to ThePrimeagen, closing out the stream after a productive hour of building and experimenting with open-source AI transcription tools.
Transcript
00:00:00 - Ben Holmes
Hello, people. How's everyone doing? I know it's StreamYard. I know we have nameplates, but it's easier than trying to cobble together whatever. Oh my God. What was I using? I was using Ping.gg, and I don't know if that service even still exists. Not taking a chance. Anthony, how are you doing?
00:00:41 - Anthony Campolo
Good. It's been so long since I've heard that 30-second countdown, right?
00:00:46 - Ben Holmes
I was wondering, where's everyone getting these countdown utilities? Oh, I know where. I've had the same one on my stream. It was for Astro 1.0. Someone made a space-age background, and I just went to their CodeSandbox, deleted the word Astro, wrote "Starting Soon," and embedded that website. So as long as his CodeSandbox is still up, I have a starting soon screen. I should, at the very least...
00:01:11 - Anthony Campolo
But nope, you make do. Save your code. You don't want to have code floating out in the world.
00:01:19 - Ben Holmes
Yeah, well, I know you're doing a lot of stuff. Are you getting back in the content game, the streaming game, any of that stuff?
00:01:28 - Anthony Campolo
A little bit. Currently freelancing and doing content for a couple different things. I'm doing general application building and streaming for Dash, which is a cryptocurrency. And then I'm also writing articles for EverFund, Chris's company. They're expanding out and creating a blog and all that. So yeah, actually, I quit Edgio a couple months ago and have just kind of shifted to doing a little bit like people like Jason Lengstorf and James Q. Quick, who either were laid off or kind of just decided to move on from whatever role they were in and do this as a general freelancing-contractor kind of thing. So that's what I'm trying right now, and so far it's fun. I like having more freedom and control over my schedule, my workloads, and all that kind of stuff. So yeah, it's been really interesting. And the reason why I'm doing stuff like this is because now that I'm doing things on my own, I'm going back to that kind of mindset. Before, I had a job where I was creating a lot of content for FSJam and just streaming a lot, because that's kind of what kept me sharp, built relationships, and gave me a lot of good content to chew on for other things.
00:02:50 - Anthony Campolo
So it's like just getting that flywheel going again. You know, you have a really good content flywheel. You've kind of kept up streaming, I think, more than a lot of people who started streaming over the last couple of years. I know around 2021 a whole bunch of people started doing it, and a lot of those people who created whole shows and brands and stuff either fizzled out or are... yeah, you know...
00:03:17 - Ben Holmes
I agree. And I know a lot of it is just that there's not much money in live streaming. I don't do that kind of thing either. I haven't pursued Twitch Partner. I don't have a regular stream schedule. It's just an excuse to hang out with the nerds that are usually around, and we build stuff, and then whatever we build becomes actual videos that I package, get sponsors for, whatever. It's a better flow, I think. But yeah, if you were trying to keep up the production value from Twitch during COVID years and expect the same return now, like, God, it's not there. There's no way. I struggle to find time to watch Twitch streams, really. I'll hang out with cool people, but watching ThePrimeagen and stuff like that? No, I just can't get anything done. And they're all streaming during workdays. I don't know how people do it.
00:04:09 - Anthony Campolo
Yeah, everyone streams during the day. There's a couple, I think. Nicky T's is one that I still tend to watch, and I like the ones that bring on guests and demo new stuff, because it's a good way to keep up with new things. But yeah, there's just so much content in the world, it's impossible to actually watch it all.
00:04:31 - Ben Holmes
There's too much. Yeah, and I'm trying to find a niche there too, because I don't watch the standard walkthrough videos or tutorials. I don't know about you. I think you actually did learn a lot of things through... well, you did a boot camp. But I don't know if you also do the courses and video tutorials and stuff like that, because I just...
00:04:51 - Anthony Campolo
Yeah, video tutorials were never super useful for me because it's like, you follow... Actually, I should say the one I actually followed along with was the Redwood video tutorial, and that was one that kind of made sense because they just took the tutorial they'd built out in the docs and filmed it. So I kind of watched that, and since it was a really good docs experience put into video form, that was pretty easy for me to follow. That was one where I kind of reverse-engineered it by watching the videos. But before that, I had MERN courses and stuff like that, and the bootcamp itself was a lot of video content. Even if it's a live teacher, it's still video content at the end of the day. So for me, what helped was when I actually started writing content based on the things I was building. And you've talked about how I have legendary readmes, I think is what you said once, because I do that, right? I would write the tutorials and the docs in tandem with building the thing.
00:06:00 - Anthony Campolo
So it's like, anytime I got to a point where something worked, I had written docs to explain it. And that was a way for me to organize my thoughts. And so I found that by doing that, it helped me actually start building things in a way where, if I was just following along with a video, then you're kind of just copy-pasting stuff and don't really know why.
00:06:22 - Ben Holmes
I know, that's the problem. After like 10 minutes, I just go off on my own adventure and do something completely different, and I abandon the video. Then I feel guilty because every tutorial I have is half-watched, and I'm like, you know what? Just remove it. I don't need that in the history. I got my half. I only needed half before I took off and did something else. I feel like that's fine.
00:06:43 - Anthony Campolo
And videos get stale, and it's not easy to just jump around or skip parts. So docs are really the way to go, I feel like.
00:06:55 - Ben Holmes
Yeah, I get you. And speaking of videos and docs, you sent me this thing, and we're gonna at least try it because I'm very fascinated with anything creator- or YouTube-related. I also called this stream AI-stro, assuming AI and Astro were involved. I don't think that's a good title.
00:07:15 - Anthony Campolo
But I thought that was pretty funny. But yeah, there's a way we can work Astro into this, actually. The high-level description of what's happening here is we're using a couple different tools. You basically start with a YouTube URL, and from that you end up with a generated markdown page with front matter, a transcript, chapters, a summary, and all of that basically spit out in one go. So for me, the reason why I wanted to build this is because I've done so much content that's video content, audio content, podcasts, and video streams like this, and I never paid for a good transcription service. Some people just pay for it and can get all the transcripts they want. What is your experience in terms of actually trying to transcribe content? Have you ever tried to transcribe any of your stuff?
00:08:08 - Ben Holmes
I've done it once and never again. I do transcriptions for my shorts because with short-form, you need the transcript baked into the video for people who don't have their phone audio turned on. But I can only do as much as that. I use TikTok and CapCut, which is a really good transcription service. It's way more accurate than Descript. So I just gave up on Descript and said, nope, TikTok got it. They know how to do tech terms, or for some reason it knows the word Webpack. Like, wow, I can't believe this. So that's what I use. It generates it, I tweak it, and we get the video out. But for anything longer than a minute, no thank you. What's the one I use called? CapCut. That's the full desktop editor. It just takes the TikTok mobile editor and puts it into an online thing. It's free, and it's really good. I actually use CapCut as step two for every video I do because they have stickers and overlays that are more social-media friendly.
00:09:10 - Anthony Campolo
Yeah. Some of the chat is saying probably using OpenAI Whisper, and that's what I'm gonna show here.
00:09:17 - Ben Holmes
Yeah, yeah.
00:09:19 - Anthony Campolo
The transcriptions you get from it are like 99% correct. So if you really wanted to post the transcripts, you do have to do some manual editing, and it doesn't do speaker identification. But for you, you do mostly solo content, so that's actually not as big of an issue. Yeah, but what it's good for is, since it's accurate enough, you can feed it to an LLM like ChatGPT along with a prompt that tells it to generate the show notes. Even if you don't use the transcript itself, you can generate really high-quality chapters, topic headings, timestamps, and all of that. And then you could either clean up the transcript and use it or just throw it out entirely.
00:10:06 - Ben Holmes
Yep. Yeah, I want to try that, since I already see the parallel. You said markdown front matter; I'm thinking Astro content collections. I've got my YouTube videos, I can dump all the info in there, and I can make this indexable thing that I can turn into a website. So yeah, exactly, I want to try that. I capped us at an hour, and I know we started a bit late, so like 50 minutes from now would probably be the end of the stream. So did you want to lead this?
00:10:36 - Anthony Campolo
Well, where did you get to? Were you able to download the model?
00:10:40 - Ben Holmes
Right. I ran the commands, the bash and the make, and I think they succeeded, but I have no idea.
00:10:46 - Anthony Campolo
Okay, I'll share. You should share your thing, and we should just give it a go. If that's the case, you should be able to just run the example command. The example is a playlist of two videos that are just one minute each, so it'll run through the whole thing pretty quickly, because it takes maybe five minutes per minute of audio. So if you have an hour-long episode, you're gonna be spending multiple minutes per... it's gonna take 5, 10, 15 minutes maybe to actually transcribe. So one thing you can do is basically just set it up overnight. Let your computer run, and it could just transcribe for eight hours straight, because it's just a model running on your computer. You're not paying for anything. There's no server, there's no API key. It's just basically something you run on your machine.
00:11:38 - Ben Holmes
Yep.
00:11:41 - Anthony Campolo
The commands here are you're cloning in Whisper CPP, and that's the C implementation of Whisper, because the original Whisper library is Python and it's ridiculously slow. Then the bash script is going to build the specific models. So this is the large version 2. There's technically a version 3. When I used it a couple months ago, it was all buggy and weird, so that's the one I've been using now. But basically, the bigger models are more accurate, and they're way, way bigger. You can get a smaller model, which will be less than a gigabyte, but it's just crappier. So for me, if you're going to be doing this anyway, you want the highest-quality transcriptions you can get. I just use the largest model.
00:12:25 - Ben Holmes
Okay. So that should mean I have a Whisper CPP inside of this folder.
00:12:32 - Anthony Campolo
Yeah. So you should just pop it open in like your editor.
00:12:38 - Ben Holmes
Yeah. Take a look.
00:12:42 - Anthony Campolo
Yeah. And you want to stay in the root directory. So you don't want to actually open up in Whisper CPP, because there's the autogen repo and then there's Whisper CPP inside of it. There are also scripts that are parallel to the Whisper directory.
00:13:05 - Ben Holmes
Yep. And I think I cloned this alongside the other directory instead of inside. What is it called? One second. Auto...
00:13:16 - Anthony Campolo
Autogen. Yeah.
00:13:18 - Ben Holmes
Okay.
00:13:18 - Anthony Campolo
Yeah. So basically you just take that and plop it into Autogen. Just drag it in. Yeah. Move it.
00:13:24 - Ben Holmes
CPP into.
00:13:27 - Anthony Campolo
So the question in the chat is, what are we building? We're building an automatic show notes generator using Whisper and any kind of LLM you want. I've been using Claude recently, actually, because it seems to give the best results right now.
00:13:47 - Ben Holmes
Yeah. Oh, and you got your content right here. Yeah, yeah.
00:13:50 - Anthony Campolo
There we go. Okay, so go to the readme. Let's just try and run it and see what happens. And run the example right there.
00:13:59 - Ben Holmes
Take a look. If I just ran this, it's this simple. What is this playlist? Incomplete date.
00:14:06 - Anthony Campolo
Oh, don't worry about that. That's good. Yeah, you can suppress that. There's a no-warnings flag you could throw so it doesn't do that. But it also might be struggling because you're streaming at the same time. Hopefully it'll work it out eventually. So while that's stewing, you should... Oh, there it goes.
00:14:30 - Ben Holmes
It did something. We have a lot of information.
00:14:33 - Anthony Campolo
Open the terminal wider so you can see what's happening a little better.
00:14:37 - Ben Holmes
I wasn't even going to try.
00:14:43 - Anthony Campolo
First, it runs a command with yt-dlp to extract audio and get a WAV file that then is fed to Whisper. Right now, Whisper is running a WAV file, and it's creating a transcript. Then that transcript...
00:15:00 - Ben Holmes
Look at it go.
00:15:01 - Anthony Campolo
Yeah. And then that transcript will be concatenated with a prompt, and that's what you feed to ChatGPT or Claude or something. And then it'll also use yt-dlp to get the pieces of metadata, like title, date, images, if it's a cover image. Yeah. So this is the prompt.
00:15:27 - Ben Holmes
Yes, the prompt right here.
00:15:29 - Anthony Campolo
Yeah, I have a couple of different prompts. I created one that will also create like five suggested titles and three key takeaways. And you can have a two-paragraph summary instead, a one-paragraph summary, all this kind of stuff. You can tweak it however you want. You can ask what would be good next topics off of this as well. So this is a fairly bare-bones one. This is just giving you a one-sentence summary, which you can use for the better description, a one-paragraph summary, and then chapters. And the thing I've found this to be most useful for is creating good timestamps and figuring out what the different sections are, if people want to jump to a certain section or topic. This is something that really high-quality podcasts and YouTube streamers and stuff always have. That's one of the first things they do when they put their content out: they have really good timestamps and stuff. So it's not exactly perfect to the minute.
00:16:28 - Anthony Campolo
Sometimes you'll have to kind of move it around so it gets the exact right spot, but it generally captures the shape of what the topics are and gives you a good breakdown. Okay, so it looks like the command worked. So you have got some stuff. Yep. So what we're going to do now is: what do you use for an LLM? Do you have a subscription to ChatGPT or anything like that?
00:16:59 - Ben Holmes
I do, yeah. So are we looking for an API key?
00:17:03 - Anthony Campolo
No, I'm saying you could just... So what I do is I just copy-paste everything below the front matter and then paste it right in ChatGPT and hit go. Yeah, the whole prompt and the transcript. And this is only a minute long, so it's not going to be a lot of text.
00:17:23 - Ben Holmes
Okay, so what did the script that we just ran accomplish? Did it generate these?
00:17:29 - Anthony Campolo
It did the transcription. Yeah, so it was fed a WAV file, which came from the YouTube URLs. So there's two videos here, and it's creating the transcript, and then the transcript plus the prompt is what generates the actual show notes. So there's a step here that I'm going to add at one point where I'm going to hook this up to OpenAI's API and it will do this all for you. But that's something you pay for by the token. Whereas if you already have a subscription to services, you can do as many of these as you want. They cap you at a certain point if you do 20 or 30 in a row, but basically that's how I've been doing it. I just create this and I copy-paste it in, and then it'll give you a markdown output that you just copy-paste, and then I copy it back over the prompt, and that will be the whole thing. This is the little hacky part right here that can be made smoother with API keys and stuff right now.
00:18:28 - Anthony Campolo
But you should just copy-paste this whole thing and plug it into ChatGPT and see what it gives you.
00:18:34 - Ben Holmes
All right. Because I know YouTube also generates a transcript. So what's the reason I would want it direct from WAV instead of YouTube? I can think of a couple reasons.
00:18:48 - Anthony Campolo
I think it's just higher quality, in my experience. But you should make sure you're using ChatGPT 4, not 3.5.
00:19:00 - Ben Holmes
I don't have four, I don't think.
00:19:01 - Anthony Campolo
But okay, so you don't have a ChatGPT subscription then. So let's try Claude instead.
00:19:07 - Ben Holmes
Oh yeah. Okay.
00:19:08 - Anthony Campolo
Yeah, let's try their free version. I think their free version will be better than GPT 3.5.
00:19:14 - Ben Holmes
Yeah, I've been meaning to try it for code completions. I know that's where it's strongest. All right. [unclear]
00:19:30 - Anthony Campolo
You set it up this way because then, if you want to do this for podcasts and it's just a bunch of MP3s, that's not going to have a YouTube transcription.
00:19:37 - Ben Holmes
Right, right.
00:19:39 - Anthony Campolo
So this is more flexible for lots of different types of content, not necessarily just YouTube videos. It's kind of optimized for just give a YouTube URL, because most people have content on YouTube, but with just a couple modifications, you could give it a podcast RSS feed instead and it will just run through every single MP3 and do the same flow.
00:19:59 - Ben Holmes
Right. And it could do it because I would like everything prepped before I upload a YouTube video. If I could just feed it in while I'm exporting a video, even, I just feel like it would work totally fine.
00:20:13 - Anthony Campolo
Yeah. And if you use the YouTube API, you can also have a whole write flow as well. So you could just automate all sorts of other pieces too.
00:20:23 - Ben Holmes
Yeah, I'm going through the welcome right now. I'm just speed-running this. Okay, I'm starting to chat with our main man, Claude. So if I just put all this in...
00:20:37 - Anthony Campolo
Yeah, you just plop it in and hit go.
00:20:41 - Ben Holmes
If it says haiku at the bottom.
00:20:43 - Anthony Campolo
Yeah, that's fine. If it's larger than a certain amount, it won't actually copy-paste. It'll just attach a file to it.
00:20:53 - Ben Holmes
Gotcha.
00:20:55 - Anthony Campolo
Cool. So you can just copy this whole thing. Yep.
00:20:59 - Ben Holmes
I kind of want to read it. Let's see here. Well, I don't know the original video. If we could point it at one of my videos, this could be possible, because...
00:21:07 - Anthony Campolo
Yeah, the problem is this works better with longer videos. So let me, real quick...
00:21:16 - Ben Holmes
Yeah, I get it.
00:21:19 - Anthony Campolo
Yeah, I already transcribed a bunch of stuff. I'm gonna plop one of these into the conversation. It covers the importance of creating, like, a gist.
00:21:29 - Ben Holmes
Yeah. Because I can share my use case, at the very least, on my current little whiteboard. Yeah, we'll wait for you first instead of derailing.
00:21:41 - Anthony Campolo
No, no, you're good. Continue what you're saying. I'm going to push a repo real quick that you can clone down. So this is where this will connect with the Astro stuff.
00:21:55 - Ben Holmes
Yeah. Because my original website, Whiteboard the Web, was using content collections with markdown files. I've changed it to just be a direct call to the YouTube API, so I can just redeploy it whenever a new video is up. I don't have to have any webhooks for it. I could go back if there's any useful reason to have a local file, because the one thing I have up here is search. It works pretty well for key terms because I have hashtags that match whatever you're trying to do, and it just searches through the hashtags and titles and descriptions. But if it had a more robust description of everything in the video or a full transcript, it would be way more searchable. And I've seen crazy examples where it deep-links into the second where I'm talking about the thing you just searched for. I think that was Tejas. I always say his name wrong, but he demoed something like that. And if we have a transcript, then we can do that kind of thing too. Probably not on this stream, but gears are turning.
00:22:58 - Anthony Campolo
Let me... so clone down this repo. It's an Astro site. I basically took all of the main videos on your current YouTube page and ran this. You only have six or seven videos, because you also have live videos and then you have your VOD account as well, so your content's spread around a whole bunch of different stuff. This is using one of the templates. It's the template I built my own blog off of, so I'm super familiar with it. But just pull it down, install dependencies, and run it, and you can see what it looks like with some of the content actually populated. It has your Dan Abramov interview in there. That was like a three-hour-long thing.
00:23:45 - Ben Holmes
Oh yeah. That will definitely [unclear]. That was something where it's like, this will never have a transcript ever. But now...
00:23:57 - Anthony Campolo
It's really great. If you do get to a point where you just have all of your content transcribed with all these things, then you can feed that to a vector database and actually ask questions about your content. And that's what gets really interesting. I haven't gotten to that point yet with my stuff. I'm not really sure what I would ask it to actually get useful information out of it. Yeah, it will kind of give you back what you talked about, your topics, and stuff like that. So that's nice. But I'm still trying to figure out a use case for that.
00:24:33 - Ben Holmes
Yeah. Because for instructors in areas like web development, it is appealing to have someone who can answer questions instead of you in a Twitter DM, someone that can go off and speak the way you would speak. But it's very risky while AI is still this early, so there's a big push and pull. It would need big asterisks all over the place. The Astro docs could benefit from such a thing. We do have an AI bot for that.
00:25:00 - Anthony Campolo
You have an AI. You're one of the first ones to build that, and I'm sure it was absolute trash because people were just figuring out how to do that. But yeah, if you've been working on it since, it should be pretty good by now.
00:25:13 - Ben Holmes
Yeah, I don't know how much we followed up. Oh, look at this cactus site. That's... I like that hover. Yeah.
00:25:20 - Anthony Campolo
Nothing has been changed. I just plopped in the markdown. So yeah, those are what's up on your YouTube right now.
00:25:29 - Ben Holmes
Yeah. Let's click on this 20-minute read. Yeah, 20-minute watch. I like it. Oh, wow. So the minute read is roughly the same as the video length because I remember it being maybe like 18 and...
00:25:43 - Anthony Campolo
And there are things you can do if you want to. Because the front matter includes things like the show's URL, you can also create an embed so you could have the video appear right up top and pull that in from the front matter. And you could use cover images as well, do a similar thing. So once you have the front matter that you get from yt-dlp, that's where you can kind of turn it straight into a website blog page like this.
00:26:16 - Ben Holmes
Right, and what were those properties again? Well, I can just crawl this repo, can't I?
00:26:20 - Anthony Campolo
So go up to... open up the autogen.sh file. I went through a whole bunch of different ways of trying to do this. At one point I tried to do the whole thing with Node, and Node would execute the yt-dlp commands and the Whisper commands, but that didn't really work out super well. So it ended up being mostly a Bash script, with one part where there's a Node script that does some transformations. But what it's grabbing is a whole bunch of stuff, and then it echoes it out with the show link, channel title, publish date, all that stuff. And then it prints what you got from the yt-dlp commands.
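The echo step Anthony describes can be sketched roughly like this. The field values below are stand-ins (only showLink is confirmed in the conversation, and the date is invented); in the real script they would come from yt-dlp metadata queries:

```shell
# Stand-in metadata values; in the real script they come from yt-dlp,
# e.g.: yt-dlp --skip-download --print "%(title)s" "$url"
title="Why is nobody talking about Cloudflare?"
channel="Ben Holmes"
publish_date="2024-01-01"   # hypothetical date
show_link="https://www.youtube.com/watch?v=abc123"

# Assemble markdown front matter for the Astro content collection
front_matter=$(cat <<EOF
---
showLink: "$show_link"
channel: "$channel"
publishDate: "$publish_date"
title: "$title"
---
EOF
)
printf '%s\n' "$front_matter"
```

From here, writing the transcript below the closing `---` gives you the markdown file the Astro template consumes.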
00:27:02 - Ben Holmes
Yeah, got it. So we're getting the wall.
00:27:08 - Anthony Campolo
Yeah. And then that runs the transcript and then the transform, because Whisper can create a bunch of different types of files and different formatting. I basically just took the one that was closest to what I wanted the transcripts to actually look like and did a couple transformations on it. It removes the first line, which has Whisper CPP written on it, and then it merges every other line. Because when I was first doing this, the context windows weren't really large enough. So if I had, like, a two-hour-long video, it might crap out. But if I could remove half the lines, then it's not really that much more text. It removes some extra tokens because you have fewer timestamps that way. So this might not be as necessary. I could probably cut this out entirely, just for simplicity. But this is the thing that kind of cleans up the transcript a little bit.
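The cleanup Anthony describes, dropping the Whisper header line and then merging every other timestamped line, could be sketched in plain shell like this. The real repo does it in a Node script, and the sample transcript lines here are invented:

```shell
# Invented whisper-style output: a header line, then timestamped lines
cat > transcript.lrc <<'EOF'
whisper.cpp output
[00:00:01] Hello and welcome
[00:00:04] to the show
[00:00:07] today we are talking about
[00:00:10] transcription
EOF

# Drop the header, then merge each pair of lines: keep the first line's
# timestamp, strip the second line's, halving the number of timestamps
tail -n +2 transcript.lrc | awk '
  NR % 2 == 1 { held = $0; next }
  { sub(/^\[[0-9:]+\] /, ""); print held, $0 }
  END { if (NR % 2 == 1) print held }   # odd line count: keep the last line
' > merged.lrc
cat merged.lrc
```

The merged file carries the same words with half the timestamp tokens, which is exactly the context-window saving Anthony mentions.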
00:28:02 - Ben Holmes
Should have written this in Bash too. Skill issue, man.
00:28:07 - Anthony Campolo
What's funny is most of this was actually using ChatGPT to build it out. So yeah, I was not very good at shell-scripting stuff previously, and I've learned how to do a lot of this through building this project. Actually, this is the first time I built a massive-ass Bash script like this that does a whole bunch of stuff at once. It's kind of fragile. There are a lot of things that could be extracted out to make this simpler. And if you comment out the rm line, then you'll see every file that's created along the way, as it gives you a WAV file, then the transcript file, then the modified one. So yeah, cool. And then this part is what loops over... you give it a playlist instead of just... because in the tutorial, I have a blog post where it first shows you how to do it with just an individual single video. And then it adds on a loop where you can give it a playlist URL, and it basically prints a file with every URL for each video and then runs each of those through the whole autogen function.
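The playlist handling boils down to a loop like this sketch, where process_url is a stub standing in for the real download-and-transcribe steps and the URLs are made up:

```shell
# Stub standing in for the real per-video work (download, transcribe, transform)
process_url() {
  echo "processing: $1"
}

# The real script writes this file from the playlist first, roughly:
#   yt-dlp --flat-playlist --print "%(url)s" "$playlist_url" > urls.md
printf '%s\n' \
  'https://www.youtube.com/watch?v=aaa111' \
  'https://www.youtube.com/watch?v=bbb222' > urls.md

# Run every listed video through the pipeline
while IFS= read -r url; do
  process_url "$url"
done < urls.md > processed.log
cat processed.log
```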
00:29:14 - Ben Holmes
Gotcha. We have this block up here. This is process URL. That makes sense. You just loop over every video in a playlist.
00:29:26 - Anthony Campolo
You created the urls.md file in your content. If you click that, you'll see there's just two URLs written to it.
00:29:34 - Ben Holmes
Nice. That's your output if you want to embed YouTube videos as well. We do have this
00:29:45 - Anthony Campolo
showLink is in there.
00:29:47 - Ben Holmes
I guess URLs isn't super necessary.
00:29:51 - Anthony Campolo
That could also be. You could add that to the rm command and delete that at the end as well. Right now it just stays there.
00:29:58 - Ben Holmes
I see. Gotcha. So if we go back over here, if I wanted to run this as part of my build, I could run autogen. I guess I could put my content directory into .gitignore, so it just kind of generates this directory without it getting too noisy. Actually, no, I can't do that at all. It definitely needs to be...
00:30:24 - Anthony Campolo
That's the way it's set up in my repo. There's just a .gitkeep file in there, so the content directory stays there, but it actually doesn't commit any of that. So yeah, there's a couple steps where you'll have content that you need to get into your Astro project somehow. Those could be combined in some way at some point. But yeah.
00:30:52 - Ben Holmes
Yeah, because there are definitely use cases for this stuff. I don't know if I would use Astro DB as the source for this as well, since I don't really need a markdown file with the whole transcript. To me, this is just side stuff. I don't really need it published anywhere. Some people do. But I would kind of just need all the metadata that gets scraped out, an episode summary, and some other stuff. It doesn't really make a difference, though, because I can show you what I would end up with if I went the Astro DB route. I would probably have db seed in here, and this is just export default async, boy. And this would call autogen, ideally, and it would write some database entries. I feel like that could be possible, but that would be like being able to call your script from JavaScript, and it's intended to be more of a build-time command. Am I right about that?
00:31:58 - Anthony Campolo
Yeah. Because it depends on you having the transcription model on your own computer. There are ways around that. If you want to use OpenAI's transcription endpoint instead, they used to have a setup where it would only take 15 minutes of audio at a time, though. So you had to chunk your thing up, split it... I was like, no, it's gonna be so much more work. So I'm not sure if that's still the case. And you can actually take Whisper CPP and deploy it as a server also, so you could host your own transcription endpoint, but then you're obviously managing a server, and that's not necessarily simpler. So right now it's set up where you just run it on your own computer, and it kind of just handles its business. And there are a couple different ways you can go out from that if you want to make it more user-friendly for people who don't necessarily know how to use CLIs and stuff like that.
00:32:56 - Ben Holmes
Yeah, and to clarify, I wasn't saying I want a runtime version of this. I guess I meant just a JavaScript-callable version. I could easily call exec and do the same thing because seed is run through the Astro CLI, so you also have access to the full Astro DB suite. Instead of this writing to files, it would return all of the information as JavaScript objects, and I could use those to write to a database or write my own files or something like that. It doesn't have to be a markdown output for me to use, right?
00:33:32 - Anthony Campolo
Yeah. And you could use the YouTube downloader to just get a JSON file that gives you all the metadata at once, and you can just dump that whole thing in your database. They just have it all, and it's there. And then you can parse that data out however you want.
00:33:53 - Ben Holmes
Yeah, I mean, we're just dealing with data right now. We can massage this however we like because we're literally staring at the source code. I'm not reading your docs and seeing a feature is missing. Like all the power is here at the moment to do something weird.
00:34:10 - Anthony Campolo
Yeah, totally.
00:34:12 - Ben Holmes
So is anyone else using this yet, or have you found a killer app? What is the killer app for this thing?
00:34:19 - Anthony Campolo
Oh, so I am using it, actually, with Dash because they do a weekly stream called Incubator Weekly. For that, it's usually like an hour-long show, and if you look at previous episodes, they just have a one-sentence description. And I actually found a specific Reddit message where someone was like, could you give a summary of this so I know whether it's worth watching or not? Like, what's it about? And I'm like, this is the perfect thing. You just want a general boilerplate description of what is actually in the content. Yeah. So click the most recent one. Yep. And then open up the description toggle. You'll see there's the summary, then the chapter headings. And so that is exactly what we were seeing created for your stuff. I don't use the full transcription. I'm just using the timestamps and that summary, because YouTube is very smart. It actually uses those to make the chapters instead of auto-creating the chapters. The auto transcripts are probably pretty good, but the chapters, I'm never really sure how good those are.
00:35:30 - Anthony Campolo
I haven't really tried too much.
00:35:33 - Ben Holmes
I didn't know they would generate chapters for you. And you're very complimentary of this. I think this is the scrappiest solution ever. I thought there would be a different way, but I was like, oh, you just dump it in the description. Okay, I guess I'll do that.
00:35:48 - Anthony Campolo
Yeah, if you're just doing a video a week, the couple manual steps are not a big issue. If you're going to be doing lots and lots, if you're a channel that puts out like eight videos a day that are shorter videos, then it might make more sense to build something in. I'm not sure if yt-dlp can also write if you give it permissions or not. I've just been using it to read data, but I have played with the YouTube API. What's up, Nicky? And yeah, it's just a little bit too complicated for me where it's not really worth it, because I hate having to deal with registering a whole API setup with a key and then a Google account. It's just a whole thing.
00:36:37 - Ben Holmes
I know. Yeah. So I've been talking about places I would use it in theory from an engineering perspective. I'm glad this exists, but in practice, the stuff that I would want for the masses... The main thing for me is just transcripts that understand developers, because YouTube is a very general model that gives you transcription, but it doesn't know all of the common developer terms. So you have to go through and fix "SQLite," which it thinks is like sequel the movie and then light, like light Coke or Diet Coke. It's just a problem. If this was tweakable to say, hey, I'm a developer, our buzz terms are these words, it would be great if it said SQLite every time and Tailwind CSS is one thing, that kind of thing. If that was part of this, then oh man, a transcription that understands me instead of just a general one that I hope works... that would be game-changing. I would actually put transcripts on my videos if that existed.
00:37:37 - Anthony Campolo
Yeah, I know that there are ways with Whisper where you can kind of pre-feed it a prompt. So you can possibly do that with some of your content that will include all those kinds of terms and whatnot. I found that usually it's pretty good just out of the box. The thing that it actually struggles with more than anything is people's names. There are times when, instead of Ben Holmes, it'll be like Brent something, like Brent Ohms or something like that. It doesn't quite know that. So yeah, that's some of the stuff where there are kind of ways to low-level manipulate Whisper CPP for stuff like that. The library can do a ton of stuff, and you can also modify its parameters to be better for predicting certain terms and whatnot. But I haven't really messed around with that too much.
00:38:40 - Ben Holmes
That's probably something that I need to play with on my own to at least know what Whisper can do. I like that it's this local model that I can pull in because I'm always skeptical of things that are just API endpoints and monthly subscriptions. If it's small enough that I can clone it, which took a second, that's kind of a win. Is that kind of your dream scenario? If you wanted content creators to use this tool, they're going to have a local model.
00:39:08 - Anthony Campolo
Yeah, I mean, that's the reason why I've set it up this way: just for myself, because I can use it as much as I want. There's no cost incurred at any point unless you're paying for whatever you're using to actually generate the show notes with the prompt. And we used Claude's free version, and there are other levels we could go to, or we could also include steps to run a Mistral open-source model, and then you can run that locally like you would Whisper. So that's something I'll probably explore soon, because the open-source models are getting pretty good and comparable to what you get from the free stuff. The problem is the paid stuff, like paid Claude and paid ChatGPT, is still way better than anything else you're going to get. The best way to do this setup is to actually just have a subscription and then feed those in directly. Otherwise you need to use their API, and then you pay by the token.
00:40:13 - Ben Holmes
Makes sense. Yeah, I'll actually subscribe to one of these. It's clear that I don't follow up on AI stuff. Do you have opinions on Claude versus GPT, like blanket strengths between the two?
00:40:32 - Anthony Campolo
Yeah, I mean, right now ChatGPT is more multimodal. It can generate images, and Claude can now understand images. You can feed it images, I think. Claude can't search the internet, though, whereas ChatGPT can do web searches, so if you want to ask questions about up-to-date things, that's good. Claude can take larger inputs, so you can feed it really, really long transcripts, like three-hour-long transcripts, whereas ChatGPT will kind of crap out after like an hour-and-a-half-long transcript, in my experience. So there are pros and cons to both. I think, in general, if you just want to pay for one, I would recommend ChatGPT. Even though Claude has some advantages, they're going to be close enough, and ChatGPT includes a whole bunch of other stuff, and more people are going to be familiar with it. But if you're someone who uses these tools a lot and is really interested in staying up with the cutting-edge stuff, I subscribe to both of them, and they're both 20 bucks, and they save me so much time.
00:41:39 - Anthony Campolo
It's absolutely worth it. Yeah, really, when you consider how much we pay for things like domains, you pay 10x that for a domain for a project you're not even using. So it really seems quite silly not to have at least one of them.
00:41:54 - Ben Holmes
Yep. They're all loss leaders. There's no way they're not.
00:41:58 - Anthony Campolo
Yeah. Nick is asking about the bigger token limit for Claude. So there's something called the context window. A year ago, when these things were first coming out, it was like a couple thousand words. You could give it maybe a long blog post, and that's about all it could handle. Now it's up to hundreds of thousands of words, and it's getting so ridiculous that Gemini also, I think, can go up to like a million or so. You could feed Gemini an entire book, and it can kind of take it in. At a certain point, though, it kind of loses resolution as you give it larger and larger things. So I find that if you do have a three-hour-long video, if you feed it an hour at a time and ask it to create chapters, it'll create better chapters. If you can't give it all at once, or if you give it all at once and then ask it to go over it again, like redo it... so at first it'll give you like five chapters, and each will be like a half hour long.
00:43:02 - Anthony Campolo
You'd be like, redo this, make twice as many chapters, and then it will create 10 chapters that are like 15 minutes long. So you can kind of work with it. And yeah, with all these things, you just kind of have to experiment with them and see where the sweet spots are.
00:43:16 - Ben Holmes
Right. You have to find the sweet spot of how many times you have to redo something. It's just like doing a second coat of paint over whatever text you're trying to get out. It's weird that just asking "are you sure?" is enough to get better chapters.
00:43:33 - Anthony Campolo
Yeah. And one little trick I do is, I haven't automated this part, but there's a way you can grab the actual length with the YouTube downloader. And so if you paste that where it says note the last timestamp, and say it'll be like an hour and 17 minutes, whatever random amount it is, say the chapters have to go to that, then it will know to create chapters all the way to the end. Because one thing it does is, if you give it a really long episode, sometimes if you give it just two hours, it'll create chapters up to like an hour and then it'll just be like, that's the end. You're like, that's not the end. There's another hour of video.
00:44:11 - Ben Holmes
Literally me, though. That's how I do every edit. Just like, ah, that doesn't need a background. Yeah, that's fine. Yeah. Well, this is all really cool. At least for short-term engineering, my goal is to have something more searchable here than just doing stuff based on hashtags. So that's a reason to try it. Also deep-linking, because this search is really simple. It is just going over the entire dump of all my front matter ever and doing simple string matching. But if this was in a database, searchable with SQL using the LIKE operator, and had deep, granular transcriptions so you knew which part belonged to which, then for...
00:44:59 - Anthony Campolo
These minute-long ones, you wouldn't create chapters, but you would just have a transcript, a one-sentence summary, a one-paragraph summary, and then you could have a searchable summary of every single episode along with the transcripts.
00:45:14 - Ben Holmes
That's what I'm thinking. Yeah. But I guess, yeah, if I'm going for searchability that deep links to the exact second something happened, I would probably just want to dump the transcript. I don't even know if I need the summary. The summary will be good for the YouTube description. So I have something here. I hate writing these because no one looks at the YouTube short description. Most people don't know there is one. So if that was just done, that'd be sick. Yeah.
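The deep-link search Ben is imagining falls out naturally once each transcript line carries its own timestamp; a SQL LIKE query over stored lines would behave the same as this plain-shell sketch with invented lines:

```shell
# Invented transcript lines in the timestamped LRC style Whisper emits
cat > search.lrc <<'EOF'
[00:01:12] today we are deploying to Cloudflare
[00:01:55] SQLite is a single-file database
[00:02:30] Vercel is like Trader Joe's
EOF

# A case-insensitive match returns the exact second to deep-link to;
# in a database this would be roughly: WHERE line LIKE '%sqlite%'
grep -i 'sqlite' search.lrc
```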
00:45:41 - Anthony Campolo
So that's the kind of stuff it's good for.
00:45:45 - Ben Holmes
Yeah. And also, well, I know the answer. I can run this against my local videos before I upload, right?
00:45:53 - Anthony Campolo
So.
00:45:54 - Ben Holmes
Or I could do.
00:45:54 - Anthony Campolo
Yeah, so you could run it against basically... the way the YouTube downloader is set up, that command right there where it extracts audio, audio format, that one, you can feed it all sorts of stuff. You don't necessarily feed it a URL. You can feed it a link to a file. It doesn't need to necessarily be a video. You can feed it just an MP3, it'll turn MP3 into WAV, it'll turn MP4 into WAV, it'll do whatever it needs to do. So really, that CLI tool enables so much of this entire workflow. It's really pretty insane.
00:46:30 - Ben Holmes
Gotcha. So just out of curiosity, I'm going to drag a video in here and I'm just gonna see what it does. I'm trying to figure out how to hide what I'm doing because I hate opening the Finder, and then everyone sees all my crappy downloads. There we go.
00:46:47 - Anthony Campolo
We'll move it over here.
00:46:48 - Ben Holmes
I have a whole bunch of videos. Which one do I want to do?
00:46:55 - Anthony Campolo
Luke asked where they can find the repo, and I've plopped a link in there for you. Yeah. The instructions to install yt-dlp and ffmpeg are just brew commands; they're both command-line utilities. And then you'll have to download the Whisper repo, download the model, and then run the make command. And then you should be able to do everything with it. Right now it just takes playlist URLs. If you follow the tutorial, it will get you to a point where you can do just videos. I need to create two separate Bash files. If you want to just do a single video, you can do this, but the way I built it...
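A one-time setup roughly matching Anthony's instructions might look like the checklist below, printed here rather than executed; the model size (base.en) is an assumption, and the repo URL is the commonly used one:

```shell
# One-time setup checklist, printed rather than executed; "base.en" is an
# assumed model size (the download script in whisper.cpp offers several)
setup='brew install yt-dlp ffmpeg
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
bash ./models/download-ggml-model.sh base.en
make'
printf '%s\n' "$setup"
```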
00:47:42 - Ben Holmes
Yeah, exactly.
00:47:43 - Anthony Campolo
It shows you how to build on it. Then once you get to the end, it's like, here's how to do it with the playlist. It's like you have lost the ability to just do it with the URL. Hey, it's Fuzzy Bear.
00:47:52 - Ben Holmes
Hey, Fuzzy Bear. I am obviously really confident and I know everything about Bash, so this is going to work. No problem. We're going to say process URL right here. We're calling the single gen; that's the name. We're going to ignore all of this. I don't care. I am just trying to get the bare minimum info out of this thing. So we'll see what happens. Instead of doing this, because I already have a file, and this is from a very, very exciting upcoming video about Cloudflare.
00:48:33 - Anthony Campolo
The file is not going to be set up for Whisper. It needs to be WAV, and it needs a specific bitrate or sample rate, something really particular.
00:48:43 - Ben Holmes
I was going to say I can change the format.
00:48:47 - Anthony Campolo
I was saying you should bring back the last command that you just deleted.
00:48:53 - Ben Holmes
But it's not uploaded. This is unreleased content.
00:48:56 - Anthony Campolo
You can get rid of the echo stuff and all that. That's fine. But you want to leave the yt-dlp extract-audio command.
00:49:05 - Ben Holmes
Ah, I just wanted to work with local files.
00:49:07 - Anthony Campolo
No, this does work with local files. I'm saying you run the local file through that command, because that is the command that will turn it into a WAV file that Whisper can work with. Because Whisper can't work with an MP3, just because of the way it's set up.
00:49:22 - Ben Holmes
I totally get you. I didn't think I could just feed a local file into this, but. Great.
00:49:28 - Anthony Campolo
This might not be the exact correct syntax. So we're going to find out what's going to happen. We'll probably check the.
00:49:33 - Ben Holmes
I could probably just run this myself, couldn't I? Let's find out.
00:49:38 - Anthony Campolo
I mean, actually... Oh no, I know how I do this. You want to use ffmpeg by itself. You just run ffmpeg on it, and then that will set it up for... hold on. I think I actually used to have this in the tutorial.
00:49:57 - Ben Holmes
Let's see.
00:50:00 - Anthony Campolo
Yeah, I think I ended up taking it out.
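The command Anthony is digging through his git history for is most likely the standard whisper.cpp preprocessing step, which expects 16 kHz, mono, 16-bit PCM WAV; this sketch just builds and prints the command instead of invoking ffmpeg, and the input filename is hypothetical:

```shell
# whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV; ffmpeg handles the
# conversion. Built and printed here instead of running ffmpeg.
input="my-video.mp4"        # hypothetical local file
output="${input%.*}.wav"    # swap the extension for .wav
cmd="ffmpeg -i $input -ar 16000 -ac 1 -c:a pcm_s16le $output"
echo "$cmd"
```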
00:50:03 - Ben Holmes
Did it actually... Oh my God. Okay, so it knows about WAVs.
00:50:07 - Anthony Campolo
Okay, there you go. So that file is what you'll feed into Whisper now?
00:50:12 - Ben Holmes
Yep. Oh, this is... this is unedited audio. Let me find stuff that's...
00:50:17 - Anthony Campolo
Actually,
00:50:20 - Ben Holmes
I'm like... Am I cursing myself out for saying a line wrong? Maybe.
00:50:24 - Anthony Campolo
Yeah, it will generate the words of whatever it is you filmed.
00:50:29 - Ben Holmes
Oh, yes, it will.
00:50:30 - Anthony Campolo
There's no moderation like ChatGPT. It'll transcribe anything.
00:50:36 - Ben Holmes
Okay, well, I have the full video, but that might take too much time. How long do you think, like, a 13-minute video would take?
00:50:47 - Anthony Campolo
A couple minutes at least.
00:50:49 - Ben Holmes
Okay, that seems okay, actually. So if I just had an MP4, which I do here... this is the perfect way to tease a new video. You can't watch it. You can just see the transcript. But if I want it... so we don't need to change the format. Oh yeah, we do. I would think I would feed this to yt-dlp, though, because I have an MP4 and I'm just gonna do that. Well, not a WAV. I'm gonna do something else. We'll do that. Oh God. Did you see that?
00:51:31 - Anthony Campolo
You can't.
00:51:33 - Ben Holmes
That was strange. I wonder... I don't know what to say about that. Args. I'm typing this out to at least understand it myself. So this will be...
00:51:49 - Anthony Campolo
Yeah, the output template is how you give it a unique name based on the ID of the video. Yeah, otherwise you could just hard code the name in like that.
00:52:01 - Ben Holmes
Totally. So this is located in scripts/fcp. Nice. Let's see what it thinks. Nope.
00:52:11 - Anthony Campolo
Yeah.
00:52:11 - Ben Holmes
Set default search. Oh, okay. Okay.
00:52:15 - Anthony Campolo
Quotes around... what do you... yeah, put quotes around the local file.
00:52:21 - Ben Holmes
Yeah, we'll start there.
00:52:26 - Anthony Campolo
Nah. Yes, right now it's trying to take... we want to just run ffmpeg on it.
00:52:37 - Ben Holmes
One last thing. Because it's using URL formatting, my web-dev brain tells me I could put file: and it would work, but I have no idea. Nah, I was hopeful. Oh, interesting. File URLs are disabled for security reasons. Oh, we're getting somewhere, though.
00:53:02 - Anthony Campolo
Hold on. I can find the exact command we need to do with ffmpeg. I just have to go through my git history real quick.
00:53:13 - Ben Holmes
Yep. I'm, of course, just coming up with things that we are not prepared to do.
00:53:18 - Anthony Campolo
But I do use this, and then I got rid of the instructions.
00:53:27 - Ben Holmes
I could get the absolute path. So if I went to repositories autogen scripts, I have a feeling it has to be the... yeah, it has to be the absolute URL. So y'all are gonna see that info, but that's fine. Ah, dang it.
00:53:53 - Anthony Campolo
Okay, so here's.
00:53:54 - Ben Holmes
Oh, I think we did it. We shoved it through.
00:54:06 - Anthony Campolo
Okay. For those following along at home, here's basically the kind of command you would want to do.
00:54:14 - Ben Holmes
Well, I bullied my way into a solution with yt-dlp.
00:54:23 - Anthony Campolo
So go back to the command you ran. I didn't see it.
00:54:27 - Ben Holmes
Yeah, I ran extract audio, all of that. And then I said, default search file, enable file URL. And as long as it's an absolute path, it works.
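Pieced together from Ben's description, the working invocation probably looked something like this; the path is hypothetical (and must be absolute), and the command is printed here rather than run:

```shell
# Hedged reconstruction of Ben's workaround. --enable-file-urls is a real
# yt-dlp flag that is off by default for security. Printed, not executed.
input="/Users/ben/videos/cloudflare-draft.mp4"   # hypothetical absolute path
cmd="yt-dlp --extract-audio --audio-format wav --default-search file --enable-file-urls $input"
echo "$cmd"
```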
00:54:36 - Anthony Campolo
Okay, that's funny.
00:54:37 - Ben Holmes
Totally fine.
00:54:38 - Anthony Campolo
Yeah, it probably says not to.
00:54:39 - Ben Holmes
You should just use ffmpeg. Yeah, that was fun.
00:54:43 - Anthony Campolo
Exactly. Yeah. Okay, so I found that there are a lot of things you can hack around with in this tool. It has a whole section of, like, commands where it's like, you really shouldn't do this, you should probably do this instead. All these different ways to do stuff.
00:55:01 - Ben Holmes
Yep. At this point, why am I even tweaking the Bash script? Let's just run things. If we have a WAV, we can run it against Whisper, and that's it. What is the -of flag versus the -f flag?
00:55:17 - Anthony Campolo
That's what it outputs. It takes in a file and then it outputs a file; it outputs an LRC file. So it takes the ID and then turns that into ID.lrc, which is the transcript file that it outputs. And you could do a JSON file, a VTT file, or just a text file. There's a whole bunch. You can do CSVs if you want to turn it into a spreadsheet.
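A whisper.cpp invocation using those flags might look like this sketch; the model path and binary name are assumptions (newer builds name the binary whisper-cli), and the command is printed here rather than run:

```shell
# Sketch of a whisper.cpp call; model path and binary name are assumptions.
# Printed rather than run.
wav="content/abc123.wav"
base="${wav%.wav}"   # -of takes a basename; whisper appends the extension
cmd="./main -m models/ggml-base.en.bin -f $wav -of $base -olrc"
echo "$cmd"
# Other output flags exist too: -otxt, -ovtt, -osrt, -ojson, -ocsv
```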
00:55:53 - Ben Holmes
A spreadsheet. That's awesome. Output LRC. Let's see it.
00:56:00 - Anthony Campolo
Yeah, it looks like it's doing it.
00:56:03 - Ben Holmes
Yep. So we'll let it do its thing. We're on an M1 processor trying to stream video to the internet. I'm sure it'll be fast, but we can watch it. Oh, look at that. Sizzling and slurping sounds. That's actually SpongeBob's boots stepping up to the stage. Yeah.
00:56:18 - Anthony Campolo
Why is nobody talking about Cloudflare?
00:56:22 - Ben Holmes
That's what it is. "The easy button to deploy any Trello clone with zero users on the internet." It's nailing this. So far, it's capitalizing Vercel as a proper noun. GitHub, proper noun. Absolutely killer. Why? Okay, it's supposed to be USPS, the mailing company, but it pretty much nailed that, even though I cut off at the end. It's probably because of the cutoff. "I just want the cloud with no headaches." I think of Vercel like Trader Joe's. You're repackaging other goods with a fancy brand.
00:56:54 - Anthony Campolo
Yeah. I thought you tweeted that comparing to Trader Joe's.
00:56:58 - Ben Holmes
As I was working, I was like, I'm gonna tweet things to see how hard they resonate by just not saying anything else. I think it resonated okay, but not great.
00:57:08 - Anthony Campolo
A very specific type of person knows
00:57:11 - Ben Holmes
even what Trader Joe's is. That's sadly true. We'll see how it gets recognized in the video. Someone said, yeah, this is gonna be a problem for the Indian viewers. And yes, yes, it is. But we're gonna let it ride, because for the right people it hits. I like that kind of humor where it's selective, but it hits hard for the people who know it's true. Okay, so it's working, though. That's so awesome. And I like the live feed to know exactly what's going on.
00:57:43 - Anthony Campolo
Yeah. So if you don't give it the output LRC flag, it will just print it to the terminal and not do, like, anything with it. And there are like seven different output flags you can use for the different file types.
00:57:56 - Ben Holmes
Gotcha. This is great, though. If anything, I'm going to take this Whisper CPP and just run it on things, because I can see this just getting uploaded to YouTube as is.
00:58:08 - Anthony Campolo
Honestly, I'll do this now with certain channels that I follow, or new channels, or developer channels. I'm like, I don't really want to watch this 10-minute video. Give me a really quick summary of what it actually is about. So sometimes I'll just run random stuff I would have watched for fun through this. It's interesting to see what it comes up with.
00:58:30 - Ben Holmes
Yep. Oh, it got SQLite as a word. YouTube would never get that. Absolutely never. That is awesome. I thought I'd have to train that. Look at that go.
00:58:44 - Anthony Campolo
Yeah, I need to try the version 3 model again also, because if that one is working well now, then it might be even more accurate. Who knows.
00:58:54 - Ben Holmes
Yeah, I should have asked, where did you pull the model for Whisper CPP? Is that something clonable? Well, I guess we did just clone it ourselves.
00:59:03 - Anthony Campolo
Didn't we, though? I'm giving instructions to Luke. You're just cloning the repo, so you can just go... look, it's just a GitHub repo.
00:59:11 - Ben Holmes
Yeah, I can just.
00:59:12 - Anthony Campolo
And then the bash command that downloads the model uses, I think, Hugging Face, one of those places that host giant models you can download.
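For reference, the whisper.cpp repo ships a helper script that pulls pre-converted GGML models from Hugging Face; a sketch of the setup they walked through earlier (the model name `base.en` is one choice among several):

```shell
# Clone the repo, fetch a model, and build the transcriber.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Downloads ggml-base.en.bin into ./models from Hugging Face
bash ./models/download-ggml-model.sh base.en
make
```

Everything after the initial download then runs fully offline, which is the point Ben reacts to next.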
00:59:24 - Ben Holmes
That is so sick. Yeah. Offline. Why not?
00:59:28 - Anthony Campolo
Yeah. So if you want to take what I've built and modify it, you'll spend a lot of time in the Whisper CPP and yt-dlp README. They both just have a giant README with all this information, and yeah, there's a lot of stuff they can do.
00:59:45 - Ben Holmes
Yeah. The most complex thing I've done with yt-dlp is download sections, because sometimes I need a specific sound effect and I'm watching a three-hour video, or maybe it's a song loop, and I just find this one sliver. And you can just say download sections, timestamp, timestamp, and it downloads exactly that part of the video. It's so good, because I thought I'd have to download the whole three hours, but you don't. It works totally fine. You could probably do that for live streams as well if you just need one part of the video.
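The trick Ben describes is yt-dlp's `--download-sections` flag, which requires ffmpeg to cut the stream. A sketch, with placeholder URL and timestamps:

```shell
# Download only the 01:02:10-01:02:25 slice of a long video and extract
# the audio as MP3 (requires ffmpeg). The leading "*" tells yt-dlp to
# match by timestamp range rather than by chapter name.
yt-dlp --download-sections "*01:02:10-01:02:25" \
  -x --audio-format mp3 \
  "https://www.youtube.com/watch?v=VIDEO_ID"
```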
01:00:16 - Anthony Campolo
Or if you want to cut out, like, the first five minutes.
01:00:20 - Ben Holmes
Right.
01:00:20 - Anthony Campolo
Like intro music where there's no words to transcribe.
01:00:24 - Ben Holmes
Oh, that too. Yeah. You could condense it to the part that needs transcription. Yeah, you should probably just cut out your intro music from your video, though. You should probably do that. I don't, but you should probably do that. All right, well, I'm gonna take this once it's done and put it into YouTube.
01:00:42 - Anthony Campolo
Yeah.
01:00:43 - Ben Holmes
Does this go into YouTube? It's not done yet. It's supposed to be done at the
01:00:49 - Anthony Campolo
14-minute mark, but then after that goes through, I would recommend running the transform script on it also, because what that does is remove the milliseconds, I think. The LRC output gives you minutes and seconds, but it also gives you milliseconds, which I'm not sure will create correct timestamps on YouTube. I don't think it would. It might, though. So part of the transformation makes it a little friendlier to be auto-linked on YouTube if you just copy-pasted it.
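A minimal sketch of the transform step Anthony describes: strip the millisecond fraction from each LRC timestamp so the lines look like YouTube chapter timestamps. His actual script may do more (for example, hour handling for long videos); this only shows the core idea.

```shell
# "[00:15.30] intro music" -> "00:15 intro music"
lrc='[00:15.30] intro music
[01:02.45] whisper demo'
chapters=$(printf '%s\n' "$lrc" |
  sed -E 's/^\[([0-9]{1,3}:[0-9]{2})\.[0-9]+\][[:space:]]*/\1 /')
printf '%s\n' "$chapters"
```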
01:01:29 - Ben Holmes
Got it. And this is the magic flag that's giving us this format, output LRC.
01:01:35 - Anthony Campolo
Yeah, yeah, yeah.
01:01:36 - Ben Holmes
Okay. So there's probably a bunch of formats that you can get out of something if you wanted it.
01:01:42 - Anthony Campolo
Yeah, the Whisper docs give you all the different outputs. You're scrolling right past it. So right there you have output text, VTT, SRT, LRC, JSON, CSV. That's all of them.
01:01:58 - Ben Holmes
Got it. Karaoke video. Oh man.
01:02:03 - Anthony Campolo
Yeah, so that's where it lights up the words as the time is going by, so you can follow along with it.
01:02:11 - Ben Holmes
So yeah, I love that because some
01:02:15 - Anthony Campolo
of these are actually caption formats also. So if you want to sync this with a video, if you want to create actual captions, that's a thing it does as well. So I'm not really doing that right now. But that's another thing: if you actually want to use this to create accessible subtitled videos and you don't want to just use YouTube's auto-generated things, or if you have a video that's not on YouTube and you want to do this, it can be used for that.
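Since `--output-srt` produces a standard subtitle file, one way to use it off-YouTube, as Anthony suggests, is to mux it into the video with ffmpeg. Filenames here are placeholders:

```shell
# Attach the Whisper-generated SRT as a soft subtitle track in an MP4
# container (mov_text is the MP4-native subtitle codec); video and audio
# streams are copied without re-encoding.
ffmpeg -i episode.mp4 -i episode.srt \
  -c:v copy -c:a copy -c:s mov_text \
  episode-captioned.mp4
```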
01:02:37 - Ben Holmes
Right, sweet. All right, well, I will go off on my own adventure learning about that. But as we're trying to sort of land the plane here, is there anything that I should share out to the chat for people to go look at, just to see the stuff you're working on or to try this project themselves?
01:02:59 - Anthony Campolo
Yeah, I mean the GitHub repo and the blog post, which I've shared both of them already, but I'll drop again.
01:03:06 - Ben Holmes
Yeah, just drop it in the chat and I can highlight it. There was a question, not a real question: did it say "SQL light"? No, it said SQLite, the real thing. My dream. Yeah, but it auto-generates show notes, and then Autogen, which is the raw thing. So Show Notes... I don't even think we looked at that repo. Is that... this is the blog post.
01:03:29 - Anthony Campolo
So this takes you step by step. It's like, here's the yt-dlp command to get... okay, so it just explains that command. It's like, here's the Whisper command to transcribe. It explains that, explains each flag, and walks through this, and it builds up the script piece by piece. And then it explains it in a way that's hopefully comprehensible.
01:03:53 - Ben Holmes
Yeah. Because you were my docs today and I was thinking, where is the Anthony step-by-step guide?
01:03:59 - Anthony Campolo
Here it is.
01:03:59 - Ben Holmes
Is it [unclear]? Yeah, I thought that was GitHub.com, but that makes more sense. We got the full thing. All right, well, you've piqued my interest. I'm actually going to do something with AI, especially since it's local. I don't have to pay anything, so mission accomplished.
01:04:14 - Anthony Campolo
That's pretty much what I was going for. So yeah, we'll see what you do with it. I'll be curious.
01:04:21 - Ben Holmes
Yeah, yeah. Y'all will see more of this, and also tune in for the video, which will now have a really good transcript. So: accessible, shareable, searchable. Thank you for that. Yeah.
01:04:35 - Anthony Campolo
Oh, man. Well, we're way over time, so I think you gotta get outta here.
01:04:40 - Ben Holmes
Yep. Well, we started late, so I thought we should run the full hour, but this was freaking awesome. Always welcome on the stream, obviously, and people can find you, ajcwebdev, everywhere: Twitter, GitHub, etc. And I'm Whiteboard Guy, if you just search that. I'm sure the SEO is terrible, but you can try. Okay, let me... I don't know how to raid people on... I don't think you can do it on StreamYard. So I'll just do it manually somewhere. Yeah, yeah, we can raid ThePrimeagen. That's easy enough. I bet if I go to Beyond Dev Chat, I can do that. Slash raid. Nice. Oh no, my keyboard's locked again. I think there's something in my keyboard. There's some ghost. ThePrimeagen. There we go. Well, yeah, thanks again for coming on, dude. I'll see you around the interwebs. Yeah, we'll keep it up.
01:05:50 - Anthony Campolo
Bye, everybody.
01:05:51 - Ben Holmes
All right, bye, everyone. I'm actually afraid to end stream until I know the raid went through. This is so complicated. Raid now. There we go. Nice. End stream.