AutoShow: Automated Content Repurposing Tool

Episode Description

Anthony Campolo demos AutoShow, an open-source AI tool that generates transcripts, summaries, and chapter notes from YouTube videos and podcasts.

Episode Summary

Anthony Campolo joins Nick Taylor to walk through AutoShow, an open-source CLI tool he's been building for roughly nine months that automates the creation of show notes, summaries, and timestamped chapters for video and audio content. The conversation traces how the project grew out of Anthony's frustration with the manual work of producing podcast show notes — listening back to episodes, identifying chapter breaks, writing descriptions — and how he realized LLMs could handle most of that if given a timestamped transcript. The tool's pipeline downloads a YouTube video via yt-dlp, strips the audio to WAV, transcribes it locally with Whisper CPP, and optionally feeds the result to an LLM (Ollama for fully local processing, or services like Claude and ChatGPT via API) along with a detailed prompt that produces structured markdown output. They demo the full flow live, running it first with Ollama's small Llama 3.2 model and then pasting the prompt into Claude's chat interface for a higher-quality result. Anthony explains his plans to productize AutoShow with a credits-based pricing model inspired by Photo AI, targeting non-technical users through a hosted web app while keeping the CLI open source. He also outlines future personas beyond content creators — teachers generating comprehension questions and researchers parsing large archives — and discusses technical details like Ollama's relationship to Llama CPP, the GGUF model format, prompt engineering strategies for consistent output, and potential database and deployment considerations.

Chapters

00:00:31 - Catching Up and Conference Talk

Nick and Anthony reconnect after some time apart, discussing recent conferences they've attended. Nick shares his experience speaking at All Things Open in Raleigh, describing it as a massive event with around 5,000 attendees where he gave an updated talk on Deno. He also mentions a new AI-focused conference that All Things Open is launching.

Anthony notes he's always wanted to attend All Things Open and has family in the area. The conversation touches on Nick's other recent conference appearances and his plan to submit a talk on building GitHub Copilot extensions to the new AI track, which naturally leads into the main topic of the stream.

00:04:13 - Anthony's Background and AutoShow Origins

Anthony gives a quick overview of his career journey — from music teacher to bootcamp graduate to open-source contributor on RedwoodJS, podcast host, and developer advocate. He explains how AutoShow was born from his desire to automate the tedious process of creating podcast show notes, which could take hours of manual work to produce well-written chapter descriptions with accurate timestamps.

The key insight was that LLMs could read a timestamped transcript and intelligently chunk topics into chapters. His early workflow involved transcribing with Whisper, then manually pasting the transcript into ChatGPT with a prompt. He then built a pipeline to automate the entire sequence — from ingesting a YouTube link to producing a complete markdown file with front matter, transcript, and LLM-generated show notes in a single command.

00:08:24 - Expanding Beyond Content Creators

Anthony describes the technical learning curve of connecting Node CLIs to Whisper (a C-based tool) and local LLMs via Ollama, then pivots to discussing broader use cases. He frames AutoShow as giving LLMs the ability to work with audio and video content, something current chat interfaces can't do natively since they can't watch a YouTube video or listen to a podcast directly.

He outlines three target personas: content creators generating show notes, teachers creating comprehension questions and syllabi from lecture recordings, and researchers parsing large archives of video, audio, or newsreels. A viewer in the chat suggests extending the concept to books and virtual teaching chatbots, which Anthony acknowledges as near-term roadmap items once he adds embeddings support for working with large text documents.
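The embeddings support Anthony mentions typically starts by splitting long documents into overlapping chunks before anything is embedded. A minimal sketch of that first step (the chunk size and overlap values here are arbitrary placeholders, not AutoShow's actual settings):

```python
def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a long document into overlapping chunks suitable for embedding.

    Overlap keeps sentences that straddle a boundary visible in both chunks.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# 1200 characters with 500-char chunks and 50-char overlap -> 3 chunks
chunks = chunk_text("x" * 1200)
print(len(chunks))
```

Each chunk would then be sent to an embedding model and stored for retrieval; the chunking strategy itself (fixed-size vs. sentence-aware) is a design choice the project would still need to make.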

00:11:34 - Voice Notes, Writing Workflows, and AI Tools

The conversation shifts to how both Nick and Anthony use AI in their personal content workflows. Nick describes his process of taking voice notes during walks, feeding them into Claude to produce a cleaned-up first draft for blog posts, and how this approach helped him overcome the difficulty of starting to write from scratch. Anthony shares that he bought his wife a voice recorder for the same brain-dump-to-summary workflow.

They discuss the broader pattern of converting spoken content into written form, with Anthony noting that many people who struggle with writing could produce great content by talking instead. Nick mentions Blog Recorder, another product in this space, and they explore how AutoShow could eventually accept live voice recordings piped directly into its processing pipeline.

00:16:15 - Live Demo Setup and Ollama Explained

Nick begins the live demo by forking the AutoShow repo and running the setup scripts. Anthony explains the setup process, which installs npm dependencies, pulls Ollama models, and downloads the Whisper CPP model. This leads to a detailed explanation of the relationship between Llama (Meta's open-source models), Llama CPP (a high-performance C implementation), and Ollama (a user-friendly wrapper that handles model storage, pulling, and serving on port 11434).

They hit a snag when Nick hasn't installed Ollama itself, prompting a quick detour to download it from the Ollama website. Anthony explains the GGUF model format that allows models to be distributed as single downloadable files, and how Ollama supports not just Llama models but any open-source model including Qwen, Phi, and Gemma.
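Once Ollama is running, it serves a local HTTP API on port 11434. A sketch of the request body a tool like AutoShow might send to Ollama's `/api/generate` endpoint (the model name and prompt are illustrative; only the endpoint and field names come from Ollama's documented API):

```python
import json

# Ollama's default local endpoint for one-shot generation
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_ollama_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/generate endpoint.

    stream=False asks for a single complete response instead of token chunks.
    """
    return {"model": model, "prompt": prompt, "stream": False}

body = build_ollama_request("llama3.2", "Summarize this transcript: ...")
print(json.dumps(body))
```

An actual call would POST this body to `OLLAMA_URL` with any HTTP client; the response JSON carries the generated text in its `response` field.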

00:27:52 - Running AutoShow with Ollama

With Ollama installed, they run the first AutoShow command against a YouTube video using the local Llama 3.2 model. Anthony walks through the terminal output step by step: generating front matter from video metadata, downloading and converting the video to WAV audio, transcribing with Whisper CPP's v3 Turbo model (which matches the accuracy of the large model at twice the speed), and finally sending the transcript with the prompt to Ollama.

They examine the output files — the raw LRC transcript from Whisper, the cleaned TXT version with milliseconds stripped to reduce token costs, and the final markdown file with front matter and generated show notes. Anthony acknowledges the small model produces mediocre chapters but notes that larger open-source models or paid services like Claude yield much better results.
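The millisecond-stripping step is simple to picture: whisper.cpp's LRC output carries fractional-second timestamps that add tokens without helping the LLM place chapters. A sketch, assuming `[mm:ss.xx]`-style timestamps (the exact format depends on the Whisper output settings):

```python
import re

def strip_milliseconds(lrc_line: str) -> str:
    """Drop fractional seconds from an LRC timestamp, e.g. [00:31.45] -> [00:31].

    Fewer timestamp characters means fewer tokens sent to the LLM.
    """
    return re.sub(r"\[(\d{2}:\d{2})\.\d+\]", r"[\1]", lrc_line)

print(strip_milliseconds("[00:31.45] Hey everybody, welcome back."))
```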

00:39:01 - Examining Output and the Astro Integration

Nick and Anthony look at the generated markdown files, including the front matter with metadata like show link and channel name. Anthony reveals he has an Astro site package already configured with the correct content types for these files, enabling automatic SEO-friendly pages for video and podcast content. Nick gets excited about integrating this into his own streaming site's workflow.

Nick describes a potential GitHub Actions pipeline where new YouTube videos would automatically trigger AutoShow processing, generate markdown with summaries and chapters, create a pull request, and auto-merge into his blog. Anthony adds that the chapter timestamp format is specifically designed so that when pasted into YouTube descriptions, YouTube automatically converts them into clickable chapter links.
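The chapter format Anthony describes can be sketched as a small formatter that turns `(seconds, title)` pairs into the `HH:MM:SS - Title` lines used throughout these show notes. (YouTube's chapter parsing generally also requires the list to start at 00:00 and contain at least three entries, so a real pipeline would enforce that too.)

```python
def format_chapters(chapters: list[tuple[int, str]]) -> str:
    """Render (seconds, title) pairs as timestamp lines YouTube can turn into chapter links."""
    lines = []
    for seconds, title in chapters:
        hours, rem = divmod(seconds, 3600)
        minutes, secs = divmod(rem, 60)
        lines.append(f"{hours:02d}:{minutes:02d}:{secs:02d} - {title}")
    return "\n".join(lines)

print(format_chapters([(0, "Intro"), (253, "AutoShow Origins")]))
```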

00:45:21 - Trying Claude and the API Key Dance

They attempt to run AutoShow with Claude's API, which requires a separate Anthropic console account and credits — distinct from a paid Claude AI subscription. After hitting an insufficient balance error, they pivot to the manual workflow: running AutoShow without an LLM flag to get just the transcript and prompt, then pasting that directly into Claude's chat interface. Anthony explains this was actually the original workflow before API integration.

Anthony walks through the prompt file, noting it begins by telling the LLM the content isn't copyrighted (to prevent refusal), then provides detailed formatting instructions with examples. A viewer asks if 150 lines of prompt is normal, and Anthony explains that longer prompts produce more consistent, reproducible output — especially important for generating content that needs to conform to a specific schema for the Astro site.

00:59:29 - Productization, Pricing, and Business Model

The conversation turns to how Anthony plans to monetize AutoShow. He describes a credits-based pricing model inspired by Levels.io's Photo AI, where users purchase credits that are consumed at different rates depending on which LLM model they choose. This gives him control over margins since credit costs scale with actual API expenses. The CLI will remain open source while the hosted web frontend becomes the paid product.

Nick and Anthony debate whether the CLI itself could be monetized, ultimately agreeing that developers would just use their own API keys with the open-source version, and the real market is non-technical users who want a simple web interface. They discuss database options including Turso's SQLite-based multi-tenant approach, data export for canceled accounts, and the decision of whether to keep the frontend code open source given it will contain auth and pricing logic.

01:08:03 - Competitive Landscape and Wrapping Up

Anthony acknowledges many similar tools exist but believes most are obscure and unknown outside tech circles. They discuss the differentiation of AutoShow's multi-format approach — handling YouTube, audio, and eventually live voice recordings — compared to more specialized tools. Nick expresses genuine enthusiasm for the project's ambition and notes he's been more of an AI consumer than builder, making Anthony's work particularly interesting to him.

A viewer asks about regenerating individual chapters, and Anthony explains the current CLI does one-shot generation without conversation history, recommending the manual Claude chat workflow for iterative refinement. They discuss how the future web product could let users highlight and regenerate specific sections. The stream wraps with a raid from another streamer and final plugs for the AutoShow GitHub repository.

Transcript

00:00:31 - Nick Taylor

Hey everybody. Welcome back to Nicky T Live. I'm your host Nick Taylor, and today I'm hanging out with my buddy Anthony Campolo. Hey Anthony, how you doing, my man?

00:00:41 - Anthony Campolo

What's up, man? Good to be back.

00:00:43 - Nick Taylor

Yeah, I'm doing good. It's been a minute. I mean, we've been chatting a bit in Discord, but it's been a minute since we had a chance to actually chat. I'm trying to remember the last time we saw each other in person. Was it Remix Conf? No, it was New Year's at Render as well, right?

00:01:03 - Anthony Campolo

Was it Render? Yeah, that was probably the last time. I haven't gotten to any of the conferences this year. You're at All Things Open and a couple others, right?

00:01:13 - Nick Taylor

Yeah. There was a local conference in February called Konfu, and I gave a couple talks there. The way they have it set up for local people, they ask that you do two talks — I'm not sure why — but I had two talks. And then I had an online conference, I think it was NodeConf 2024 or something like that. And then I don't think I was at any other in-person conferences until All Things Open, which was last week. It was really, really awesome being there. I gave a talk on Deno, which I've given before, but there's been some changes to the framework — partially because of Preact updates and just some cool new features — so I kind of updated the talk a bit. It was a really great conference. I'd never been to All Things Open.

00:02:12 - Anthony Campolo

It's one that I've always wanted to go to. It seems pretty cool. Most people who go there really like it, and I have some family in Raleigh too.

00:02:20 - Nick Taylor

Oh, okay. I didn't know that. Yeah, definitely check it out because it is a great conference. I didn't realize how big it was — after my talk got accepted, I was talking to somebody, I can't remember who, and they were like, whoa, that's massive. I think it was around 5,000 people. Can't remember how big Render is — it might be at the same scale.

00:02:42 - Anthony Campolo

It was like 3,000 or something, in that range.

00:02:46 - Nick Taylor

Yeah, it's definitely not small. I had a great experience both as a speaker and as an attendee. I was going to mention — they have a new AI conference, brand new, sometime in late February or early-to-mid March. If you just Google "AI All Things Open," or however you check these days —

00:03:13 - Anthony Campolo

Going right now.

00:03:17 - Nick Taylor

Yeah, if you just look for "AI All Things Open" you should find it. If not, I can send it to you, but I'm sure it'll pop up pretty easily.

00:03:26 - Anthony Campolo

I think I've got it here. Yeah, that would be sweet. I'll check that out.

00:03:30 - Nick Taylor

I'm going to submit a talk there, because I've been submitting this talk in a few places — creating your first Copilot extension — so it fits well with the AI tracks.

00:03:44 - Anthony Campolo

That's funny. That's the video I pulled from your channel that we're going to use for the demo — pretty sure that's the one. You brought someone on to talk about it recently, and you had like a seven-minute highlight video about it.

00:03:55 - Nick Taylor

Okay, yeah, yeah. Cool. That's a great segue. But before we get into it — we'll be talking about AutoShow — for folks who might not know who you are, do you mind giving a quick TL;DR of Anthony: the man, the myth, the legend, the arc?

00:04:13 - Anthony Campolo

The journey from the pits of despair to my new role? Yeah, so I was originally a music major slash teacher for a while and then kind of pivoted to tech. Did a bootcamp at Lambda School and got into open source with this framework called RedwoodJS — I don't know if people know about it these days, but it's still around and still doing its thing. It was a full-stack React framework using GraphQL and Prisma and all that. That got me into the open source world. Then I did a podcast called FSJAM for a while — Nicky T, you were on JavaScript Jam, but I don't think you were ever on FSJAM, actually.

00:05:02 - Nick Taylor

I don't think I was on FSJAM. Yeah, just JavaScript Jam, I think.

00:05:05 - Anthony Campolo

So then I got involved with this company, Edgio, and did kind of a JavaScript community thing, went to a lot of conferences, and met Nicky T at one of those. Now I'm just doing my own thing, going solo, and working on an open source AI tool that will eventually be a product.

00:05:26 - Nick Taylor

The product you're talking about is AutoShow, which is what we're going to be covering today. Let me switch to pairing view real quick. So this is the project — it's on GitHub, it's open source right now, and I'll drop a link in the chat. You've been working on this for a bit. We actually did a show on this about five months ago. I'll drop a link to that too. I guess talk a bit about the project and its goals. Some context is that you've done a lot of developer advocacy, and I feel like that's probably part of why you decided

00:06:06 - Anthony Campolo

To build this tool to a certain extent. It definitely connects to the content I was making. It first started because I wanted to create show notes for FSJAM. People who've done podcasts know there are different ways to approach it — you can just record an episode and drop it in the feed, or you can put an ad or a short description, or you can really do it properly with chapters and specific timestamps, resource links, and well-written descriptions for each section. If you want to do show notes right, it can take a lot of work: just listening back to find the right chapter times, writing good descriptions, making sure there are no spelling errors. That alone can take hours. So what I started to realize is that you could use LLMs like ChatGPT or Claude to do a lot of that for you.

00:07:10 - Anthony Campolo

Basically, if you had a transcript with timestamps, you could give it to an LLM and it could read the transcript, see how the topics connect to the timestamps, and chunk everything into chapters. That was the first big aha moment. What I was originally doing was using Whisper — OpenAI's open source transcription tool, one of the last open source things they released — to transcribe the podcast, and then writing a prompt in ChatGPT and copying the transcript underneath it. That was the first version. Then I thought: what if I built a pipeline so you give it a link, it generates front matter from the video metadata, runs the transcription, inserts the prompt, automatically feeds it to an LLM, gets the response back, and packages it all together — front matter, transcript, and LLM output — in a single command.
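The final packaging step Anthony describes — front matter, transcript, and LLM output stitched into one markdown file — can be sketched as a small assembler. The field names and section headings here are illustrative, not AutoShow's actual schema:

```python
def assemble_markdown(front_matter: dict, show_notes: str, transcript: str) -> str:
    """Combine YAML-style front matter, LLM show notes, and the transcript into one file."""
    fm_lines = "\n".join(f"{key}: {value}" for key, value in front_matter.items())
    return f"---\n{fm_lines}\n---\n\n{show_notes}\n\n## Transcript\n\n{transcript}"

doc = assemble_markdown(
    {"title": "AutoShow Demo", "channel": "Nicky T Live"},  # illustrative metadata fields
    "## Chapters\n\n00:00:00 - Intro",
    "[00:00] Hey everybody...",
)
print(doc.splitlines()[0])
```

A static-site generator like Astro can then pick the file up directly, since the front matter block is what its content collections key off.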

00:08:24 - Anthony Campolo

So that took me on this whole journey of learning about Node CLIs and connecting to Whisper, which is a C thing, and then figuring out how to feed that into a local LLM with Ollama. We're getting ahead of ourselves with all these buzzwords, but that's the main thing I wanted to do. Now I'm also thinking about how to use it more broadly — not just as a tool for content creators, but as a way to leverage this pipeline to give LLMs the ability to work with audio and, to a certain extent, video. Even with ChatGPT and Claude right now, you can give them pictures, but you can't have them watch a YouTube video or listen to a podcast. There's no way to do that. So I see this as a unique way to leverage that capability, and I'm thinking of different personas.

00:09:33 - Anthony Campolo

So there's the content creator persona, which is what we've been talking about. Then I'm thinking of a teacher persona, where you could take lesson recordings or classroom videos to generate comprehension questions or even syllabi for new classes if you have the material you want to teach. And then there's a researcher persona — someone going through large archives of video, audio, or newsreels, using it to parse through that and produce a shorter, more manageable summary. If you have like 100 hours of audio to go through to find something specific, you could use this to help process large amounts of data. So now my head is thinking about how to broaden the tool beyond just creating show notes for YouTubers and podcasters, because that's a pretty narrow use case.

00:10:35 - Nick Taylor

Yeah, that's super cool. Side note — Abby Manu on YouTube is asking, "Anthony, why aren't you doing live YouTube streams anymore these days?" I'm guessing you're busy.

00:10:51 - Anthony Campolo

That's a good question. I got back into it a little bit right after I stopped doing Edgio and had a good rhythm with my buddy Monarch, but then Monarch got a full-time job after a couple of months.

00:11:02 - Nick Taylor

Okay.

00:11:03 - Anthony Campolo

I did some stuff and scheduled some people, but I haven't done one in, I think, about two or three months. Who was asking?

00:11:11 - Nick Taylor

It's Abby Manu Santani. I'm guessing it's one of your followers on YouTube.

00:11:17 - Anthony Campolo

That's cool that you're asking. Maybe I should do more of it. It's always hard because life gets busy, but there's really no reason I shouldn't — it's nice to have a rhythm. So I'll bring you on as a guest.

00:11:34 - Nick Taylor

Cool. I think it's great that you're thinking beyond just the content creator persona, and at the same time it's good that you focused on that first, because if at the beginning you'd said "I'm making a tool for content creators, teachers, researchers, and everyone else," I feel like you probably wouldn't have made as much progress as you already have.

00:12:02 - Anthony Campolo

Totally. I started with my own use case and had a very clear idea of what I wanted, and then how to extend and make it more useful. As I was talking to people, I was getting more and more ideas. This is why I like doing streams with people like you to show it. And when I was showing it to one of my friends who's not a developer at all — one of my old music major buddies who's now a teacher — he was like, "oh, you could do this with my lessons." That opened up a lot of doorways in my mind.

00:12:33 - Nick Taylor

Yeah, I think the idea of generating comprehension questions from texts is really great for a couple of reasons. One, we all know teachers in North America do not get paid well. So if this is a tool that can help accelerate generating comprehension tests, that's super cool. I almost wonder — maybe thinking too far ahead — whether you'd sell it to teachers or to educational boards directly.

00:13:06 - Anthony Campolo

I did think about this. What I want to do is have a one-year free plan for teachers.

00:13:16 - Nick Taylor

Okay.

00:13:16 - Anthony Campolo

So if you have a teaching email or something like that, let them use it for a year for free. They can get the benefits of it, see if it's something they're into, find different things to do with it over the course of a year, give me some feedback — and then if they see the value, hopefully get some sort of subscription from there on.

00:13:42 - Nick Taylor

I think that's a great idea. It's not a rug pull — you're getting a full year and it's very explicit. You're also probably making the product better in the process. And yeah, it really does sound useful. Hey Fuzzy, how's it going? Fuzzy's in the Twitch chat. Like, this is super useful as a content creator because right now, for example, I've given some talks and I have a slide deck and stuff. I didn't use AI to generate those, but I've been resubmitting some talks and they've been asking for a table of contents breaking down the talk. So for the talks I've already

00:14:38 - Anthony Campolo

Given, I can do that.

00:14:39 - Nick Taylor

Yeah, exactly. I've been doing this in Claude directly right now with just Sonnet — I literally paste in the text version of my slide deck and say, "can you generate a table of contents?" Then I tweak it where I think I need to. Not all conference talk submissions ask for it, but some do and I was like, there's no way I'm typing this out manually. I've been leveraging Claude a lot for many things like that — summarizing stuff, and one flow I use quite a bit is taking voice notes. I'll have a rough draft of my thoughts and then plug it into Claude to clean it up, and that becomes my first draft. It's not what I ship, but it gets me started. Because I'll typically be out for a walk and I'll be recording something, and it's not always coherent.

00:15:46 - Nick Taylor

Oh, thanks for the kind words.

00:15:48 - Anthony Campolo

It's funny you say that. I actually bought my wife a voice recorder so she can do this specifically — she does a brain dump into the voice recorder and then I feed it to Claude to give her a summary of it. And I use AutoShow for that too. This is exactly the same kind of workflow. What other things are you doing a lot with Claude, aside from coding?

00:16:15 - Nick Taylor

Yeah, I definitely use it for coding, which doesn't really fall into this category. But it could eventually, because you could be talking about a technical blog post that has code blocks and stuff. I know there's a product out there called Blog Recorder from Eddie Vink — I met him at All Things Open — but it's specific to just turning voice notes

00:16:46 - Anthony Campolo

Turned into blog articles.

00:16:48 - Nick Taylor

Yeah. But typically my flow — I discovered over time that I have a hard time starting to write. I bought this app called VoiceNotes AI and it works really well. I typically go for a long walk at night or in the middle of the day just to clear my head, and that's when I started recording things. I noticed I was able to do more writing by starting off with voice notes versus actually writing something down first.

00:17:50 - Nick Taylor

Everybody's different, obviously, but I just found that worked better for me.

00:17:55 - Anthony Campolo

This makes a lot of sense. With podcasts, you could just chat and flow and answer questions — and you can then take those conversations and turn them into blog posts. I think a lot of people who struggle to sit down and write but could just talk would find the same workflow really useful for creating more content.

00:18:24 - Nick Taylor

And this would definitely — I mean, we're going to look at AutoShow in a minute, but maybe there could even be a path where you just start recording your voice and on the fly it pipes directly into AutoShow. I don't know.

00:18:40 - Anthony Campolo

Totally possible, because it's already set up to take in local files. You just need some way to record something and then throw that file into the workflow.

00:18:52 - Nick Taylor

Okay, I'm just looking in the chat. Abby Manu — apologies if I'm not pronouncing your name correctly — they're saying: "Not only for YouTube, you can extend it to books. You can create talking books with your back end integrated into a chatbot. It can be a virtual teacher to answer questions about a book as well."

00:19:20 - Anthony Campolo

Yeah, questions about videos, audio, or books — yeah, totally. One of the last things I still need to build in is some sort of embeddings support for working with large amounts of text like books and documents. But that's very near on the roadmap, because I think that's where it'll be really useful for the researcher persona — combining large amounts of text from documents and PDFs with large amounts of video and audio. Think about something like the JFK assassination: there are literally millions of pages of documents,

00:20:02 - Nick Taylor

Yeah.

00:20:02 - Anthony Campolo

Thousands of books, more documentaries than you could watch in a year, and all that stuff. If someone wanted to research that subject, where do you even start? Yo, DogHouse LL, going great, yeah.

00:20:21 - Nick Taylor

Cool. Well, that's a good way to kick things off. Let's look at some code. I forked AutoShow and I'm just going to move this over here for a second. There are a few scripts you got me that I should be running. So I'm just going to start off —

00:20:43 - Anthony Campolo

Just dropping it in the Twitch chat in case anyone wants it.

00:20:47 - Nick Taylor

Oh yeah, cool.

00:20:47 - Anthony Campolo

Moderator, can you delete my message? I thought I was a mod.

00:20:51 - Nick Taylor

Yeah, you should be a mod. I've got to turn off Nightbot — I saw somebody using Nightbot one time and thought it looked good, but I need to just turn it off. Let me see if I can do something to fix that right now. You want to just drop it to me in our personal chat and I should be able to post it.

00:21:13 - Anthony Campolo

Yeah, that way you get it to the YouTube folks as well.

00:21:17 - Nick Taylor

Yeah. And I'm writing down right now because this has happened a few times: Nightbot, go away. Okay. It was a good idea, but I've allowed certain domains and it's just causing more friction than anything. Let me just go grab — you can drop it in Discord if you want.

00:21:44 - Anthony Campolo

I did already, I think.

00:21:45 - Nick Taylor

Oh yeah, okay cool. All right, let's go. Pop that in here. I'll laugh if it moderates me. Okay, now it's good.

00:22:02 - Anthony Campolo

Can you refresh the one you have up? I realized I had a typo in the very last command and I just fixed it.

00:22:08 - Nick Taylor

Oh yeah, which one?

00:22:10 - Anthony Campolo

In your gist. Yeah, refresh it once.

00:22:13 - Nick Taylor

Yeah, cool.

00:22:14 - Anthony Campolo

We may or may not get to the Python stuff, because you need Python for the Whisper diarization.

00:22:21 - Nick Taylor

I should have Python. What's the command? Is it python -v to check the version?

00:22:32 - Anthony Campolo

Yeah, or python3 -V. Or maybe it's --version? I'm not really sure — I'm not a big Python person. But you have to do python3 instead of just python, unless you've already configured it otherwise.

00:22:44 - Nick Taylor

Okay, it says I'm on Python 3.11.
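For the record, the flag the speakers are reaching for is `python3 --version` (or capital `-V`); lowercase `-v` actually turns on verbose import tracing. Inside a script, the version is easiest to read from `sys.version_info`:

```python
import sys

# From the shell: `python3 --version` or `python3 -V` prints the version.
# Lowercase `-v` is verbose import tracing, not the version number.
major, minor = sys.version_info[:2]
print(f"Running Python {major}.{minor}")
assert (major, minor) >= (3, 0), "expected a Python 3 interpreter"
```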

00:22:48 - Anthony Campolo

Let's just have this running in the background first while we do the other stuff. Open up a terminal you can go away from and run npm run setup.python.

00:23:00 - Nick Taylor

Yeah, cool.

00:23:02 - Anthony Campolo

Alright, just make sure this is the right one. Have you not run npm run setup yet — just the regular npm one?

00:23:09 - Nick Taylor

No, not yet. I just cloned it before, so.

00:23:11 - Anthony Campolo

Okay, we have to run that one first. Go ahead and run that.

00:23:14 - Nick Taylor

Yeah, okay.

00:23:16 - Anthony Campolo

It'll take a little while.

00:23:18 - Nick Taylor

That's all good. I've got a pretty fast internet connection, so it should go pretty quick.

00:23:23 - Anthony Campolo

Yeah. What this is doing is installing npm dependencies, pulling down models for Ollama — which we can talk about what Ollama is — and downloading the Whisper CPP model. Do you know about Ollama?

00:23:40 - Nick Taylor

Yeah, so my understanding is Ollama is a project out of Meta, it's open source, and the licensing is pretty much anybody can use it without paying. If Google or Microsoft started using it, they'd probably hit some user limit — I think that's the gist of it. And it always runs locally.

00:24:07 - Anthony Campolo

So there's Llama the model and then there's Ollama with an O at the beginning, which is a separate open source project. And then there's Llama CPP. Let me try to break down how these things fit together. There are the pure models themselves — like if someone took ChatGPT or Claude, had the model weights, and put them on the internet so you could download and use them. That's what things like Llama 3.1, 3.2, and 3.3 are — created by Meta. You're right that the licensing has been changing across versions. Then there's Llama CPP, by the same person who did Whisper CPP — it's an implementation of these models in C for maximum performance, and it lets you use any open source model, not just Llama models. There's Qwen, Phi 3.5, Gemma from Google, and a lot more.

00:25:18 - Anthony Campolo

Then Ollama takes Llama CPP and adds nicer usability features — it handles where models are stored on your machine, handles pulling models, gives you nice commands like ollama pull <model-name>, and there's also an official Ollama Docker image. I started with node-llama-cpp, which is the Node implementation of Llama CPP, then tried Llama CPP directly, and eventually found my way to Ollama. Now Ollama is the way to go for local LLMs with AutoShow. I'm not going to mess with the other options. There are so many ways to do this, but I've learned a ton about running models locally in the process.

00:26:12 - Nick Taylor

Okay, that's cool. Sorry, Fuzzy's making me laugh — "Obama Ollama Mama Jamaica." Anyways, yeah, that checks out. My co-worker John McBride is a big Neovim fan and he actually added kind of a Copilot for Neovim using Llama CPP. So I'd heard of it, but I always thought it was specifically the Llama model you had to use with these things. It's really more like an interface that lets you call whatever models — kind of like how GitHub Copilot in chat lets you choose different models. Similar idea, just at a different level.

00:27:14 - Anthony Campolo

Yeah, exactly. And they're all built around something called the GGUF format — GPT Generated Unified Format, I think that's what it stands for. This is really nice because it lets you download models as single big files, and that's what you work with. Ollama makes sure it's stored in the right place and can be accessed, and it also provides a server and things like that. If your setup command is done, we could run the first command.
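For the curious, the GGUF container is easy to recognize programmatically: the file starts with the four ASCII bytes "GGUF", followed by a version number. A minimal sketch of a magic-byte check (the helper name is mine, not from AutoShow):

```typescript
// Sketch: detect a GGUF model file by its 4-byte magic header.
// GGUF files begin with the ASCII bytes "GGUF", then a version number.
function isGguf(header: Uint8Array): boolean {
  const magic = String.fromCharCode(...header.slice(0, 4))
  return magic === "GGUF"
}

// Example: the first bytes of a model file would look like this.
const sample = new TextEncoder().encode("GGUF\u0003\u0000\u0000\u0000")
console.log(isGguf(sample)) // true
```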

00:27:52 - Nick Taylor

Okay, let me get back to this one here.

00:27:58 - Anthony Campolo

Yeah, go all the way to the end because it has a couple of flags. We'll explain what this does. So this is going to be npm run as — which is short for AutoShow — and then -- to pass in flags. Then it's --video followed by your YouTube URL. You could just run that and it would give you back the transcript and the prompt from Whisper, and then you could use that with whatever LLM you want. But if you add an LLM flag, it will do the actual processing and create the show notes for you. I'm going to have us use Ollama first, because this makes it all local — nothing is going out to a third-party API. You technically need internet for the YouTube link, obviously, but if you were working with a local file you could run this entirely on your machine without any internet access. And the --no-cleanup flag — I want to use that so I can walk through the entire process.
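To recap the two invocation styles as commands (flag spellings as spoken in the demo, so double-check them against the CLI's help output; the URL is a placeholder, and the commands are printed rather than executed here since they assume the AutoShow repo):

```shell
URL="https://www.youtube.com/watch?v=XXXX"

# 1. Transcript + prompt only, no LLM step; paste the output into any chat UI:
echo "npm run as -- --video $URL"

# 2. Fully local processing via Ollama, keeping the intermediary files:
echo "npm run as -- --video $URL --ollama --no-cleanup"
```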

00:29:02 - Anthony Campolo

This will show the intermediary files that get created along the way. Usually those all get deleted at the end, so you just have the one final file — that's what most people want. But for a walkthrough, it's useful to keep them. After running that, you can see I've also added some chalk-colored terminal output now, so you can see exactly what's happening. It tells you what options have been passed: a video URL, Ollama is set to true, no-cleanup is set to true. And then it gives you the markdown file —

00:29:34 - Nick Taylor

I was just going to ask — the Whisper CPP, that's the local transcription?

00:29:40 - Anthony Campolo

Yes, exactly.

00:29:41 - Nick Taylor

Okay. Fuzzy's asking about deployment. We can get to that after, I guess — that ties into the productization.

00:29:50 - Anthony Campolo

Yeah. Right now I'm working on having a unified Docker image that pulls all this stuff together, or a Docker Compose setup. Currently I'm using Docker Compose and the models are getting saved multiple times, so I'm ending up with like a 50-gig image. I have to work out some things with that. But you could run this on a Node server, which is likely where it'll end up. Or you could put a Docker container on Fly.io or something like that. But what I'm hoping for eventually is a hosted version that people can just use directly. If people did want to run and deploy it themselves, there will be a way to do that too.
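One common way to avoid models being saved multiple times is a shared named volume, so the multi-gigabyte model store lives outside the images. A rough compose sketch under that assumption (service names, the Dockerfile, and paths are all hypothetical, not AutoShow's actual setup):

```yaml
# Hypothetical compose sketch: share one model store between services
# so models aren't baked into, or duplicated across, images.
services:
  ollama:
    image: ollama/ollama        # official Ollama image mentioned in the episode
    ports:
      - "11434:11434"
    volumes:
      - ollama-models:/root/.ollama   # models live in the volume, not the image

  autoshow:
    build: .                    # hypothetical Dockerfile for the Node CLI
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama-models:
```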

00:30:37 - Nick Taylor

Okay, so it looks like the Ollama server isn't running.

00:30:42 - Anthony Campolo

Okay. Let me check the Ollama docs real quick.

00:30:54 - Nick Taylor

There's Whisper CPP — that's all there.

00:31:01 - Anthony Campolo

Let's close that. Run ollama list.

00:31:07 - Nick Taylor

Okay. Did this finish? Just like that?

00:31:14 - Anthony Campolo

Yeah, it's hard to tell.

00:31:17 - Nick Taylor

Oh wait, I spelled it wrong.

00:31:20 - Anthony Campolo

It's hard to see by the way, because your llama head is right in front of your screen.

00:31:25 - Nick Taylor

Oh okay, yeah. Here, let me fix that.

00:31:28 - Anthony Campolo

It's o-l-l-a-m-a.

00:31:33 - Nick Taylor

Oh, is it two Ls?

00:31:40 - Anthony Campolo

Two Ls.

00:31:42 - Nick Taylor

And two Ms? Or one M?

00:31:47 - Anthony Campolo

One M. Yeah.

00:31:47 - Anthony Campolo

Okay, so you haven't downloaded Ollama yet, I don't think.

00:31:50 - Nick Taylor

Okay, I didn't see that in the gist. Did I miss it?

00:31:53 - Anthony Campolo

Oh, no — I sent you a message in Discord about it.

00:31:56 - Nick Taylor

Oh, sorry. Okay, well that probably explains why we're running into issues. Give me a sec.

00:32:08 - Anthony Campolo

The setup command probably had a message in the output that said you need to install Ollama.

00:32:20 - Nick Taylor

I think I have the gist here. I don't see the message you sent me about installing Ollama, though.

00:32:29 - Anthony Campolo

It was right before the gist. The last thing I said was "also download Ollama if you haven't already."

00:32:41 - Nick Taylor

The next thing I have after my "cool cool cool" is the gist.

00:32:46 - Anthony Campolo

No, I'm saying before the "cool cool cool."

00:32:49 - Nick Taylor

Yeah... okay, I see it. "NPM run setup AutoShow, which will also download Ollama." Okay, so how do I download Ollama?

00:33:00 - Anthony Campolo

Just search for Ollama and you'll get to the home page.

00:33:04 - Nick Taylor

Okay, I'll just pop that link in the chat for people. This is also why live coding is always fun — in this case it was me literally missing a step that Anthony explicitly told me to do before the stream. Okay, downloading it for Mac. And boom.

00:33:33 - Anthony Campolo

Okay.

00:33:35 - Nick Taylor

Okay. Where should I put Ollama? It's an app, I guess.

00:33:41 - Anthony Campolo

Yeah, it's just like downloading any macOS app. You want to put it in your Applications folder, and when you start it up it'll open in your menu bar, not in your dock.

00:33:52 - Nick Taylor

Okay, there we go. Welcome to Ollama. Next. Install it for the command line. Okay. Run your first model —

00:34:10 - Anthony Campolo

Yeah, don't worry about that. Don't do that actually.

00:34:15 - Nick Taylor

Okay, let me just make sure. We're good.

00:34:18 - Anthony Campolo

Do npm run setup again, because this time it'll actually use Ollama. Right there — "Ollama is installed, Ollama server is already running, model is not available, pulling the model." So now you should go into the code for the setup script so people understand

00:34:37 - Nick Taylor

What's happening here in setup.sh?

00:34:41 - Anthony Campolo

Yeah, that's the one. So this checks to make sure you have yt-dlp, then it copies over a .env file, which is necessary for how the Node command runs. yt-dlp is the tool that interfaces with YouTube — it can download video, strip the audio out, grab metadata, and all of that. Then the Ollama section makes sure the server is running, because Ollama runs on port 11434 and that's how your computer talks to it. And then it pulls the Llama 3.2 1B and 3B parameter models. These are their new small Llama models, which are kind of limited, but they run really fast so they're good for demos.
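The shape of those checks can be sketched in a few lines of shell (simplified and illustrative, not the actual setup.sh):

```shell
# Illustrative versions of the setup.sh checks described above.
check_dependency() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 is installed"
  else
    echo "$1 is missing -- install it first"
  fi
}

check_dependency yt-dlp
check_dependency ollama

# Ollama's server listens on port 11434 by default; probe it before pulling models.
if curl -s --max-time 2 http://localhost:11434 >/dev/null 2>&1; then
  echo "Ollama server is running"
  # ollama pull llama3.2:1b
  # ollama pull llama3.2:3b
else
  echo "Ollama server is not running"
fi
```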

00:35:38 - Nick Taylor

Okay, so a lot of checks and installs. Here's the npm install, and then this is Whisper.

00:35:46 - Anthony Campolo

Then it clones Whisper CPP, downloads the models, and runs the make command. I see a Fuzzy comment. Basically, I used to just have a README with all these different setup steps and I was like, this is ridiculous. I want a single setup command. If you're on Windows, there's no guarantee it'll work — probably a guarantee it won't. Cross-platform support isn't something you should expect right now. That's something I'll want to prioritize at some point, but right now it's all built around macOS because that's what I develop on.

00:36:21 - Nick Taylor

Well, if you end up running this in a container, that's probably not an issue then.

00:36:25 - Anthony Campolo

Ideally — though with Docker and Macs having different chips, even Docker containers are hard to make truly portable anymore.

00:36:39 - Nick Taylor

Oh yeah. I didn't realize that.

00:36:41 - Anthony Campolo

It's not impossible, just extra steps that make things annoying.

00:36:45 - Nick Taylor

Okay, so now it's working — this is where we ran into the "not found" issue earlier.

00:36:51 - Anthony Campolo

Let's scroll up. We can go step by step through the terminal output. So first it generates the front matter, then it creates a markdown file with that front matter. Then it downloads the video, strips the audio out to a WAV file — which is what you need for Whisper CPP to process it correctly. Then it uses the Whisper model. Have you heard of Whisper Turbo V3?
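The WAV step matters because Whisper CPP expects 16 kHz, mono, 16-bit PCM input, so arbitrary audio has to be resampled first. A sketch of building the ffmpeg arguments for that conversion (the function name is illustrative, not AutoShow's code):

```typescript
// Whisper CPP wants 16 kHz / mono / 16-bit PCM, so the extracted audio
// is resampled with ffmpeg. This just builds the argument list.
function ffmpegWavArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-ar", "16000",      // 16 kHz sample rate
    "-ac", "1",          // mono
    "-c:a", "pcm_s16le", // 16-bit PCM
    output,
  ]
}

console.log(ffmpegWavArgs("episode.mp4", "episode.wav").join(" "))
// -i episode.mp4 -ar 16000 -ac 1 -c:a pcm_s16le episode.wav
```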

00:37:25 - Nick Taylor

I've heard of Whisper CPP but I've never used it.

00:37:29 - Anthony Campolo

So Whisper itself is a Python library released by OpenAI — it's not super actively maintained, but every now and then they put out a new model. They recently released one that's basically the same as their most recent large model in accuracy, but it runs twice as fast. That's why it's called V3 Turbo — it's a really big deal. Transcription was a huge bottleneck. An hour-long video could take five to ten minutes before; now it's about five minutes flat.

00:38:10 - Nick Taylor

Okay.

00:38:10 - Anthony Campolo

And it outputs a file that I then do some transformations on — we can look at those files once you get to the end. So if you scroll down to step four — yeah, step four is where we're using Ollama. It tells you the model being used, checks the server is running, sends a chat request with the transcript and the prompt, gets the response back, and scroll down to the last step. Yes, those lines explain that it creates the transcript, saves it to a temporary file, and then generates a new markdown file with everything together.
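The chat request itself is a small JSON payload sent to Ollama's local HTTP API (a POST to /api/chat on port 11434). A sketch of the request shape, with the model name and prompt as examples:

```typescript
// Sketch of the chat request shape Ollama's HTTP API expects
// (POST http://localhost:11434/api/chat). Model name and prompt are examples.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string }

function buildOllamaChatRequest(model: string, prompt: string, transcript: string) {
  return {
    model,                       // e.g. "llama3.2:1b"
    messages: [
      { role: "user", content: `${prompt}\n\n${transcript}` },
    ] as ChatMessage[],
    stream: false,               // ask for one JSON response, not a token stream
  }
}

const req = buildOllamaChatRequest("llama3.2:1b", "Write show notes.", "[00:00] Hello...")
console.log(req.model)  // llama3.2:1b
console.log(req.stream) // false
```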

00:38:50 - Nick Taylor

Okay, cool. And then we're done. Okay, great. So if we look at the folder for a second,

00:39:01 - Anthony Campolo

It's in content at the very top.

00:39:03 - Nick Taylor

Okay cool. So if we look here — I guess the first one is the show notes.

00:39:09 - Anthony Campolo

Yeah, that's the complete one.

00:39:12 - Nick Taylor

Okay, so this is nice — you've got the show link, channel, and stuff. This is the front matter here. So I could potentially generate a bunch of blog posts from this, or the show notes like you were saying.

00:39:40 - Anthony Campolo

So I also have — if you look in the packages — an Astro site that's already set up with the content type you would need for this.

00:39:51 - Nick Taylor

Okay, yeah. So it would be an Astro content collection, type-safe and everything. That's cool. Going forward, you could generate stuff for your blog — maybe publish or put these as drafts automatically. You'd have to wire that up with a webhook or something. But this is —
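If you did wire this into a blog pipeline, the first step is usually splitting the YAML front matter from the body. A minimal sketch, assuming the standard "---" fenced layout (the field names are illustrative):

```typescript
// Minimal sketch: split the YAML front matter block ("---" fenced)
// from the markdown body, e.g. before handing the file to a blog pipeline.
function splitFrontMatter(md: string): { frontMatter: string; body: string } {
  const match = md.match(/^---\n([\s\S]*?)\n---\n?([\s\S]*)$/)
  if (!match) return { frontMatter: "", body: md }
  return { frontMatter: match[1], body: match[2] }
}

const doc = "---\ntitle: Demo Episode\nchannel: Example\n---\n# Show Notes\n"
const { frontMatter, body } = splitFrontMatter(doc)
console.log(frontMatter.split("\n")[0]) // title: Demo Episode
```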

00:40:23 - Anthony Campolo

That's one of my plans. And one of the things I've been doing is using this to create pages for all the video and podcast stuff I've done over the last three years — I've done around 200 videos and podcasts altogether. This is a way to get instant SEO for it, because you get all this content and link equity pointing at your site.

00:40:47 - Nick Taylor

Yeah.

00:40:47 - Anthony Campolo

So this output isn't great and it's not really what we asked for — it doesn't get the chapters right. This is because it's a really small model we're using for demo purposes. A better open source model would give you better output, but those are like a seven-gig download and just add friction. For the most part, Claude or ChatGPT are going to give you much better results. If you look at some of the intermediary files — go back to the LRC one you were looking at. Yeah, so this is what Whisper outputs. What I do is strip out the top line where it says "whisper.cpp" and clean it up so the milliseconds are no longer there, because when you feed this to an LLM you're paying by the token. Input tokens are cheaper, so you can get away with feeding a lot of content, but shortening it is always useful for cutting costs.
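The transformation Anthony describes can be sketched as a small function: drop the whisper.cpp banner line and truncate the timestamps to save input tokens (the exact layout here is assumed from the standard [mm:ss.xx] LRC format):

```typescript
// Sketch of the LRC cleanup described above: drop the whisper.cpp banner line
// and strip milliseconds from the timestamps to reduce token count.
function cleanTranscript(lrc: string): string {
  return lrc
    .split("\n")
    .filter((line) => !line.includes("whisper.cpp")) // drop the tool banner
    .map((line) => line.replace(/^\[(\d{2}):(\d{2})\.\d{2}\]/, "[$1:$2]"))
    .join("\n")
}

const raw = [
  "[by:whisper.cpp]",
  "[00:00.16] Welcome to the show.",
  "[00:03.52] Today we're talking about AutoShow.",
].join("\n")

console.log(cleanTranscript(raw))
// [00:00] Welcome to the show.
// [00:03] Today we're talking about AutoShow.
```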

00:41:49 - Anthony Campolo

So if you go to the TXT file, you'll see how it's transformed.

00:42:00 - Nick Taylor

Thank you for the follow — sorry, somebody just followed. I can't see the handle, but thank you. You were saying to open up

00:42:12 - Anthony Campolo

Your sidebar again, so you can see all the files. There's one that's .txt.

00:42:18 - Nick Taylor

Okay, yeah, here it is.

00:42:19 - Anthony Campolo

That's after the transformation from the .lrc.

00:42:24 - Nick Taylor

Okay.

00:42:25 - Anthony Campolo

And then if you look at the markdown one that doesn't say "show notes."

00:42:36 - Nick Taylor

This one here?

00:42:37 - Anthony Campolo

Yeah, that's just the front matter. All of these files are the ones that, if you don't run --no-cleanup, will get deleted at the end. You'll just have the one final clean file.

00:42:48 - Nick Taylor

Yeah, but this is already super handy on its own. Like, I don't have this on my website right now, but long term I want separate pages for each video — so when you click on it, instead of going straight to YouTube, you'd go to a page with the video embedded plus a summary. Pretty much what we just generated. I could see this in a GitHub Actions workflow where I'd say, get my latest YouTube video, run AutoShow, generate the markdown with front matter, make sure it's in the right folder in my blog, and then commit it. I already have code that creates a PR and auto-merges it.

00:44:27 - Nick Taylor

So assuming the checks pass, it either merges automatically or at minimum puts up a PR for me to review — which already saves me a ton of time and adds new content I currently don't have, like summaries, show notes, and chapter timestamps. And the chapters I guess you could also republish to the YouTube video description so they're easier for people to navigate.
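A rough sketch of the workflow Nick describes, with every name, path, and trigger as a placeholder rather than a working pipeline:

```yaml
# Hypothetical sketch of the GitHub Actions idea above -- not a working pipeline.
name: autoshow-latest-video
on:
  schedule:
    - cron: "0 6 * * *"     # check for a new video daily
  workflow_dispatch: {}

jobs:
  generate-notes:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Pseudo-steps: fetch the latest video URL, then run AutoShow against it,
      #   npm run as -- --video "$LATEST_VIDEO_URL" --claude
      # move the generated markdown into the blog's content folder,
      # and open a PR, e.g. with peter-evans/create-pull-request.
```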

00:44:53 - Anthony Campolo

Right, the chapter format I'm using is specifically designed so that when you paste it into your YouTube description, YouTube detects those timestamps and turns them into clickable chapter links. And when you go to YouTube's chapters view, instead of auto-generated chapters, it uses those as the chapter titles.
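For reference, YouTube's clickable chapters require a list in the description that starts at 00:00, contains at least three timestamps, and is in ascending order. The titles below are made up, but the shape is what matters:

```text
00:00 - Introduction
02:45 - What AutoShow does
11:30 - Running the local demo
24:07 - Llama, Ollama, and Llama CPP
```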

00:45:21 - Nick Taylor

Okay, cool. So we did a run with Ollama. But you can also — let me zoom in here — use Claude. For that I need to add my own API key, right?

00:45:44 - Anthony Campolo

Yep.

00:45:45 - Nick Taylor

Okay, let me do that. I'm going to switch to just chatting view for a second so I can set this up. I actually have a generated API —

00:45:59 - Anthony Campolo

And while you're doing that, I can go through all the LLMs I support: ChatGPT, Claude, Gemini, Cohere with their Command R model, Mistral which has both open source and their own hosted models. And then there are three companies called Fireworks, Together, and Groq — they all basically run open source models for you. You can use bigger ones like Llama 3.1 405B which would be way too big to run on most machines, but these services let you use them without having to deal with the hardware requirements. They're a nice middle ground between fully proprietary third-party services and running things yourself. So that's nine different options total.

00:47:03 - Nick Taylor

Cool. Here, I'll switch back for a second. I've got to get the API key from the Anthropic site. What do I do here?

00:47:16 - Anthony Campolo

Try to get to your dashboard. Are you logged in?

00:47:22 - Nick Taylor

I don't even see a login here. "Start building" maybe. There we go.

00:47:26 - Anthony Campolo

I would just Google "Anthropic API key."

00:47:31 - Nick Taylor

Yeah, that's what I did. That's what brought me here.

00:47:35 - Anthony Campolo

Okay, yeah — try "Anthropic console." That's what you want.

00:47:41 - Nick Taylor

Yeah, that's where I am. I wonder if this is different from Claude, because I have a paid Claude account, but — okay, just give me a sec.

00:47:53 - Anthony Campolo

Sure.

00:47:53 - Nick Taylor

It's asking me to fill out what my company is. All right. Create account.

00:48:06 - Anthony Campolo

Both OpenAI and Anthropic do this — they have their main chatbot product and a separate developer dashboard, and the two are kind of siloed from each other.

00:48:18 - Nick Taylor

Yeah. The thing that's not clear to me is whether paying for Claude AI means I'm covered in Anthropic's dev console too — because it's similar with ChatGPT versus the API. Anyways, let me add my key for Anthropic. All right, that should be good. All right, back to pairing view. I have an Anthropic API key now, so I should have access to Sonnet. Let's copy this command. We'll find out very quickly if it worked.

00:49:15 - Anthony Campolo

And in this command, people can see we're not passing --ollama like before. Instead you pass --claude, and optionally a specific model name. If you just pass --claude, it uses the default cheaper model. Or you can specify a model like Claude 3.5 Sonnet.

00:49:41 - Nick Taylor

Sonnet is really great. I've been using it for literally everything — blogging, code, and the artifacts in the Anthropic chat are pretty solid too. Okay, let's go ahead and run this. Hopefully my API key is good. All right, going through the same flow again — it'll just differ at the step where it calls Ollama and instead hit the Claude API.

00:50:08 - Anthony Campolo

Yeah, we can walk through some of the code once we get the output. Essentially the only difference is that instead of sending everything to Ollama, it sends it to the Claude API, talks to their service, and gets the response back.

00:50:28 - Nick Taylor

Okay, so — type error. Balance too low?

00:50:31 - Anthony Campolo

You might need to buy some credits. A lot of these require you to start with like $20 in credits.

00:50:40 - Nick Taylor

Okay, I'll go ahead and do that.

00:50:42 - Anthony Campolo

I can give you my key if you don't want to go through the whole process.

00:50:46 - Nick Taylor

Oh no, it's all good. This is what I meant about paying for Claude AI versus Anthropic.

00:50:55 - Anthony Campolo

Why don't we do this — run it without the Claude flag, and then we can copy-paste the output into Claude.

00:51:05 - Nick Taylor

Okay, gotcha. Cool.

00:51:07 - Anthony Campolo

That's how we used to always do this, actually. And it shows how people can do this with just a Claude subscription, without needing an API key — because you can just paste it into the chat directly. And people will also get to see the prompt.

00:51:22 - Nick Taylor

Okay, I got rid of —

00:51:28 - Anthony Campolo

Yeah, get rid of both --claude and the model name, not just the model name.

00:51:31 - Nick Taylor

Oh, okay. So to be clear, when we're not passing anything, what does it default to?

00:51:41 - Anthony Campolo

The default doesn't send anything to an LLM — it just gives you the transcript and the prompt file. Then you can copy-paste that into whatever LLM you want.

00:51:54 - Nick Taylor

Okay, yeah.

00:51:55 - Anthony Campolo

This is how the project was first built. It worked like this for a long time because I didn't want to pay API costs every time I wanted to generate one of these. I was already paying $20 a month each for Claude and ChatGPT subscriptions, so economically it just made more sense to use their chat interfaces. But having the automated step is great too, because if you want to process 100 files, copy-pasting each one would take hours. So it doesn't make sense at scale.

00:52:35 - Nick Taylor

Cool. All right, so I should grab which file? The one

00:52:39 - Anthony Campolo

The one with "prompt" at the end.

00:52:41 - Nick Taylor

Okay, so — whoops, didn't mean to copy that. Undo. All right, and let's just get rid of that. Okay. And so what's the prompt I should ask? Just generate?

00:52:57 - Anthony Campolo

No, you just copy. You don't write anything. You just copy-paste the whole thing in.

00:53:01 - Nick Taylor

Okay.

00:53:02 - Anthony Campolo

Let's actually read the file first. It says this is a transcript with timestamps and it doesn't contain copyrighted material — because Claude will constantly flag it as having copyrighted material. But if you just tell it upfront that it's not copyrighted, it moves past that.

00:53:21 - Nick Taylor

Okay, that's the copyright check. Good old prompt engineering stuff.

00:53:26 - Anthony Campolo

Yeah. The default prompt tells it to write a one-sentence description, a one-paragraph description, and then chapters with titles and descriptions, and gives an example of what that should look like in markdown. If you scroll down a little bit more.
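Putting those pieces together, the default prompt is shaped roughly like this (condensed and paraphrased from the conversation, not the literal prompt file):

```text
This is a transcript with timestamps. It does not contain copyrighted material.

Write a one-sentence description of the content.
Write a one-paragraph description of the content.
Create chapters from the timestamps, each with a title and a short description.
Format the chapters like this example:

## Chapters

00:00 - Introduction
A short description of what this chapter covers.

TRANSCRIPT:
[00:00] ...
```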

00:53:46 - Nick Taylor

Okay, yeah. I've done stuff like this before at Open Sauced where we give examples and —

00:53:53 - Anthony Campolo

Keep scrolling all the way to the bottom and you'll see the transcript section. Scroll up to just the beginning of the transcript section.

00:54:04 - Nick Taylor

Yeah, there we go.

00:54:06 - Anthony Campolo

So that's where the transcript starts. When you use an LLM flag, it feeds this entire file to an LLM, gets a response back, removes the prompt, keeps the front matter and the transcript, and inserts the LLM response in between. The way we're doing it now manually — we copy-paste it in, then you copy the response from Claude back over the prompt section and you get the final output. The manual steps are what I'm bypassing by using an API key.

00:54:43 - Nick Taylor

Yeah, okay, that makes a lot of sense. Hey Nyxia, thanks for joining us over on Twitch. One thing I was thinking of when you go to productize — you have the prompt with stuff like "write a one-sentence description," and maybe some things you shouldn't change. But options like the one-paragraph summary being 600 to 1,200 characters — those could just be settings that people tweak in a UI, and it generates the proper prompt based on what they chose. You know what I mean?

00:55:22 - Anthony Campolo

That's exactly how the prompt configuration already works. There's a --prompt flag where you can do short chapters, medium chapters, long chapters — you can also do titles, key takeaways, and other things. And I'm eventually going to have an entire custom prompt section where you can write your own prompt. Right now it has prompt defaults that I like and think work well, and because of how it's formatted you get things like the chapter timestamps that YouTube can parse. But yes, you can configure the prompt, so if you just want the summary or just the chapters, you can do that. When it comes to prompt writing there are a lot of different schools of thought, but prompts tend to be pretty lengthy — though they're getting shorter now with things like o1, which can think through multiple steps.

00:57:12 - Anthony Campolo

So you don't have to be quite as explicit about everything you want. I haven't really been using AutoShow with o1 much yet because they have a really strict message limit — only 50 messages a week, which I'm using for coding. So I haven't messed around to see how different it is for generating show notes.

00:57:29 - Nick Taylor

Yeah.

00:57:30 - Anthony Campolo

From —

00:57:30 - Nick Taylor

From what I've seen, you do have to be verbose to avoid hallucinations. Like in Open Sauced — we use Star Search, which is our AI offering — you have to say things like "don't hallucinate the username." For example, my co-worker Brandon Roberts, his GitHub handle is Brandon Roberts. We also have a Bing service that fetches additional information, and we basically have to say whatever you return is only valid if it's literally the GitHub username "Brandon Roberts," because it was initially hallucinating — it got something about a Brandon Roberts, but not my co-worker. There's stuff like that where you end up writing in all caps, kind of like scolding a child, which sounds funny, but there's just weird stuff you have to do.

00:58:44 - Anthony Campolo

Like "please do it or I'll be fired and made homeless." Yeah.

00:58:51 - Nick Taylor

And I've done things like "you must absolutely put it in this format," and then you give an example like you have down here.

00:59:01 - Anthony Campolo

That really makes a big difference. If you give it an example of exactly what you want, it's much less likely to go off the rails.

00:59:08 - Nick Taylor

Yeah, exactly. So this is cool — you've obviously thought about a lot of things already, because I was suggesting options for the prompt but you've already got flags for that. So when you productize this, you could just surface those in a UI, and under the hood it would call the CLI with those flags.

00:59:29 - Anthony Campolo

Yeah, exactly.

00:59:31 - Nick Taylor

Cool. So what's the state of AutoShow right now? Obviously people can still grab it, it's open source, they can pop in their own keys. It's definitely useful that way. I'm going to mess around with it — I'll probably pay some Anthropic credits instead of having to paste it into Claude AI, because that's kind of an annoying step and I can definitely take the $20 hit. But in terms of productizing it — I know a lot of people struggle with finding a good balance on pricing. That app I mentioned, VoiceNotes AI — I got in early so I paid like $50 for a lifetime membership, which there's no way you could offer to everybody because it just wouldn't scale given the amount of prompts you're processing.

01:00:43 - Nick Taylor

So I guess it's trying to figure out the bare minimum to get sensible output without breaking the bank, while adding enough margin because you're building a product around it. I'm curious if you have any thoughts on that or whether you're even there yet.

01:01:12 - Anthony Campolo

I have thought about this and have a pretty good idea of what I plan to do. There's an app called Photo AI built by Levels.io — he's a prolific indie hacker, very big on Twitter. I really like his model, which is that you buy credits and then use credits to generate what you want. You spend $10 a month and get a certain amount of credits, or $50 a month and get way more. The cost in credits varies based on how you configure the run — you can choose different Claude models or ChatGPT models, each consuming different amounts of credits because they're more expensive. That's where I'll have a lot of control over margins, because users are effectively paying per generation.

01:02:17 - Anthony Campolo

The cost will vary based on what it's actually costing me, and anytime they want more credits, they can buy more.

01:02:24 - Nick Taylor

Nyxia in the chat is saying credits is a nice model. Just to be clear — there's a cost you pay to the AI services, and then your AutoShow credit is your own unit on top of that. So say it costs 10 cents per thousand tokens sent to Anthropic, and you need to make some money, so you add a cent — your credits come out to $0.11, and that's what you sell, not the raw AI credits?

01:02:59 - Anthony Campolo

Yeah, exactly. I'm inventing the concept of an AutoShow credit that people buy, which translates to a real dollar amount on the back end. One of the things I'm implementing in the next week or so is cost estimation in the terminal output — because there are input tokens and output tokens, and the rate is different for each model. I'm going to build a config object for each model with the name and cost per token, then do the math to surface the exact cost of each run right in the CLI.
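The cost math Anthony outlines is straightforward once each model has a pricing entry. A sketch with placeholder prices (the numbers are illustrative, not any provider's real rates):

```typescript
// Sketch of the per-model cost config described above. Prices are
// illustrative placeholders (USD per million tokens), not real rates.
interface ModelPricing {
  name: string
  inputCostPerMTok: number   // USD per 1M input tokens
  outputCostPerMTok: number  // USD per 1M output tokens
}

function estimateCostUSD(m: ModelPricing, inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens / 1_000_000) * m.inputCostPerMTok +
    (outputTokens / 1_000_000) * m.outputCostPerMTok
  )
}

const exampleModel: ModelPricing = {
  name: "example-model",     // hypothetical entry
  inputCostPerMTok: 3,
  outputCostPerMTok: 15,
}

// A long transcript in, short show notes out: input tokens dominate.
console.log(estimateCostUSD(exampleModel, 30_000, 1_500).toFixed(4)) // 0.1125
```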

01:03:44 - Nick Taylor

That's what I was going to say — you could have a dry run that says "this is going to cost you 50 credits but you only have 30, top up or whatever."

01:03:54 - Anthony Campolo

Exactly. Or just use a cheaper model if you want.

01:03:59 - Nick Taylor

Yeah. I'm excited to see this land. I'll definitely use the open source version, and once it's productized it'll be even easier. But I wonder — would you productize the CLI itself as well? Like, picture a content creator who's not a dev. They go to the website, paste in a YouTube link, and it just generates stuff for them to download. But as a dev I'm like, this is super useful and I want it in my CI/CD pipeline. You could monetize the CLI in the same way — it's just a different surface for the same thing. But maybe that's out of scope.

01:04:54 - Anthony Campolo

It's interesting. The way I've been thinking about it is that the CLI stays open source and provides the base for the server and front end, which become the paid product. I guess my question is how would monetizing the CLI even work — would it be a packaged product with a license or something?

01:05:20 - Nick Taylor

Yeah, that's true. If I'm a dev, I'd probably just use my own API keys. That would probably make more sense.

01:05:30 - Anthony Campolo

The front end is for people who don't know how to spin up a git repo and run terminal commands. That seems like the much bigger market. Because if someone can already use the open source CLI, why would they need a slightly nicer CLI version of it?

01:05:45 - Nick Taylor

Oh yeah, for sure.

01:05:53 - Anthony Campolo

Yeah. I'd need to see more examples of paid CLIs. Companies like Warp are doing interesting things in the terminal space, but it's rare to see a standalone CLI as a paid product. The closest example is Docker, but they basically force you onto Docker Desktop anyway — they don't really let you just have the CLI, which would actually be nice.

01:06:17 - Nick Taylor

Yeah. Now that I'm thinking about it, you're right — as a dev I'd just put my API key for whichever models I want and pay that directly. I guess the only thing you might suggest is a way to sponsor the project if someone wants to give back.

01:06:44 - Anthony Campolo

I do have a sponsorship thing in the .github folder. If you go to the repo, there will be a sponsorship button that shows up like any other GitHub Sponsors profile.

01:06:55 - Nick Taylor

Okay, cool.

01:06:55 - Anthony Campolo

It's not super prominent, but it is set up if anyone wants to sponsor.

01:07:00 - Nick Taylor

Okay, cool. So basically forget what I was saying about the CLI — you're right.

01:07:08 - Anthony Campolo

No, it's an interesting thought. There are so many ways to try and make money off of open source stuff, but I think people just want apps — they want to go to a site, log in, and do the thing.

01:07:24 - Nick Taylor

Yeah, no, that makes sense. And you're totally right — I'm thinking "what if I spin up a local UI?" but at that point just go to the hosted paid thing. It's not worth it.

01:07:38 - Anthony Campolo

The thing I keep going back and forth on is whether to make the front end open source or not, because it's going to eventually have auth logic, pricing, and all that stuff built in. I'm thinking I might keep the server open source so people can self-host if they want, but keep the front end as a private repo. We'll see.

01:08:03 - Nick Taylor

Yeah, I am excited to see where this goes. I'm sure other people are potentially doing similar things, but —

01:08:14 - Anthony Campolo

Oh yeah, tons. There are so many versions of this already, but I find they're all kind of obscure and most people who aren't into tech don't know about any of them. So I'm just trying to ride this wave.

01:08:27 - Nick Taylor

Yeah. And I think it's smart that you're not just limited to one format. I've heard very good things about Blog Recorder — I haven't used it because I just go into Claude with my voice notes and that's good enough for my blogging workflow. But I like the angle you're taking where it could be YouTube, it could be audio. I could totally see — like we were talking about earlier — where you just kick off a media recorder in the browser, or if you did a mobile app, and once they're done recording you pipe that right into AutoShow. You'd probably have to put a limit on recording length, but still. I'm genuinely impressed with everything you've done. I have to dig into the code because I haven't really —

01:09:32 - Nick Taylor

I've been more of a consumer of things like this. Even at Open Sauced, I built out the experience in the app and did some AI agent work, and I've done some stuff with Copilot extensions like we were talking about. But I've typically been consuming the AI offering rather than building it. So I need to spend some time digging into your project. How long have you been working on it?

01:10:03 - Anthony Campolo

It depends when you say I started it, but somewhere between six and nine months — probably closer to nine months now. Just piece by piece. But let's go through the relevant code files for the commands we ran and explain how that whole flow works.

01:10:27 - Nick Taylor

For sure. Let's go back to pairing view. Let me close this and open up the sidebar.

01:10:40 - Anthony Campolo

Open up src and go to the entry file, autoshow.ts.

01:10:47 - Nick Taylor

Okay. So this is obviously a Node script. We've got a bunch of imports — you're using Commander.js, I think you said?

01:10:59 - Anthony Campolo

Yep, that's what drives the CLI. And then Inquirer is for the interactive prompt.

01:11:08 - Nick Taylor

Okay, yeah.

01:11:09 - Anthony Campolo

It'll walk you through each step interactively and surface all the different options to you.

01:11:15 - Nick Taylor

Okay, gotcha. Yeah, I'm familiar with Inquirer — when I did some work on the Remix Edge adapter at Netlify, the Remix CLI uses Inquirer as well. Pretty neat.

01:11:30 - Anthony Campolo

Yeah. So these are the different commands for processing. processVideo is for a YouTube URL, processPlaylist for a YouTube playlist URL, processChannel for a YouTube channel URL — that runs every single video on the channel. processURLs is for a file with a list of URLs if you want to grab a bunch of random YouTube videos that aren't already in a playlist. processFile handles a local file instead of a YouTube URL. And processRSS is an RSS feed for any podcast — that has a whole bunch of custom logic because it's working with an RSS feed rather than a YouTube URL.
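
The command surface Anthony lists here can be pictured as a simple action map. This is a hypothetical sketch, not the actual AutoShow source — the function names match the walkthrough, but the bodies are stand-in stubs:

```typescript
// Sketch of AutoShow's processing commands as an action map.
// Names follow the walkthrough; bodies are hypothetical stubs.
type ProcessingAction =
  | "video" | "playlist" | "channel" | "urls" | "file" | "rss";

// Each handler takes the input (URL, file path, or feed URL) to process.
const actions: Record<ProcessingAction, (input: string) => string> = {
  video: (url) => `processVideo: ${url}`,       // single YouTube URL
  playlist: (url) => `processPlaylist: ${url}`, // YouTube playlist URL
  channel: (url) => `processChannel: ${url}`,   // every video on a channel
  urls: (path) => `processURLs: ${path}`,       // file containing a list of URLs
  file: (path) => `processFile: ${path}`,       // local audio/video file
  rss: (url) => `processRSS: ${url}`,           // podcast RSS feed
};

// Dispatch: validate the action, then run its handler.
function runAction(action: string, input: string): string {
  if (!(action in actions)) {
    throw new Error(`Invalid action: ${action}`);
  }
  return actions[action as ProcessingAction](input);
}
```

This is also roughly what the validation function Nick scrolls past below would guard: reject anything that isn't one of the known actions before doing any work.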

01:12:10 - Nick Taylor

Yeah. And there's RSS for audio, like podcasts. But you could also use this for a YouTube playlist because —

01:12:22 - Anthony Campolo

Yeah, but I found that since I was already using yt-dlp, it didn't really make a whole lot of sense to add that separately.

01:12:30 - Nick Taylor

Oh, okay.

01:12:31 - Anthony Campolo

Yeah. I don't like the YouTube API. I've used it before and it bugs me — all these weird quirks, like you can only get 10 videos back at a time, you're always dealing with pagination. If you're working with any substantial number of videos it's just frustrating.

01:12:49 - Nick Taylor

Yeah, gotcha. We don't have to go through all the code, but this is just a function to check that you passed in a valid action. This is kind of like —

01:13:01 - Anthony Campolo

This is where the CLI itself is defined. Commander's stuff. So these are the different processing options — all the ones I just explained.

01:13:11 - Nick Taylor

Right. So basically all the available flags — like if you ran --help. I think if you run AutoShow on its own, it gives you this too?

01:13:22 - Anthony Campolo

If you run it without passing anything, it goes to the interactive prompt.

01:13:27 - Nick Taylor

Oh, okay.

01:13:28 - Anthony Campolo

Actually, no — you need to use the -i flag to get interactive mode.

01:13:30 - Nick Taylor

Okay, cool. And I see for transcription you have Deepgram and stuff too?

01:13:35 - Anthony Campolo

Yeah, Deepgram and AssemblyAI are the two transcription services I offer. I haven't put a ton of work into those because I always use Whisper myself, but I originally added them because they support diarization — speaker labels — which is a big limitation of Whisper. Whisper just gives you a single block of text with nothing tied to speakers. I also integrated Whisper Diarization, which is an open source library that adds diarization to Whisper, but it runs extremely slowly. They're working on a way to make it much faster by reducing the dependency footprint, but that's in a future release that isn't out yet. Anyway, the part we're looking at now is the LLMs — Ollama, ChatGPT, Claude, and so on.

01:14:28 - Nick Taylor

Yeah, that makes sense. In Descript you can identify speakers and it can be handy. I used Descript all the time for a lot of stuff, but since I can do a lot of these things with LLMs now, and especially with something like AutoShow, I don't really need Descript for that anymore.

01:14:56 - Anthony Campolo

I basically built AutoShow because I didn't want to pay for Descript.

01:15:02 - Nick Taylor

That said, Descript is a great product, but —

01:15:06 - Anthony Campolo

Yeah, totally.

01:15:08 - Nick Taylor

Okay, so we've got all the flags for the different models, and you can specify the model version like we saw earlier with Claude 3.5 Sonnet. What's Fireworks? I know Mistral, but —

01:15:27 - Anthony Campolo

Fireworks is similar to Together and Groq — all three run open source LLMs as a service. So you can run any of the Llama models, but they give you access to really large ones like Llama 3.1 405B, which is way too big to run on most people's machines.

01:15:48 - Nick Taylor

Gotcha. Okay, cool. Sorry, got distracted by something on my other screen.

01:15:59 - Anthony Campolo

The utility options we kind of already saw — --no-cleanup which keeps all the intermediary files, and --prompt to select different prompt configurations. Oh yeah, I think I actually changed it so running npm run autoshow by itself doesn't go to interactive mode anymore — you have to use the -i flag for that, because something weird was happening with the old behavior. But anyway, there's also some TSDoc — it's like JSDoc for TypeScript. First time I've really gotten into that.
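
As an aside, TSDoc comments look much like JSDoc but are tailored to TypeScript. A minimal illustration — the function and its behavior here are hypothetical, just tying together the -i flag discussion with the comment style:

```typescript
/**
 * Builds the npm script invocation for a given video URL.
 *
 * @param url - The YouTube URL to process
 * @param interactive - Whether to pass the -i flag for interactive mode
 * @returns The full command string
 */
function buildCommand(url: string, interactive = false): string {
  const flags = interactive ? ["-i"] : ["--video", url];
  return ["npm", "run", "autoshow", "--", ...flags].join(" ");
}
```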

01:16:33 - Nick Taylor

Okay. So we've got the options and then —

01:16:39 - Anthony Campolo

The rest of this isn't that important — it's just the logic Commander needs to work. Let's scroll all the way to the end of the file.

01:16:54 - Nick Taylor

It's just —

01:16:55 - Anthony Campolo

That's the end of the file.

01:16:56 - Nick Taylor

Okay, cool.

01:16:57 - Anthony Campolo

If you go to processVideo, that explains the whole flow — which is pretty much what we've been walking through. It breaks down into a few functions. Scroll past all this to the actual code.

01:17:16 - Nick Taylor

Yeah, here we go.

01:17:18 - Anthony Campolo

First it runs generateMarkdown, which creates the front matter by grabbing all the video metadata. Each of these functions lives in its own file in the utils folder. After that is downloadAudio, which converts the video to audio and gets it into a format Whisper can process. Then it runs runTranscription, which is generic and can call out to Whisper, Deepgram, or Assembly depending on your flags. Same pattern with runLLM — I've decomposed each step into its own function, and the different services are separate files that get called from those functions.

01:18:14 - Nick Taylor

Oh yeah, I was just peeking at that here.

01:18:16 - Anthony Campolo

Yeah, sure.

01:18:17 - Nick Taylor

Okay, so you've got a map of which service to call, then you just call the specific one. And then we've got cleanup.

01:18:26 - Anthony Campolo

Yeah, cleanupFiles is the final step — it just deletes all the extra intermediary files at the end. So there are five pieces, and that's what the terminal output is counting through with step one, step two, and so on. This is how anything gets processed — whether it's a video URL, an audio URL, one LLM or another transcription service. These five functions are the whole workflow.
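
Put together, the five-step flow could be sketched like this. Step names come from the walkthrough; the bodies are hypothetical stand-ins for the real implementations, which do network calls and file I/O:

```typescript
// Hypothetical sketch of AutoShow's five-step processing pipeline.
// The log records the step order the terminal output counts through.
const log: string[] = [];

function generateMarkdown(url: string): string {
  log.push("step 1: generateMarkdown");   // front matter from video metadata
  return `---\nurl: ${url}\n---\n`;
}
function downloadAudio(url: string): string {
  log.push("step 2: downloadAudio");      // yt-dlp download, converted to WAV
  return "audio.wav";
}
function runTranscription(wavPath: string, service = "whisper"): string {
  log.push("step 3: runTranscription");   // whisper, deepgram, or assembly
  return `[transcript of ${wavPath} via ${service}]`;
}
function runLLM(transcript: string, service = "ollama"): string {
  log.push("step 4: runLLM");             // ollama, chatgpt, claude, ...
  return `[show notes via ${service}]`;
}
function cleanupFiles(): void {
  log.push("step 5: cleanupFiles");       // delete intermediary files
}

function processVideo(url: string): string {
  const frontMatter = generateMarkdown(url);
  const wav = downloadAudio(url);
  const transcript = runTranscription(wav);
  const notes = runLLM(transcript);
  cleanupFiles();
  return frontMatter + notes;
}
```

The point of the decomposition is visible here: swapping the transcription or LLM service only changes which file gets called inside steps three and four, while the five-step shape stays identical for every input type.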

01:18:56 - Nick Taylor

Okay. One suggestion I'd have — when it generates these files, if you want to do multiple runs, does it delete the old ones first or just overwrite?

01:19:14 - Anthony Campolo

That can be a bit tricky. If you do multiple runs on the same input with different LLM services, it appends the LLM service name to the filename so they won't necessarily overwrite each other. But if you run the exact same combination twice with the same LLM, it will overwrite. Every video generates a unique base name, so you won't get name clashes between different videos. But multiple runs of the same video with the same settings — that's something to be aware of.
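
The naming behavior described here — a unique per-video base name, with the LLM service appended so different services don't collide, but identical runs overwriting — could be sketched as follows. The function name and exact filename format are hypothetical illustrations, not AutoShow's actual scheme:

```typescript
// Hypothetical sketch of the output-naming behavior described above.
// Different LLM services yield different names; the exact same
// video + service combination yields the same name and overwrites.
function buildOutputFilename(videoId: string, llmService?: string): string {
  const base = `content/${videoId}`;
  return llmService
    ? `${base}-${llmService}-shownotes.md` // e.g. run through Claude vs Ollama
    : `${base}-prompt.md`;                 // transcript + prompt only, no LLM
}
```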

01:20:02 - Nick Taylor

Right. And once you productize it with more than one person using it, you could always append a timestamp or something to make filenames unique. But I'm getting into the weeds.

01:20:19 - Anthony Campolo

In the product, every run will save the output into a database that can then be accessed — so that's probably how it'll end up working.

01:20:31 - Nick Taylor

Yeah. I'm just thinking about the product version. I've hung out with the Turso folks a couple of times on stream —

01:20:42 - Anthony Campolo

And that's the kind of thing I'll most likely be using — some sort of SQLite-based setup.

01:20:46 - Nick Taylor

Yeah, because it's interesting — you can have a parent schema and spin off separate databases per customer with the same schema. So every new customer gets their own database. And then if someone cancels, exporting their data would be pretty easy — you could give them a SQLite dump. Not that you want people canceling, but data portability is good to have.

01:21:33 - Anthony Campolo

Yeah, definitely — supporting data export is going to be really important.

01:21:39 - Nick Taylor

It could be interesting to chat with Jamie Barton over at Turso if you go the SQLite route.

01:21:47 - Anthony Campolo

Yeah, he was on JS Jam way back. And yeah, Turso was at one of the Remix —

01:21:56 - Nick Taylor

Yeah, they were at Remix Conf. That's where I met Glauber. Yeah, okay, cool. So we're getting close to time here — about eight minutes left. Is there any other code you want to show, or should we switch back to chat view and wrap up?

01:22:18 - Anthony Campolo

Someone's asking — good question — whether it's possible to redo a single chapter. The way it's set up right now, if you get output you want to tweak and you're working through a chat interface like we just did, you can just tell Claude "could you change this?" or "could you make it slightly different?" I do that sometimes when it only generates a couple of short chapters for a two-hour video and I need like eight. That's not really built into the CLI workflow because it's set up for one-shot generation — bringing in message history gets complicated fast. That's something I'll have to work on.

01:23:26 - Anthony Campolo

In terms of letting people tweak output — I still have to figure that out. But I'd recommend, if that's what you need, to run it without the LLM flag to get just the transcript and prompt, and then paste that into Claude directly. That gives you a lot more flexibility to ask follow-up questions and refine things interactively.

01:24:04 - Nick Taylor

Yeah, for sure. And once it's productized — you said you're going to store it in the database — like Nyxia was saying, if someone wants to change a single chapter, maybe in the UI you'd let them highlight a section and say "go update this."

01:24:27 - Anthony Campolo

Yeah, totally. A lot of LLMs already have that kind of capability — with Claude you can highlight parts of your code and ask questions about just that section.

01:24:40 - Nick Taylor

Yeah. I said it before, but this is a super cool and ambitious project.

01:24:47 - Anthony Campolo

Oh, there's a huge raid by the way.

01:24:50 - Nick Taylor

Oh shoot, I didn't see — holy. Hey Melky, what's up my man? Thanks for the raid! Yeah, appreciate it. We're just wrapping up, but thanks for joining, everybody in Melky's crew. We've been talking about a project my buddy Anthony Campolo has been working on called AutoShow. Do you want to give a TL;DR as we're wrapping up?

01:25:17 - Anthony Campolo

Yeah, it's an AI tool for processing video and audio content. If you're a YouTuber or podcaster, you can feed it your episodes and it will transcribe them and then pass the transcript to an LLM with a prompt to create a summary, chapters, and descriptions — you can configure all of that to produce different kinds of output. And you should go back to your Claude window to show what it generated for your video.

01:25:45 - Nick Taylor

Yeah. And I know Melky streams all the time and does YouTube as well, so this could be interesting for you if you're curious. It's open source right now and it's going to stay open source. Anthony's going to productize it, but the CLI code is staying open source — and if you want to generate stuff yourself, you just add API keys for whichever LLM service you want to use. I'm definitely going to start using it. Anthony keeps sending me examples like "hey, I took your YouTube video and AutoShow made this," and it's pretty sweet. Hey, Botherin, good to see you!

01:26:25 - Anthony Campolo

Can you go to your browser and show the Claude output real quick?

01:26:29 - Nick Taylor

Yeah, let's switch back. It's going to go Inception because I moved my OBS over. Do you want me to show the content there?

01:26:41 - Anthony Campolo

Yeah, what we got in Claude from the —

01:26:44 - Nick Taylor

Oh yeah, yeah. Okay.

01:26:45 - Anthony Campolo

Did you go back to your web browser?

01:26:47 - Nick Taylor

Yeah. Did I lose my browser? I did. Let me open a new one. Life after Arc. All right. So let's go to Claude. Let me just move it over here. Is this it? Yeah. Okay, so to summarize — if you use the AutoShow CLI, you don't have to use a third-party service to generate show notes. That's all free because you can use Ollama locally with Whisper CPP. But if you want the full breakdown with chapters and proper summaries, you can run it through something like Anthropic with Claude 3.5 Sonnet or whatever. I didn't have tokens set up and of course I just closed the window again. So it's not too bad, Edge, but —

01:27:46 - Anthony Campolo

It's like you need to relearn how to internet.

01:27:50 - Nick Taylor

Yeah, exactly. We went into Claude AI instead and I pasted in the prompt it generated, and that created the show notes — and it's pretty cool. So at a high level: you point the AutoShow CLI at a YouTube video, it pulls down the video, strips the audio to WAV format — which Whisper CPP requires — and from there you get your transcript. Then if you want to go further with chapters and summaries, that's when you bring in the LLMs. We ran it first with Ollama and Llama. I forget which model we used — what was the default?
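
The download-and-convert steps in that recap map onto two well-known tools, yt-dlp and ffmpeg. A sketch of the command lines involved — built as strings here so the pieces are visible; the paths are illustrative, and the ffmpeg flags reflect whisper.cpp's requirement of 16-bit PCM WAV at 16 kHz:

```typescript
// Sketch of the shell commands behind "pull down the video, strip the
// audio to WAV". Paths and URLs are illustrative.
function ytDlpCommand(url: string, outPath: string): string {
  // -x extracts audio; --audio-format wav converts it after download
  return `yt-dlp -x --audio-format wav -o "${outPath}" "${url}"`;
}

function ffmpegToWhisperWav(input: string, output: string): string {
  // whisper.cpp expects 16-bit PCM WAV, 16 kHz sample rate, mono
  return `ffmpeg -i "${input}" -ar 16000 -ac 1 -c:a pcm_s16le "${output}"`;
}
```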

01:28:42 - Anthony Campolo

That was Llama 3.2 3B — one of their new smaller models. And if you use some of the bigger open source models you can still get pretty good output without using paid services. But in general, the best results are going to come from something like ChatGPT or Claude.

01:29:02 - Nick Taylor

Yeah. And when we ran the AutoShow CLI, this is what it generated — this is the prompt that I pasted into Claude. If you use the --claude flag it does the same thing automatically, but I just pasted it in manually to show it in action. You can see the usual prompt engineering stuff: what it wants, the format, the example output. And then you end up with a pretty solid thing — like here, you —

01:29:36 - Anthony Campolo

Just read the first sentence description of the episode.

01:29:39 - Nick Taylor

Yeah, so it says "here's my structured analysis of the transcript." For context, I did a live stream with two engineers from GitHub talking about GitHub Copilot extensions. The summary it generated is: "A developer demonstrates building a GitHub Copilot extension that integrates with Star Search, Open Sauced's AI offering, and showcases how it provides repository insights through natural language queries across different development environments."

01:30:08 - Anthony Campolo

That's a pretty good summary of the episode.

01:30:11 - Nick Taylor

Yeah, I think so. Anyway, it's a pretty cool project. If you haven't heard of it, check it out — I'll drop the link again. And if you want to give Anthony a follow, you can find him all over the web. I think that's a good place to wrap it up. Speaking of raids, let's raid somebody. Nobody I know is on right now, but — Two Nerdy Nerds, sure, why not. Slash raid Two Nerdy Nerds. Thanks everybody for hanging out in the chat today. Thanks again for the raid, Melky, and everyone who joined. Let's go raid. All right, thanks for the follow. Okay, let's raid now. And yeah, thanks, Anthony. If you don't mind staying on for a second — everybody else, appreciate it. Let's go see Two Nerdy Nerds.
