
AutoShow - Automatic Show Notes for Podcasts and Videos



Episode Description

Anthony Campolo demos AutoShow, an open-source CLI that uses Whisper and LLMs to automatically generate show notes, chapters, and clips from podcasts and videos.

Episode Summary

Anthony Campolo returns to the CodingCat.dev podcast to showcase AutoShow, an open-source Node CLI tool he's been building for nearly a year that automates the creation of podcast and video show notes. The conversation traces how the project grew from a personal need for chapter titles on his own podcast, FS Jam, into a full pipeline that downloads audio, transcribes it with Whisper, appends customizable prompts, and sends everything to an LLM for processing. Anthony walks through a live demo of the five-step workflow, explains how he solved Whisper's inconsistent timestamp behavior with a regex-based approach, and discusses the 13-plus prompt templates available — from summaries and chapters to blog posts and even rap songs. The discussion broadens into prompt engineering challenges, the trade-offs of using raw API calls versus SDKs like Vercel AI SDK, and a newer embeddings feature backed by Postgres and Prisma that lets users query their show notes conversationally. Anthony also previews upcoming features like video clipping from chapters and cron-based automation, plus a paid front-end app launching soon while the core logic stays open source. The episode closes with practical AI advice for developers, including recommendations to experiment with multiple models and try running local LLMs through Ollama.

Chapters

00:00:00 - Catching Up and Anthony's Background

Alex welcomes Anthony Campolo back to the CodingCat.dev podcast after a two-and-a-half-year gap, and the two joke about Anthony's earlier Web3 phase. Anthony clarifies that his involvement with crypto actually stretches back to 2017 and continues today, with a potential funding arrangement from the Dash DAO for his current project.

The conversation shifts to Anthony's career path, which has been heavily rooted in developer relations across companies like StepZen, QuickNode, and Edgio. He describes himself as a devrel at heart who's now operating as a solo entrepreneur, building an open-source project he plans to turn into a paid app launching in April. He mentions Jason Lengstorf's concept of repurposing content as a key inspiration behind the tool.

00:03:00 - What AutoShow Does and Why It Exists

Anthony explains the origin of AutoShow by describing the problem it solves: he always wanted chapter titles and show notes for his podcast FS Jam but never had time to create them manually. When ChatGPT launched, he realized AI could generate these automatically from transcripts produced by OpenAI's Whisper, and he began feeding transcript-plus-prompt combinations into LLMs to get structured output.

What started as a manual copy-paste workflow evolved into an automated Node CLI pipeline that handles transcription, prompt selection, and LLM processing in sequence. Anthony details how he expanded support from YouTube URLs to RSS feeds, added multiple LLM and transcription options, created over thirteen prompt templates, and is now building a paid front-end app with a credit-based pricing model while keeping the core repo open source with over 500 commits.

00:08:29 - Live Demo and the Five Processing Steps

Anthony runs a live demo of the CLI, walking through the command syntax and explaining how the tool chains five processing steps: generating markdown metadata, downloading or converting audio, running transcription through Whisper, selecting prompts, and executing the LLM call. He shows the output from a short test file processed with ChatGPT, noting the total cost came to a fraction of a cent.

Alex asks how Whisper works in this context, and Anthony clarifies it runs locally via CLI exec commands rather than through an API. He explains the modular architecture where transcription and LLM steps each branch out to different provider files — Deepgram, Assembly, or Whisper for transcription, and various LLM options for text generation — giving users flexibility in choosing their preferred services.

00:15:46 - Whisper Quirks, Prompt Engineering, and Transcription Details

Anthony describes a significant challenge he solved with Whisper: the tool would sometimes degrade from producing multi-word timestamp segments to single-word segments, ballooning transcript length. His fix forces one-word-per-timestamp output and then uses a regex to regroup into 15-word chunks, keeping transcripts uniform and manageable while reducing unnecessary token spending.
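The regrouping idea can be sketched in a few lines of JavaScript; this is a minimal illustration assuming one "[timestamp] word" pair per line, not AutoShow's actual code:

```javascript
// Regroup one-word-per-timestamp Whisper output into fixed-size chunks.
// Each input line looks like "[00:00:01] hello"; the chunk keeps only its
// first timestamp, so the transcript stays short and uniform.
function regroupTranscript(lines, wordsPerChunk = 15) {
  const chunks = [];
  for (let i = 0; i < lines.length; i += wordsPerChunk) {
    const group = lines.slice(i, i + wordsPerChunk);
    const timestamp = group[0].match(/^\[[^\]]+\]/)[0]; // keep the chunk's first timestamp
    const words = group.map((line) => line.replace(/^\[[^\]]+\]\s*/, ''));
    chunks.push(`${timestamp} ${words.join(' ')}`);
  }
  return chunks;
}
```

A 10,000-line one-word transcript collapses to under 700 lines this way, which also cuts input-token spend on the LLM call.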

The discussion moves into prompt engineering territory as Alex draws parallels to his own work with Google's Gemini API. Both agree that getting precise, consistent output from LLMs requires extremely detailed prompt specifications — exact character counts, sentence structure requirements, and instructions to avoid formulaic "LLM slop." Anthony shows the actual prompt files, explaining how he instructs the model on summary length, chapter duration, and natural writing style.

00:19:47 - SDK Choices and Working Close to the Metal

Alex asks why Anthony chose raw API calls over higher-level tools like Vercel's AI SDK or Firebase GenKit. Anthony explains he wanted maximum control for learning and fine-tuning, using OpenAI's native JavaScript library for ChatGPT and simple fetch calls for providers like Fireworks. He notes that most LLM interactions in his pipeline are straightforward completion calls, so an abstraction layer hasn't been necessary.

The conversation touches on how providers like Grok and Mistral support OpenAI-compatible endpoints, meaning you can often just swap the model name. Anthony acknowledges that as features grow more complex — particularly around function calling and agents — a unifying SDK might become more valuable, but for now he prefers understanding each provider's strengths directly rather than having a library hide those differences.
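A rough sketch of what that swap looks like in practice; the base URLs and model names below are assumptions to verify against each provider's documentation:

```javascript
// Illustrative table of OpenAI-compatible chat endpoints. URLs and model
// names are assumptions; check each provider's docs before relying on them.
const PROVIDERS = {
  openai: { baseURL: 'https://api.openai.com/v1', model: 'gpt-4o-mini' },
  grok: { baseURL: 'https://api.x.ai/v1', model: 'grok-2' },
  mistral: { baseURL: 'https://api.mistral.ai/v1', model: 'mistral-small-latest' },
};

// Returns the config you would hand to OpenAI's JavaScript SDK, e.g.
// new OpenAI({ baseURL: cfg.baseURL, apiKey: cfg.apiKey })
function clientConfig(provider, apiKey) {
  const p = PROVIDERS[provider];
  if (!p) throw new Error(`Unknown provider: ${provider}`);
  return { baseURL: p.baseURL, apiKey, model: p.model };
}
```

Because the endpoints share one interface, switching providers is just picking a different entry rather than learning a new SDK.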

00:23:25 - Embeddings, Vector Storage, and Chat With Your Show Notes

Anthony introduces a newer AutoShow feature: generating embeddings from show notes so users can conversationally query their content. He explains embeddings at a high level as converting text into mathematical representations that let LLMs search through custom content beyond their training data, enabling a "chat with your show notes" experience across potentially hundreds of episodes.

The technical discussion covers his journey through storage solutions, starting with Node's built-in SQLite, moving to Better SQLite 3 due to embedding limitations, and ultimately settling on Postgres with Prisma's vector extensions. Alex shares that he's working with Cloudflare's D1 and Vectorize databases for similar purposes, and Anthony offers to share his earlier SQLite implementation for reference.

00:27:44 - Video Clipping and Content Repurposing Workflows

Anthony demos the Create Clips script, which takes chapter timestamps from generated show notes and automatically clips the source video into separate files matching each chapter. Alex pushes on whether clips can be shorter than full chapters for platforms like YouTube Shorts, and Anthony explains the approach is to simply adjust the prompt to generate shorter chapters — say 90 seconds each — which then become the clip boundaries.
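One plausible way to turn chapter timestamps into clips is to hand each chapter's start and end to ffmpeg; this sketch builds the argument list and is illustrative rather than AutoShow's actual script:

```javascript
// Build ffmpeg arguments that cut one chapter out of the source video.
// Stream copy ("-c copy") avoids re-encoding, so clipping is nearly instant.
function buildClipArgs(input, startSec, endSec, output) {
  return [
    '-ss', String(startSec), // seek to the chapter's start
    '-to', String(endSec),   // stop at the chapter's end
    '-i', input,
    '-c', 'copy',
    output,
  ];
}
// Each chapter { start, end } parsed from the show notes becomes one clip:
// spawnSync('ffmpeg', buildClipArgs('episode.mp4', ch.start, ch.end, `clip-${i}.mp4`));
```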

They discuss how this mirrors what large podcasters like Lex Fridman do with dedicated clips channels, and how OpenAI's o1 model has significantly improved adherence to specific chapter-length instructions compared to earlier models. Anthony notes that o1 consistently produces six-minute chapters as specified, whereas older models would sometimes ignore length constraints and produce half-hour segments.

00:31:31 - Function Calling, JSON Output, and Model Comparisons

Alex describes his experience using Gemini's function calling feature to get structured JSON output with specific schemas for chapters and descriptions, rather than raw markdown. Anthony connects this to OpenAI's equivalent function calling capability and explains why he hasn't integrated it yet — his pipeline is optimized around markdown output that doubles as blog-ready content with front matter.

The two discuss the trade-offs between structured JSON output and markdown, with Anthony acknowledging that function calling would enable more fine-grained control. They also touch on the potential role of tools like Vercel's AI SDK and Firebase GenKit in standardizing function calling across providers, and briefly discuss LangChain's complexity versus simpler alternatives like a library called Ragged.
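As a hypothetical illustration of the structured-output approach discussed here, a JSON schema for chapter output might look like this (the shape is invented for this example, not taken from either project):

```javascript
// A schema like this can be passed to a function-calling / structured-output
// API so the model returns typed JSON instead of free-form markdown.
const chapterSchema = {
  type: 'object',
  properties: {
    chapters: {
      type: 'array',
      items: {
        type: 'object',
        properties: {
          timestamp: { type: 'string', description: 'HH:MM:SS chapter start' },
          title: { type: 'string', description: 'Short chapter title' },
          description: { type: 'string', description: 'Two-paragraph summary' },
        },
        required: ['timestamp', 'title', 'description'],
      },
    },
  },
  required: ['chapters'],
};
```

The trade-off Anthony describes still applies: JSON like this is easier to validate programmatically, while markdown output doubles as blog-ready content.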

00:37:03 - The AI Landscape, Job Impact, and Getting Started Advice

Alex draws parallels between the current AI boom and the dot-com era, expressing excitement about the renewed energy in tech while acknowledging concerns about job displacement. Anthony agrees there are legitimate criticisms around model bias and AI opinions, but believes developers who embrace AI tools will become productive enough to thrive, while those who refuse to adapt risk being automated out of their roles.

Anthony shares his personal journey from failed music major to AI enthusiast inspired by AlphaGo and Deep Dream in 2016, through a coding bootcamp into web development, and finally back to AI when ChatGPT launched. His practical advice for newcomers is to start using AI for everyday tasks to find that first compelling use case, then experiment with multiple models including ChatGPT, Claude, and Gemini. He also recommends developers try Ollama for running local models, emphasizing the privacy benefits and the importance of building familiarity now before local model capabilities improve significantly.

00:45:56 - Closing Thoughts and Where to Find AutoShow

Alex adds a recommendation for Perplexity AI as a useful tool that aggregates web content, noting he uses it daily with a Pro membership. He encourages listeners to explore AI tools and not fall behind the curve, then directs everyone to check out AutoShow for automatic podcast and video show notes generation.

Anthony reveals that his wife wrote the AutoShow tagline, ending the episode on a lighthearted note. The full project remains available as an open-source repository under AJC Web Dev on GitHub, with the paid front-end app expected to launch soon.

Transcript

00:00:02 - Alex Patterson

Welcome back to CodingCat.dev, where we give you cats the freshest dose of dev snacks. Welcome back, peeps, to the CodingCat.dev podcast. We got Anthony on the show today. What's up, Anthony?

00:00:13 - Anthony Campolo

What's up, man? Good to be back. It's been like two and a half years.

00:00:18 - Alex Patterson

I think it has been a while. Go back and check out that show. You know what? I'm gonna do a quick search.

00:00:25 - Anthony Campolo

You probably don't want to. You caught me in my Web3 phase.

00:00:28 - Alex Patterson

I know it. It's kind of funny because I think you got that fairly quickly, right?

00:00:34 - Anthony Campolo

Yeah, well, I worked for QuickNode, a Web3 company, for a little under a year, but my Web3 phase technically started in 2017. I actually wasn't just like a bandwagoner, and I'm still doing it now. I'm actually trying to get a crypto project to fund AutoShow. Dash is potentially going to fund it from their DAO.

00:00:56 - Alex Patterson

That's amazing.

00:00:56 - Anthony Campolo

Look at my history. It looks like I like just hopped on the bandwagon then immediately hopped off of it. That's not really what happened.

00:01:04 - Alex Patterson

Yeah, I think we had you on a couple times, but yeah, you're absolutely right. It's been too long. So it's exciting that you're kind of working on a new project today.

00:01:13 - Anthony Campolo

You have a new video intro. It's awesome. Little cat.

00:01:18 - Alex Patterson

Thank you. Yeah, I appreciate that. It's kind of interesting. When you go to YouTube these days, it's challenging to figure out if you put an actual intro in or anything else. And so it's always fun, [unclear]. You put that first, you put it in the middle. Where's it go? Does it go anywhere? That's kind of cool because what we're going to talk about today kind of goes in line with all of that too. So, would you consider yourself kind of like a DevRel at heart?

00:01:45 - Anthony Campolo

That's like the only dev job I've ever done professionally. Technically, my very first job was doing DevRel; it's kind of how I broke into the industry. So I did that at StepZen, where I was doing GraphQL, then QuickNode, where I was doing Web3, and then Edgio, where I was doing enterprise deployment and security stuff. So, yeah, now I'm kind of a solo entrepreneur, because I built an open-source project that's going to be turned into an app we're hoping to launch in April. But I'm still a DevRel at heart, so I still create a lot of content. And the tool itself is not necessarily aimed at just DevRels, but it was built for me to accelerate my own content creation. So if you are a DevRel, it will probably be very useful for you. Jason Lengstorf wrote this blog post; it was like shaving the buffalo stick or something like that. It's one of these weird terms from his boss that has to do with repurposing your content. So that's kind of the thing.

00:02:55 - Alex Patterson

I might have to look up this crazy blog post. I haven't read that one. So let's break into it, a good segue. Let's talk about AutoShow, and not mix it up with your car auto show. Just FYI, folks, this is purely DevRel. So the tagline is "automatic show notes for podcasts and videos" with AutoShow. Can you break down why people would use a tool like this and how you're using it to repurpose content like that? We're going to do a whole code walkthrough, folks, so it's going to be a fun AI journey for a little while too. Break that down for me.

00:03:33 - Anthony Campolo

Yeah, so I usually explain what I built it for because it gives you the best idea of what it does. I had a podcast called FS Jam, and one of the things that I always wanted but never really had time to do, because it took me a long time just to edit the episode itself, was go through and add chapter titles. This is a really common thing with big podcasts, like your Lex Fridmans or whatnot. You look at their YouTube video and you'll see timestamps for different chapters that are five to ten minutes each. The chapter title tells you what they're talking about in that section. So if you see a three-hour podcast interview and you're like, "I don't really want to listen to this whole thing, but I'm really curious what this person has to say about the New Jersey Jones," or something really specific and random, AI can do this. Around when ChatGPT came out, I got very into that and I was like, wow, this is amazing.

00:04:32 - Anthony Campolo

You could do all this stuff. I was also getting into OpenAI's Whisper tool, which was the last open-source thing they released. They're talking about bringing open source back, so we'll see about that. They had this transcription tool so you could transcribe your video, and it could also be used for captions and stuff like that. So there were timestamps and transcriptions. What I did was feed that to ChatGPT and write a prompt that told it what I wanted. I would ask for a summary and chapter titles and descriptions, and then it would write out the show notes for you. I was like, wow, this is really amazing and really useful. It saves me a ton of time. Then my brain started working, and I'm like, okay, this is still a lot of manual effort, though, to do the transcription, write the prompt, put it into ChatGPT, and get that back. I started creating this automated pipeline where you have a Node script, or Node CLI, that runs a sequence of scripts. It does the transcription, appends the prompt, feeds that to an LLM, and then gets the response back.

00:05:46 - Anthony Campolo

So that's kind of the first MVP thing, and then from there it expanded out. It started with just YouTube, where you'd literally take a YouTube URL, give it to your CLI, and it would do the whole thing. Then I started adding in things like podcast feeds, so I had to parse RSS XML stuff. Then I added other LLM options, other transcription options, and more prompts. So if you want to write a rap song based on your episode, you can now do that as well. Or if you want to write a blog post, or you wanted longer chapter descriptions, or just chapter titles, there are like 13 or so prompts now. I just kept adding to it and adding to it, and it kept being able to do more and more things. The last thing was, now it needs a front end. So I'm currently working on a front end so you don't need to know how to use a CLI to do it.

00:06:48 - Anthony Campolo

And that's going to be an actual paid product where people will sign up, pay for credits, and use those credits to generate show notes. The episode length and the quality of the LLM you want to use will affect how many credits it costs.

00:07:05 - Alex Patterson

And then you're going to give it all to me for free just so I can be a tester, right? That's how this works.

00:07:12 - Anthony Campolo

The first couple of people will get a promo code to have a couple free runs. So yes, actually, that is true.

00:07:19 - Alex Patterson

Very cool. So I think I poked around on the repo. Are you going to leave it public for folks in the future as well?

00:07:28 - Anthony Campolo

The repo is going to stay open source forever. What we are going to do is this: the repo itself does have a front end, but it's very, very rudimentary. The front end we're building for the paid app is a separate repo that's not going to be open source. So you won't be able to just clone my whole company. But the logic behind it will stay open source, and if someone wants to use the functionality without paying for the app, and they just want to use their own OpenAI API keys and stuff, that option will always be available. I'm a big open-source person. I always have been, and I've been building this in open source for almost a year now. If you look at the first commit, I think it was in April of 2024.

00:08:13 - Alex Patterson

That's awesome.

00:08:14 - Anthony Campolo

Yeah, it's the most open source I've ever done. It's got over 500 commits and around 50 stars. So, yeah: ajcwebdev/autoshow on GitHub.

00:08:29 - Alex Patterson

Yep, and we got it. I got it in the show notes, and it'll be in the blog if you miss it anywhere. Sweet. So with that said, that's kind of the gist of it all. I want to see this thing work. Are you prepared to show off some code?

00:08:44 - Anthony Campolo

Yeah, absolutely. I'll do a run first before we actually look at the code itself so people can see it. So let's do this. When you're using the CLI, you have "npm run as" (as in AutoShow), then "--", and then you can pass "file" if it's a local file or "video" if it's a YouTube video. And I've got this test file here.

00:09:15 - Alex Patterson

I've always wondered, like, what's the extra dash dash for?

00:09:19 - Anthony Campolo

So that's because it's running a script. It runs this, which is running [unclear]. So essentially, when you have an aliased script that you want to pass additional flags to, that's how you do it.
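A quick illustration of why the extra "--" matters: npm forwards everything after it to the underlying script, where it shows up in process.argv. This tiny parser is illustrative, not AutoShow's:

```javascript
// With `npm run as -- --file test.mp3`, the flags after "--" reach the
// script via process.argv (argv[0] is node, argv[1] is the script path).
function parseFlags(argv) {
  const flags = {};
  const args = argv.slice(2);
  for (let i = 0; i < args.length; i++) {
    if (args[i].startsWith('--')) {
      const key = args[i].slice(2);
      const next = args[i + 1];
      // treat the next token as the flag's value unless it is another flag
      if (next !== undefined && !next.startsWith('--')) {
        flags[key] = next;
        i++;
      } else {
        flags[key] = true;
      }
    }
  }
  return flags;
}
```

Without the "--", npm would try to interpret the flags itself instead of passing them through.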

00:09:38 - Alex Patterson

You pass it through. Okay, got it. Cool.

00:09:40 - Anthony Campolo

Yeah, yeah. I learned that doing AutoShow.

00:09:45 - Alex Patterson

I've been doing some CLI fun at work, and there's a plugin called Chalk that you can use to do questions and colorize stuff like that.

00:09:55 - Anthony Campolo

I use chalk as well.

00:09:56 - Alex Patterson

Ah, there you go. It's a very popular Node-based one.

00:09:59 - Anthony Campolo

So yeah, when we get the output here, we're going to get a bunch of Chalk output. If you don't include an LLM, it will give you just the transcript and prompt, and you can give it to your own LLM, like feeding it to a UI or chatbot. But if you do want it to do the whole process in one go, then you can tell it what to do: ChatGPT, Claude, etc.

00:10:25 - Alex Patterson

So talk to me about that for a minute. Is that because you have an LLM installed locally, and that's how it's able to do the first part?

00:10:36 - Anthony Campolo

Well, no. So there are two things. If you leave it out, it will just give you the prompt and transcript, so it doesn't do an LLM run at all.

00:10:48 - Alex Patterson

So how is it actually pulling the transcript out?

00:10:50 - Anthony Campolo

So the transcript is through Whisper.

00:10:51 - Alex Patterson

Ah, okay.

00:10:52 - Anthony Campolo

Yeah. And that's probably why you're having trouble getting it working locally. There are a couple of things that could be challenging about getting this to run locally; for one, there's a massive startup script.

00:11:04 - Alex Patterson

I was messaging Anthony before the show.

00:11:07 - Anthony Campolo

I said it's a dangerous startup script. Yeah, that's a little intense in terms of the LLMs. Let me just run this, and then more things will start to make sense. It runs through five processing steps. The first one generates the markdown. Since I'm doing a local file, there isn't a whole lot of markdown. When you do a YouTube video, all this stuff gets filled in automatically. Then it does a download-audio step. So if you already have the file on your machine, it just converts it to WAV. Otherwise, it will actually download the YouTube video itself. Then it runs transcription. It looks for a Whisper model. I didn't select any, so it just uses the base model. Then it selects the prompt. There are two default prompts because we didn't select any prompts, so it gives you a summary and chapters automatically, and then it runs the language model. Since we selected ChatGPT but didn't say what ChatGPT model we wanted, it gave us one automatically. The total cost was 0.05 cents.
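The five-step flow Anthony describes can be sketched as a simple async chain; every function here is a stub standing in for AutoShow's real modules, just to show the control flow:

```javascript
// Stubs for the five steps: generate markdown, download/convert audio,
// run transcription, select prompts, run the LLM.
const generateMarkdown = async (input) => `---\ntitle: ${input}\n---\n`;
const downloadAudio = async (input) => input.replace(/\.\w+$/, '.wav');
const runTranscription = async (wav, model) => `[00:00:00] transcript of ${wav} via ${model}`;
const selectPrompts = (names) => `Write: ${names.join(' and ')}.\n`;
const runLLM = async (llm, text) => `${llm} notes for: ${text.slice(0, 24)}...`;

async function processFile(input, options = {}) {
  const frontMatter = await generateMarkdown(input);                                  // step 1
  const wavPath = await downloadAudio(input);                                         // step 2
  const transcript = await runTranscription(wavPath, options.whisperModel ?? 'base'); // step 3
  const prompt = selectPrompts(options.prompts ?? ['summary', 'chapters']);           // step 4
  // With no LLM selected, hand back the prompt and transcript for manual use.
  if (!options.llm) return { frontMatter, prompt, transcript };
  const showNotes = await runLLM(options.llm, prompt + transcript);                   // step 5
  return { frontMatter, showNotes };
}
```

The early return mirrors the behavior described in the demo: skip the LLM flag and you get back just the prompt and transcript to paste into a chatbot yourself.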

00:12:19 - Anthony Campolo

These are cents, not dollars. It's like a minute-long video. And then those are all the process steps. Then if we go here...

00:12:30 - Alex Patterson

So forgive my naivete already: the Whisper part of this is not calling out to an API at all, right? You have Whisper. Talk through how that part works.

00:12:45 - Anthony Campolo

Okay, yeah, sure. So first, here's what it gave out: "Anthony and a guest reflect on the future of JavaScript and FS Jam frameworks, discussing trends, challenges, and their experiences from the past year." This is just a short clip for my podcast, FS Jam. And then down here is the actual transcript.

00:13:03 - Alex Patterson

That's incredible. Out of like a one-minute video, it's able to do that. That's so cool.

00:13:09 - Anthony Campolo

Now let me break this down. I walked you through the five steps. Each of those five steps has its own specific file: generate markdown, download audio, run transcription, select prompts, and run LLM. You were asking about run transcription, so let's look at that file. It's going to figure out which transcription service you want. There are two paid ones you can use, Deepgram or Assembly, and those give you higher-quality transcriptions and things like speaker labels. Then Whisper is here. Run LLM also gives you a bunch of different LLM options. It's built in a modular way where you have these five steps, and then two of the steps, run transcription and run LLM, reach out to either transcription files or LLM files. So this is where the Whisper stuff happens. Literally, this is why I liked building this as a Node CLI, because there are JavaScript versions for a lot of tools I'm using.

00:14:30 - Anthony Campolo

There's a JavaScript wrapper for FFmpeg and for Whisper, and I tried all of those, and they weren't really that great. What I ended up doing is just having Node execute process commands, or exec commands. It throws in the Whisper command you would run in the CLI. So you see here it points to the Whisper CLI, selects the model, gets fed the path of your file, and then the path of the output file. Then I have it do this. This took me a really long time to get right. Whisper does this weird thing where sometimes it starts doing one word per timestamp. It starts off doing like ten words, then slowly less and less, and then your transcript ends up being like 10,000 lines long. So I forced it to only do one word per timestamp, then wrote a regex to regroup into 15 words per timestamp automatically. If you look at the transcript now, it's very uniform.
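The exec approach can be sketched like this; the binary name and flags follow whisper.cpp conventions ("-ml 1" caps segments at one word) and should be treated as assumptions to check against your local Whisper build:

```javascript
// Build the CLI arguments for a local whisper.cpp-style binary. Forcing
// one word per timestamp ("-ml 1") makes the output predictable, so it
// can be regrouped into uniform chunks afterwards.
function buildWhisperArgs(modelPath, inputWav, outputBase) {
  return [
    '-m', modelPath,   // Whisper model file to load
    '-f', inputWav,    // WAV file to transcribe
    '-of', outputBase, // base path for the output file
    '-ml', '1',        // max segment length 1 => one word per timestamp
    '--output-lrc',    // timestamped output format to regroup afterwards
  ];
}
// e.g. execFile('./whisper.cpp/main',
//   buildWhisperArgs('models/ggml-base.bin', 'audio.wav', 'audio'), callback)
```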

00:15:46 - Anthony Campolo

And that seems to work out pretty well because then it makes sure that no matter what, it's not going to be longer than like a thousand lines or something. And you don't get a whole bunch of extra timestamps polluting your prompt because then you're spending more on input tokens as well.

00:16:02 - Alex Patterson

So that's pretty much what's happening here. The only thing I can compare it to is that we're in the same world, trying to do essentially the same thing, and there are a couple of products out there already. With all that said, I've been trying to run this through Gemini to do the same thing. The interesting part on that side is that you have to start specifying things: I want it in this JSON format, here's a function you can run, and here's what I want back. And the interesting part when you start to break those down is that, like I noticed with your regex, it kind of cuts off part of that summary statement. So you have to be very specific and say: I want a full sentence containing only this summary, at every one-minute to twenty-minute length. You have to get so detailed. I feel like that's what we're now calling prompt engineering; that's where it starts to come into play.

00:17:02 - Anthony Campolo

Yeah, let's look at the prompts since you just mentioned it. We have all of our prompts here. The one we were just using is summary, and each prompt has a description of what it does. I told it to write a one-sentence description of the transcript and a one-paragraph summary. If we look back at what we had, you see there's this short description here. I usually throw that into my meta description because it's about the right length, and then the longer summary. So I say write a one-sentence description and one-paragraph summary, and I tell it specifically how long it should be. I say the one-sentence description shouldn't exceed 180 characters, roughly 30 words, and the one-paragraph summary should be approximately 600 to 1200 characters, roughly 100 to 200 words. Depending on which LLM you use, some of them follow that more closely than others. As the LLMs are getting better and better, like o1, it really sticks to what you say, especially for chapter descriptions. Then I give it an actual example.

00:18:21 - Anthony Campolo

Write a one paragraph... wait, no, sorry, that's the long chapters one. So: create chapter titles and descriptions based on the topics discussed throughout. Include timestamps for these chapters. Chapters should be roughly three to six minutes long. Write a two-paragraph description for each chapter. Ensure chapters cover the entire content. Note the last timestamp, and let descriptions flow naturally from the content. Avoid formulaic templates, because what it was first doing was writing each chapter in a certain way, that kind of "LLM slop" term people use, where you can tell something was written by an LLM. Even though it's grammatically correct, there's just something to it that's hard to put your finger on.
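A paraphrased sketch of a length-constrained prompt like the one described; the wording here is illustrative, not the actual AutoShow prompt file:

```javascript
// Build a summary prompt with explicit length constraints baked in, since
// LLMs follow precise numeric limits better than vague requests.
function summaryPrompt({ maxDescChars = 180, minSummaryChars = 600, maxSummaryChars = 1200 } = {}) {
  return [
    `Write a one-sentence description of the transcript, no longer than ${maxDescChars} characters (roughly 30 words).`,
    `Then write a one-paragraph summary of approximately ${minSummaryChars} to ${maxSummaryChars} characters (roughly 100 to 200 words).`,
    'Let the writing flow naturally from the content and avoid formulaic templates.',
  ].join('\n');
}
```

Parameterizing the limits makes it easy to tune per prompt template, since some models honor the numbers more strictly than others.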

00:19:10 - Alex Patterson

Totally. Yeah.

00:19:11 - Anthony Campolo

Yeah, that's.

00:19:12 - Alex Patterson

That's interesting, and to your point, you have to get this so dialed in for exactly what you're trying to accomplish and exactly what you're asking for. I think that's the biggest challenge when you start using some of these models. I am curious, though, as you dive into this side of it using Whisper and so on: there are tools like Vercel's AI SDK and Firebase Genkit, and I don't see them in this codebase. Is there a reason you took a more raw approach to handling the different calls, and what was that like?

00:19:58 - Anthony Campolo

Yeah, I wanted to have as much control as possible because at first I was just building this for myself. I wanted to get as deep into the tools as I could, partly for learning and also so I could really tweak them. We haven't looked at the ChatGPT file yet. You're correct that I'm not using Vercel SDK, although I have used that for an agent-focused project I wrote a blog post about recently. But for these, I use OpenAI's built-in JavaScript library. For most of these, I use either their library or, for some like Fireworks, I literally just do a straight fetch call to the API. That's nice because you know exactly what you're getting and what you need to do, and then you can standardize it. I'm starting to think more about standardizing around the OpenAI interface, because I had a Grok implementation at one point.

00:21:07 - Anthony Campolo

I actually... this used to be bigger. I used to have like 10 LLMs, and I picked the ones I thought kind of sucked, like four of them, and moved them to a doc that explains how to use them if you want. So if you go to the Grok one, it shows you can use OpenAI's library to use Grok, and you can do the same thing with Mistral and other providers.

00:21:27 - Alex Patterson

Yeah, I found that interesting. I was reading through because Grok 3 just came out and everybody's hyped about it, which is honestly kind of frightening to me, what's coming out of it. But with that aside, it basically says, "Hey, if you're using OpenAI, just swap out the model and you're good to go." That took me a minute to really understand. They're utilizing all the SDK parts for OpenAI with their own model. It makes me wonder, though, when we start to get into agents and things like that, where it's almost workflowed out, if it makes sense to use something more featureful, like Vercel's AI SDK or something along those lines. Do you find that's not necessary, or have you had enough experience to make that determination yet?

00:22:27 - Anthony Campolo

It hasn't been necessary because for what I'm doing, the LLM part is one step in a larger workflow, and all it's been doing is a single completion. They all have a completion API and they all work similarly. You hit an API, send it text, and get text back. That's been very simple and easy to do. Now I'm starting to work with embeddings. You can take all of your show notes, pass them to OpenAI's embeddings endpoint, and then query them. So it's like "chat with your show notes," a common feature a lot of these AI products are starting to do and have been doing. Around the beginning of 2023, everyone did "chat with your docs." You probably remember that.

00:23:25 - Alex Patterson

Yes, but talk to me about embeddings, because I don't know that much about how they work myself, and I think others are probably in the same boat.

00:23:34 - Anthony Campolo

Yeah, embeddings, the technical definition is crazy. It's like taking text and turning it into math so the LLMs can understand it. That's the most high-level description I can give. What it does is let you go beyond the model's knowledge base, because they're trained up to a certain point. Some can reach out to the internet, some can't. But if you want to ask questions, going back to "chat with your docs," and you're working with a brand-new library that was written two months ago, most LLMs aren't going to have knowledge of that. So you can take all the docs, turn them into embeddings, and then ask a question. Because the LLM can quote-unquote read the embeddings, it can search through them and give you an answer tailored to that content specifically. So it's cool if you have show notes, if you have 100 podcast episodes, and so on.

00:24:46 - Anthony Campolo

If you turned all those into embeddings, you could ask a specific question about an episode you did back in 2022. Like, ask it about my blockchain episode and say, "What did we talk about on the blockchain CodingCat episode?" It will find that episode and give you a response because you created the embeddings.
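The retrieval step behind "ask it about my blockchain episode" can be sketched as a similarity search: embed the question, then rank stored episode embeddings by cosine similarity. The vectors and episode titles below are toy examples; in practice the embeddings come from an embeddings API:

```typescript
// Sketch of ranking episodes by how close their embedding is to a query embedding.
// Real embeddings are high-dimensional; two dimensions are used here for clarity.

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

interface Episode {
  title: string;
  embedding: number[];
}

// Return episodes sorted by similarity to the query vector, best match first.
export function rankEpisodes(query: number[], episodes: Episode[]): Episode[] {
  return [...episodes].sort(
    (x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding)
  );
}
```

The top-ranked episodes are what gets handed to the LLM as context, which is why the answer comes back tailored to that specific content.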

00:25:08 - Alex Patterson

Where would you place those embeddings? Would you create your own Postgres database or something like that to store them in?

00:25:14 - Anthony Campolo

Yeah. So if you look here, I have to change this. Right now it says "create embeddings in SQLite," because I was first using SQLite. There's an embeddings SQLite plugin that's very, very new.

00:25:25 - Alex Patterson

I actually love that because I'm doing all my stuff on Cloudflare, so it's going to be on their Vectorize database.

00:25:32 - Anthony Campolo

Yeah, maybe part of the thing is because it's new, it can be complicated. I was using Node.js built-in SQLite, but there was a limitation with it, or a limitation with embeddings. It just didn't work. So I started using Better SQLite3 instead. You're probably going to hit stuff like that. I eventually moved off SQLite, and now I'm using Prisma and Postgres. I'm sure you're familiar with Prisma. It's using their vector extensions and creates embeddings here. Now it's all going into a Postgres database. It can be done with SQLite, though. The first implementation did everything with SQLite.
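The Postgres path Anthony describes can be sketched like this. The table and column names are hypothetical, and the `<=>` operator is pgvector's cosine distance; the client interface is kept generic so the sketch doesn't depend on a specific driver (node-postgres's `Client` would satisfy it):

```typescript
// Sketch of a nearest-neighbor query against a Postgres table with a pgvector
// column. Table name, column names, and dimensions are illustrative.

// Minimal client shape; node-postgres's Client matches this signature.
interface QueryClient {
  query(text: string, values: unknown[]): Promise<{ rows: any[] }>;
}

// pgvector accepts vectors as a bracketed literal like "[0.1,0.2,0.3]".
export function toVectorLiteral(v: number[]): string {
  return `[${v.join(",")}]`;
}

// Return the rows whose embeddings are closest to the query embedding.
// "<=>" is pgvector's cosine distance operator, so ORDER BY ascending
// puts the most similar rows first.
export async function findSimilarNotes(
  client: QueryClient,
  queryEmbedding: number[],
  limit = 5
) {
  const { rows } = await client.query(
    `SELECT title, content
       FROM show_notes
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [toVectorLiteral(queryEmbedding), limit]
  );
  return rows;
}
```

Prisma sits on top of the same extension; the raw SQL is shown here because the vector operator is what actually does the search.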

00:26:17 - Alex Patterson

I'm going to have to try it because I have my D1 setup for Cloudflare. I'm curious whether the extension will still support this or if it's a different extension. But I love the fact that in Prisma you can do this style of setup, for sure.

00:26:36 - Anthony Campolo

Yeah. If you check my git history, or I can find it and send it to you, I can send you what these files were back when they were SQLite so you can see what I was doing.

00:26:45 - Alex Patterson

Cool, thanks. So we've talked a little bit about how you break it down with Whisper, how you send it out, and how to create embeddings. I'm curious where you'd take it from an overall structure perspective. For me, there are two parts to our content creation process. There's the upfront cost of creating content, where you need things like images and video outlines. Then there's the post-production side, which is what you're dealing with on this project. I'm curious whether you could ever take this to the next level of creating videos out of it. So if you take one of my 45-minute podcasts like we're doing here, and I just want to break it up into shorts and create something, how would you go through that process mentally, and could you do it with what you have now?

00:27:44 - Anthony Campolo

Yeah, so there's this one right here. This one's fairly new. This is the Create Clips script. What this does, which is super cool, is once you have show notes like this, you see these three chapters here. Technically these timestamps are not correct because the video is only one minute long, but usually if you're working with substantial content, these are correct. So what the Create Clips script does is find these transcripts, clip them, and give you three videos with these three titles.
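The chapter-to-clip step can be sketched as: parse the "HH:MM:SS" chapter timestamps out of the show notes, then build one ffmpeg invocation per chapter. This is not the actual Create Clips script; the file names are illustrative, and the ffmpeg flags used are standard ones (`-ss` seek, `-t` duration, `-c copy` to avoid re-encoding):

```typescript
// Sketch of turning chapter timestamps into per-chapter ffmpeg clip commands.

// Convert an "HH:MM:SS" timestamp into a number of seconds.
export function toSeconds(timestamp: string): number {
  const [h, m, s] = timestamp.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

interface Chapter {
  start: string; // e.g. "00:28:23"
  end: string;
  title: string;
}

// Build the ffmpeg argument list for one clip: seek to the chapter start,
// then copy the stream for the chapter's duration without re-encoding.
export function ffmpegArgs(input: string, chapter: Chapter, output: string): string[] {
  const duration = toSeconds(chapter.end) - toSeconds(chapter.start);
  return ["-ss", chapter.start, "-i", input, "-t", String(duration), "-c", "copy", output];
}
```

Running one clip is then just spawning `ffmpeg` with those arguments, once per chapter, with the chapter title becoming the clip's name.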

00:28:23 - Alex Patterson

And would you specify like a length of time for those videos?

00:28:28 - Anthony Campolo

So the length is based on the chapters.

00:28:31 - Alex Patterson

That's why I'm wondering. If that first one is three minutes long, I don't know that I would want a clip that matches chapters per se. I might want five clips within the chapter, right? So if this whole thing can be a chapter on video clipping, but I only want 90 seconds apiece so I can spin them out to YouTube, is there a way to say, "Hey, take this chapter and break it down into clips accordingly?"

00:29:00 - Anthony Campolo

The way you would do that is just have it create shorter chapters in the first place. Create 90-second chapters and then clip all of those. The reason it matches chapters is because that's what you see on podcasts like Lex Fridman. There'll be a Lex Fridman channel or Joe Rogan, then there'll be clips. Basically, they break up the episode into different chapters, and each video is however long. Some clips from shows will be up to 15 minutes long because that's more of a YouTube clip. You're talking more about TikTok and Shorts.

00:29:40 - Alex Patterson

Shorts on YouTube, so not 16 by 9, it's 9 by 16, right? Shorts on YouTube can only be up to three minutes now, I think that's the limit. So yeah, that scenario. I know a lot of people are looking to break theirs up to release as full videos too, though, which makes sense to me if you do longer chaptering, for sure.

00:30:04 - Anthony Campolo

Yeah. For your use case, I'd just try changing the prompt to get really short chapters. Then you'd have maybe 20 clips from an episode, however many it ends up being. If it's a half-hour episode, it could be 15 to 20 clips. That would give you something similar. For me, I'm looking more for slightly longer things, like 5 to 10 minutes. The way the prompt is written right now, if you give it to o1, it will usually make all the clips exactly six minutes, which is not always perfect, but it's a fairly good chunk of time. Then it gives you a meaty description of each chunk, so you can read through it and get a good idea of everything that happens throughout the episode, and it gives some uniformity. When I first started doing this with older ChatGPT models or Claude, if I gave it a two- to three-hour episode, it might give chapters that were like half an hour long.

00:31:13 - Anthony Campolo

Even if I specifically said these chapters should not be longer than 10 minutes, it would still do it because it couldn't give you that fine-grained amount of control. But o1 actually follows what you write to a T, so it gives you exactly six-minute chapters, which is kind of wild.

00:31:31 - Alex Patterson

Is there any advantage, or have you tried going to newer models like o4 or whatever's out? I'm so bad with model names.

00:31:39 - Anthony Campolo

Well, okay, so here's the thing. o3-mini is the only thing that's out right now. The new model is not here yet. They give you the crappy version of the new model and act like it's a cool new thing to use. o3-mini is no better than o1, it's just faster. It's essentially equivalent. So as soon as o3 proper drops, yeah, I'll be using that.

00:32:00 - Alex Patterson

Day one. I want to talk a little bit about my experiences with Google Gemini, not Vertex. I just had a meeting with Google yesterday, and I'm like, your branding is just so rough around these things. Anyway, if you're looking into that, folks: Vertex is the enterprise-grade level, and Gemini is more the open API you can hit. With that said, I'm curious. They have things called functions, and each function can run off and do different things. So I have a function that does a description, a function that does a summary, a function that does chapters. I'm curious if that's something you've run across, or are maybe doing manually, just in the way you have your code split out.

00:32:50 - Anthony Campolo

So I'm not very into the whole Gemini world. I actually have a paid Gemini account. Every now and then when I want to compare across different models, I'll usually hit ChatGPT, Claude, and Gemini and then look at all three. What you're describing, I'm not even sure how that maps to the other models because the term function is so overloaded. There are functions in OpenAI, but it's definitely not what you just described. Function calling is a whole different thing, unless you're talking about getting JSON output.

00:33:25 - Alex Patterson

It does, when you... So you can get JSON output without it, but with functions you can specify, and I'm going to get the term wrong, but essentially a schema, a framing of what should come back. So for chapters, I put in a function and I'd say I would like my chapters with this specified time, and I want hours, minutes, seconds, and so on. I specify all of that in the prompt, but then I also say here's what the JSON should look like, and that way I can get time, description, and all that stuff. And it's not in markdown when it comes out. It's purely in a JSON format that I specify.

00:34:05 - Anthony Campolo

So yeah, this is the same as OpenAI's function calling. I talked to Dev Agrawal about this on one of our streams because that would be useful and is something we'll likely integrate at some point, since it can give a more fine-grained output. Like you're saying, if you get it in JSON, you can do a lot more with it versus getting a markdown string that's a bit unwieldy. The reason I haven't done it yet is that the way the tool is set up, the thing you want at the end is a markdown file, especially because it already includes front matter and you can use it as a web page if you want. So I run all my YouTube videos and podcasts through this and upload the output on my blog. Now I have like 200 blog pages of every video and podcast I've ever done, and it's just like an

00:35:00 - Alex Patterson

MDX file for you.

00:35:01 - Anthony Campolo

Essentially it's an MD file. But yeah, you could have it do MDX if you wanted. Function calling is interesting. This also gets into how, as you add more and more features, something like Vercel's AI SDK makes more sense because it might homogenize those differences. I'm not sure if that also gives you the same function ability across OpenAI versus Gemini or not.
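The schema-constrained output being discussed can be sketched like this, using the JSON-schema style that OpenAI's structured outputs accept. The chapter fields here are illustrative, matching what Alex described (time plus description), not anything from AutoShow:

```typescript
// Sketch of a JSON schema constraining chapter output: instead of a markdown
// string, the model is asked to return chapters as structured JSON.
// Field names and the schema name are illustrative.

export const chapterSchema = {
  type: "json_schema",
  json_schema: {
    name: "chapters",
    schema: {
      type: "object",
      properties: {
        chapters: {
          type: "array",
          items: {
            type: "object",
            properties: {
              start: { type: "string", description: "HH:MM:SS timestamp" },
              title: { type: "string" },
              description: { type: "string" },
            },
            required: ["start", "title", "description"],
          },
        },
      },
      required: ["chapters"],
    },
  },
};
```

An object like this gets passed as the `response_format` of a completion request; the trade-off discussed above is that you then have to assemble the markdown file yourself from the parsed JSON.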

00:35:30 - Alex Patterson

Yeah, I haven't played with that, but I'd be curious for sure. I think Genkit does have that built in. They're focused on Firebase products, basically, but it's an open library you can call, and because they're so specifically trying to make sure you can hit Gemini along with everything else, I think they're covering that use case as well.

00:35:53 - Anthony Campolo

Yeah, a buddy actually built something called Ragged, which was meant to be kind of like a LangChain replacement. He doesn't really keep it up to date, so it's probably not something you should really use. But we did an episode about it a while ago, and he built something super simple where you basically have a single function, pass it text, and then it does your LLM call. Have you used LangChain at all?

00:36:19 - Alex Patterson

No.

00:36:20 - Anthony Campolo

Really complicated. Yeah, it's super complicated. I don't really recommend it to people. But going back to your other question, I really wanted to stay as close to the metal as possible with different LLMs because you want to know what each one offers. As a developer, I think you want a clear idea of where the value of different ones is. Having a library where you just change the model name is nice, but it hides too many things from you, especially in the fast-moving, complex AI environment we're in.

00:37:03 - Alex Patterson

Yeah, it's like the Wild West right now. It's fantastic. I'm old enough to have gone through the dot-com boom and bust and all that fun stuff, and I have to say it feels like that again, which is exciting to me. It has kind of revitalized things, in my opinion. It's a fun time to be doing this stuff. I think a lot of people see the negative side, with people losing jobs and things like that, but I think there's a lot more opportunity too. It's just a mindset and a gear shift for us.

00:37:35 - Anthony Campolo

Yeah, I think there are plenty of criticisms to have, especially when you get into model bias and trying to research certain topics where ChatGPT has opinions about things. If you ask it about certain topics, it will have an opinion, and it really shouldn't. So I do think a lot still needs to be worked out. I'm not as worried about job-loss stuff because I think a developer with AI is going to be so productive they'll find a way to make it in the industry. But if you're doing certain roles and refuse to adopt AI, you might get automated out of the job.

00:38:21 - Alex Patterson

Yes, agreed. Or you might have to pivot to a different role completely because your job is getting automated out. Definitely something interesting. Before we completely shift off, was there anything more specific you wanted to show with AutoShow itself?

00:38:42 - Anthony Campolo

Really, what you see is what you get, in the sense that the workflow is pretty standard whether you're using different LLMs or different transcription services. We talked about the different prompts that are available, so I think you have a pretty good idea of what the tool does at this point. Embeddings are a fairly new feature. Some of the things I'm working on now are cron jobs and related automation. So if you have a show you're following and want to always get new episodes, you could run a cron job that checks every day, or have a subscription that watches it and runs automatically whenever something new comes in. Then you can have a constantly running setup for shows you're watching, or for your own content, where it auto-generates stuff after you're done going live. Those are the types of things I'm working on now, plus the front end so non-devs can use it.

00:39:51 - Alex Patterson

Nice. Very cool. Any last minute thoughts about AI in general or suggestions for folks getting started with stuff?

00:40:01 - Anthony Campolo

Oh, I mean, plenty. Pretty broad, right? I started using ChatGPT day one when it came out. I remember very clearly it was November 30, 2022, the day that changed my life.

00:40:13 - Alex Patterson

Smokes. That's very specific because I just remember

00:40:17 - Anthony Campolo

I got into coding because I wanted to do AI. A long time ago, back in 2016, there was a big AI hype cycle because of AlphaGo. There was this program that could play the game of Go, and lots of people heard about it, even non-technical people. It's kind of like when Garry Kasparov was beaten by Deep Blue in chess back in the '90s. That was an event like the moon landing for some people. It was a really big deal. So the AlphaGo thing was another big moment like that. You also had something called Deep Dream, one of the first image models from Google, creating these weird psychedelic images where you'd have a field and see little animal faces kind of popping out of it. I was so fascinated by that. At the same time, I was a failed music major living with my parents and wanting to change careers. So I first got into coding to learn Python, machine learning, TensorFlow, and all that.

00:41:24 - Anthony Campolo

And it was so challenging. It was so hard, and after years of failing at that, I was watching all these YouTube tutorials and thinking, for every one TensorFlow video, there are ten "Learn React" videos. Maybe I should just learn this React thing. Then I did a bootcamp and got into web dev, and I was a web dev professional for years and years. Then ChatGPT happened, and I'm like, okay, AI has actually arrived now. That's what's different now versus 2016. In 2016, if you weren't a PhD mathematician, you had no chance of doing anything useful in AI. It just wasn't going to happen. Today, the tools exist and even non-coders can use ChatGPT for all sorts of things. My wife uses ChatGPT every single day now because I've gotten her into it. She was very reluctant at first. She really did not like it at first. But then I started showing it to her, like you could do this with this, you could do that with this.

00:42:26 - Anthony Campolo

And she started saying, "Oh, okay, that's actually really useful." So the first thing I would say is just use the thing for basic everyday tasks or problems, no matter what it is. Common ones people talk about are recipes, writing emails, or summarizing emails. There's that funny joke that now everyone is taking a bullet point, turning it into an email, and then everyone is taking that email and turning it into a bullet point. So the biggest advice I would give people is to find one thing that is useful for you. As soon as you find that one thing, that's the window to everything else.

00:43:13 - Alex Patterson

It'll make you want to dig deeper and learn more and kind of grow from there. So great advice.

00:43:18 - Anthony Campolo

And then I would definitely say try different models. There are a lot of paid models now. If you want to use o1 Pro, you're paying 200 bucks a month, which is pretty steep. But I recommend looking at at least ChatGPT, Claude, and Gemini, especially since you're saying you're really big into Gemini.

00:43:39 - Alex Patterson

Gemini is completely free right now still. So like you could probably use that for quite some time. I think they're trying to just get everyone's heads wrapped around it. And so from every indicator I get, you're looking at a good year of free usage right now.

00:43:52 - Anthony Campolo

So interesting. Well, there is a subscription, so there is something you're getting for that.

00:44:00 - Alex Patterson

It's actually apples to apples right now. From everything I've been told, it's kind of fascinating to me. If you want Vertex and you want highly scalable, enterprise-grade, you know, four nines or whatever it's up to now, that's Vertex on the Gemini side. With the Pro plan, there are paid pieces to it, and that's basically giving you support. So it is free to use, though, with that.

00:44:28 - Anthony Campolo

Oh yes, one more thing for the devs out there. My last piece of advice is try out Ollama. We didn't show this in the demo, but Ollama is an open-source way to run models. The big problem with this is that most people are not going to be able to run really powerful models on their current machine, and that's a huge issue. You'll be able to run smaller models that can do some tasks kind of okay. But I think this is going to change, and within the next couple of years, the models people can run on lower-end machines will get pretty good. So you want to build up this knowledge base because it's going to be a really big deal once we get there. Ollama is a really cool open-source library that makes it simple to run a single command, like ollama run deepseek-r1, and it gives you a chat interface right there in the terminal. You start talking to it, you get responses back, and that's just running on your machine.

00:45:34 - Anthony Campolo

There's no API service, and no one else is seeing your conversation. You can have pretty personal conversations with these chatbots. Some people are using them for therapy and all sorts of stuff, so you might want a local model for that reason. Then you can also start to see what is happening behind the scenes with these models. So that would be my last piece of advice.
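Talking to a local Ollama model from code can be sketched like this, against the HTTP API Ollama serves on `localhost:11434` by default. This assumes Ollama is installed and the model has already been pulled (e.g. `ollama pull deepseek-r1`):

```typescript
// Sketch of a local, private LLM call via Ollama's /api/generate endpoint.
// Nothing leaves the machine; no API key is needed.

// Build the request body. stream: false returns one JSON object
// instead of a stream of token chunks.
export function buildGenerateRequest(model: string, prompt: string) {
  return { model, prompt, stream: false };
}

export async function askLocal(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateRequest(model, prompt)),
  });
  const data = await res.json();
  return data.response;
}
```

Because the endpoint shape mirrors the hosted completion APIs, this is also a low-stakes way to "see what is happening behind the scenes" with these models.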

00:45:56 - Alex Patterson

Awesome. You sparked one for me too. It's probably less on the developer side, but Perplexity AI. Check that out as well. It essentially scrapes a lot of websites into one experience, and that's a really cool feedback loop. I use it every day. I have a Pro membership on that one. So definitely dive into AI. I don't want you guys to get left behind. Check it out, and go check out AutoShow: Automatic Show Notes for Podcasts and Videos.

00:46:32 - Anthony Campolo

Yeah, that's my wife. She wrote that tagline.

00:46:34 - Alex Patterson

Love it. See it.
