
Creating Music with AI
Episode Description
Anthony Campolo demos AutoShow V2, a tool that transforms podcasts, videos, and books into AI-generated music, lyrics, and multimedia content.
Episode Summary
Anthony Campolo joins Nick Taylor to showcase AutoShow V2, an evolution of his open-source tool that originally automated podcast chapter titles and summaries using Whisper transcription and LLMs. The new version expands far beyond text, now supporting music generation, text-to-speech, image creation, and video output from input content like audio, video, PDFs, and ePubs. Anthony explains how he integrates services like ElevenLabs and MiniMax for music generation, since his preferred platform Suno lacks an API. He demonstrates the full pipeline — from feeding a podcast transcript through an LLM to generate lyrics, then passing those lyrics to a music model to produce complete songs in various genres. The conversation touches on the rapid improvement of AI music models, with Anthony noting that Suno's instrumentals became indistinguishable from human-played instruments around version 4.5 and vocals reached a similar quality at version 5. He also demos a custom lyric video generator he built and discusses a concept album feature that splits ePubs into chapters and generates a song for each section. Along the way, both hosts reflect on the broader state of AI-assisted coding, comparing tools like Claude Code and OpenCode, and discussing how vibe coding has matured into a workflow that requires intentional refactoring and code maintenance practices.
Chapters
00:00:00 - Introductions and Anthony's New Role
Nick and Anthony catch up after the holidays and exchange New Year's stories before diving into Anthony's professional update. Anthony describes his new position as a collaborative engineer contracted through a company called Atmosphere, where he runs two-week crash courses teaching enterprise development teams how to effectively use AI coding tools like Copilot. The role focuses on practical applications such as managing technical debt, writing documentation, and filling in missing tests rather than feature development.
Anthony draws parallels between this work and his background as a former teacher, noting that the role feels more like education than traditional developer advocacy. He explains that his day-to-day involves workshops, lectures, guided exercises, and office hours rather than creating outward-facing content, and he's been enjoying the shift for about three months since starting in November.
00:04:22 - AutoShow Origins and V2 Overview
Anthony provides the backstory of AutoShow, which began as a CLI tool to automatically generate chapter titles and timestamps for his podcast episodes using Whisper transcription and LLM processing. He quickly realized the same pipeline could produce summaries, blog posts, song lyrics, social media content, and more by simply swapping out prompts. The tool evolved from a developer-focused CLI into an app his wife now uses regularly for work tasks.
AutoShow V2 represents a major leap forward, accepting not just audio and video but also PDFs, ePubs, and PowerPoints as input, and outputting across five modalities: text, text-to-speech, images, music, and video. Anthony outlines his vision for generating concept albums from books by splitting ePubs into sections and running each through the music generation pipeline, and he discusses how the wow factor of AI music generation still surprises people who are mainly familiar with ChatGPT's text capabilities.
00:09:39 - AI Music Quality and the Suno Evolution
The conversation shifts to the quality and perception of AI-generated music. Anthony traces his interest back to around 2018, recounting how he played an AI-generated Beatles-style song for a music professor who dismissed it as a poor imitation. He then charts Suno's progression through six model versions, identifying version 4.5 as the inflection point where instrumentals became indistinguishable from human performances and version 5 as when vocals reached comparable quality.
Anthony explains his approach to generating interesting music by experimenting with specific genre combinations rather than generic prompts, and describes how AutoShow's pipeline adds value by generating lyrics from podcast transcripts through an intermediate LLM step rather than creating lyrics from scratch. He discusses ElevenLabs and MiniMax as the API-accessible alternatives he's integrated since Suno lacks a public API, and notes the cost trade-offs between subscription-based services and per-minute API pricing.
00:19:00 - Use Cases, Target Audience, and Business Considerations
Nick and Anthony discuss who AI music generation is actually for, with Anthony clarifying that AutoShow targets content creators and casual users rather than serious musicians. He points to examples like the Latent Space podcast creating theme songs for each episode and notes that AI artists are already charting on Spotify. The discussion touches on how musicians could use tools like Suno's studio features for collaborative human-AI music creation, though AutoShow focuses on end-to-end generation requiring no musical input.
The pair also explores the business dynamics of AI music platforms, speculating on why Suno hasn't released a public API despite the demand. Anthony discusses the economics of music generation, noting that Suno offers incredible value at around a hundred songs for ten dollars on subscription while ElevenLabs charges roughly eighty cents per minute through its API. They agree that now is the time to experiment while pricing remains relatively accessible.
00:26:51 - AutoShow V2 App Demo and Pipeline Walkthrough
Anthony shares his screen to walk through the AutoShow V2 interface, demonstrating the streamlined form-based workflow that replaced the more complex background processing approach of V1. He steps through each configuration option including transcription services with speaker labels, LLM model selection across OpenAI, Claude, Gemini, and Grok, text-to-speech voices from multiple providers, image generation with different dimensions, and music generation with genre selection.
The demo reveals the end-to-end pipeline in action, showing the job-based processing system with progress tracking, background execution, and planned email notifications. Anthony highlights ElevenLabs' composition plan feature, which allows specifying verse and chorus sections with different moods, and encounters a live bug with Sora video generation where unsupported dimensions caused a failure. The conversation briefly detours into domain names when Anthony mentions securing the auto.show URL.
00:41:50 - Domain Stories and Hardware Tangents
Nick shares an entertaining story about selling the html5.ca domain to Microsoft for four thousand dollars after buying it for ten, using the proceeds to purchase his first Mac. The conversation meanders through personal computing history, iPods, and current hardware setups including Nick's Minisforum mini PC running Claude Bot and Anthony's plans for a Linux box. They discuss Claude Bot use cases, with Nick describing how he used it for planning and drafting a pull request.
The hosts touch on trending AI developer tools including the RALPH loop and its potential expense due to autonomous but inefficient processing. They compare their various AI subscriptions, with Anthony noting he has ChatGPT Pro, Claude Max, and OpenCode Black, while discussing how services like OpenCode and Goose allow connecting multiple subscription keys. Both agree that current pricing represents an experimental window before costs normalize.
00:53:39 - Lyric Videos, Concept Albums, and Music Demos
Anthony demonstrates his custom lyric video generator, which takes an image, lyrics file, and audio to produce a scrolling lyric video that he considers far superior to Suno's built-in option. He shows the CLI-based music generation workflow, playing examples from both MiniMax and ElevenLabs, with the electronic dance track from ElevenLabs drawing particular praise from both hosts. The demos illustrate the range of quality across different genres and providers.
The segment concludes with Anthony showing the ePub splitting feature that forms the foundation of his concept album pipeline. He demonstrates splitting a book of short stories about the afterlife into five text files in a fraction of a second, explaining how each section would eventually feed through the lyric and music generation pipeline automatically. He notes this scaffolding still needs to be wired together but the individual pieces are all functional in the open-source repository.
01:01:19 - Vibe Coding Practices and AI Development Reflections
The conversation pivots to broader reflections on AI-assisted development workflows. Anthony describes his evolution from pure vibe coding to a more disciplined approach, advocating for periodic "vibe refactoring" sessions using prompts that check for unused variables, dead code, and convention inconsistencies. He credits his workshop teaching experience for sharpening his understanding of code maintenance patterns and notes that AutoShow V2 has a much cleaner codebase with better test coverage.
Nick shares his experience migrating a Create React App project to Vite using Claude Code in roughly twenty minutes of automated work plus an hour of review, and recounts spending three hundred dollars on AI assistance to set up a Kubernetes cluster — arguing it was cheaper than pulling multiple colleagues away from their work. Both hosts reflect on the shift from writing code in editors to orchestrating AI agents through CLIs, referencing Nicholas Zakas' concept of developers becoming orchestrators and acknowledging the "AI fatigue" that mirrors the JavaScript fatigue of earlier years.
Transcript
[00:00:36.49] - Nick Taylor Hey everybody, welcome back to nickyt.live. I'm your host, Nick Taylor. Happy New Year to pretty much everybody I haven't seen in a while. And Happy New Year to you too, Anthony. How you doing, man?
[00:00:52.12] - Anthony Campolo Doing good, how are you? How was your new year?
[00:00:54.50] - Nick Taylor It was pretty good. Took it pretty easy. Did some snowshoeing at my parents' cottage and then went to a house party for New Year's. You know, nothing super crazy, but, you know, dancing until like 2 in the morning or something. But it was a lot of fun. It was at a friend's house, really close, so we could literally just walk home after. So it was pretty nice.
[00:01:17.10] - Anthony Campolo Yeah, right on, right on. Yeah, I hung out for most of the holidays with my wife's brother's wife's family.
[00:01:27.44] - Nick Taylor Okay, cool. Nice. Yeah, that's cool, that's cool. Now, for folks, like we've been on a bunch of times together before, but for folks who might not know who you are, who is Anthony? The man, the myth, the legend.
[00:01:44.10] - Anthony Campolo Yeah. I am a developer advocate. I first got my start in the GraphQL JavaScript world with this framework called RedwoodJS. I worked for a GraphQL company, then a blockchain company, then a deployment company, then I spent some time building my own app called AutoShow, which we'll talk about a little bit today. And I have a new role now. This is actually, I think, the first stream I've done since it started.
[00:02:17.59] - Nick Taylor Okay, cool.
[00:02:18.45] - Anthony Campolo So it's a little confusing to explain, but I'm hired by a company called Atmosphere, and they contract me out to another company whose name I technically cannot say, but it's a very large payroll company that pretty much everyone knows, and they are looking to basically uplevel their whole dev force with AI coding tools. So what we do is run a two-week kind of crash course with a couple of teams. There's a whole bunch of us; we're called collaborative engineers. We each work with a group of teams for two weeks at a time, and we explain to them basically how to use things like Copilot, how to prompt, how to manage your context. It's really heavily focused on things like managing technical debt, writing docs, filling in missing tests. So it's more code maintenance than feature development, which kind of makes sense if you think about the company; they're looking for a standardized way everyone can use AI and not just totally go nuts on their codebase. So yeah, it's been really fun.
[00:03:35.08] - Anthony Campolo So I've been doing that now for close to three months. I think I started at the beginning of November. And yeah, this is slightly different from the roles I was doing, in that it's less content-focused, because it's all just, you deliver the material for people on the call. They technically get recorded, but I'm not sure how many people actually watch the recordings. So yeah, it feels more like having an education job, which I used to have. You know, I used to be a teacher. And so with this, I'm not creating a lot of content. I'm not doing a lot of outward-facing stuff. I'm basically just doing workshops and lectures and guided exercises, answering questions, doing office hours, and all that kind of stuff. So yeah, I've been really enjoying it.
[00:04:22.28] - Nick Taylor That's cool, that's cool. Also, hey, Taran, in the chat. [unclear]. Yeah, I knew you switched it up, but I didn't really know the full breadth of what you were doing. So yeah, you're mentioning AutoShow. I'll drop a couple links to that, but I guess we've done a few streams on this. I guess give the TL;DR of what AutoShow is, and why did you even create it initially?
[00:04:55.41] - Anthony Campolo Yeah, so AutoShow V1 was based around wanting to create some additional assets for my podcast. So something that most podcasts have is chapter titles and timestamps so you can skip to different parts. You see this both in YouTube videos and in podcast apps. So I was kind of thinking back, probably early 2024 or so. I was like, AI has gotten really good. It can analyze large amounts of text. I also knew about Whisper, which is this open-source transcription tool. And so I was like, if I ran my episodes through Whisper, because it gives you timestamps for each line, I could give that to ChatGPT or Claude and then have it come up with chapter titles. It could identify large chunks of topics that are discussed and find beginning and ending points. That was the very initial idea. And I built a CLI that would automate this process of running the transcription, appending a prompt to it, giving that to the LLM, and then getting the response back. After I did that though, I realized that if I just wrote a different prompt, I could generate all sorts of stuff.
[00:06:13.32] - Anthony Campolo I could create summaries, blog posts, more artistic content like song lyrics or short stories, marketing content, social posts, email newsletters, all sorts of stuff. So I was like, okay, this is pretty cool. So I then started building an app for it because originally, like I said, it was a CLI, but I wanted it to be something that my parents could use or my wife could use. And now actually my wife is the main user of AutoShow. She uses it for lots of little work tasks. So then that was kind of the path. And then I added in different transcription services and different LLM models so you could get different trade-offs of cost and quality and speed. And so that was kind of AutoShow V1, which now has been out for like six months or so. And the app itself has some issues. You know, parts of it were vibe-coded, so some of it I was kind of like, I really want to just rewrite this and kind of get it right from the start. And so I've been working on AutoShow V2, which is the stuff we're going to see today.
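The original chapter-title pipeline Anthony describes, Whisper-style timestamped segments plus an appended instruction, amounts to a prompt-assembly step before the LLM call. This is a minimal illustrative sketch, not AutoShow's actual code; the function names and instruction wording are invented:

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, the usual chapter-marker style."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"

def build_chapter_prompt(segments: list[dict]) -> str:
    """Turn Whisper-style segments ({'start': float, 'text': str})
    into a timestamped transcript with a chapter-title instruction
    appended at the end."""
    lines = [f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
             for seg in segments]
    transcript = "\n".join(lines)
    instruction = (
        "Identify the major topics discussed in this transcript and "
        "produce chapter titles with the timestamp where each begins."
    )
    return f"{transcript}\n\n{instruction}"
```

The resulting string is what would be handed to whichever LLM the user selected; swapping the instruction text is how the same pipeline yields summaries, blog posts, or lyrics instead.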
[00:07:21.48] - Anthony Campolo So AutoShow V2 is really exciting because AutoShow V1 would just output text, and it would only take in audio or video. Now it can take in audio, video, PDFs, ePubs, or PowerPoints, so other kinds of content as well, not just audio and video. And it doesn't just generate text; it now also generates images, music, video, and text-to-speech. So whatever summary or show notes were written can get turned into a spoken-word version. So now it's like five different modalities. And last time we did a stream, we were just starting to get into some of that. We were looking at the video and music stuff in the CLI, and we ended up having to drop early for some reason.
[00:08:15.23] - Nick Taylor Yeah, I can't remember. I think I had a hard stop for a meeting that came up.
[00:08:20.11] - Anthony Campolo Yeah, yeah, yeah. Meeting got pulled up randomly.
[00:08:24.40] - Nick Taylor Yeah, I think it was a sales call or something. But yeah, I guess we're definitely going to talk about creating music today. You know, I don't think it's a hot take necessarily, but people either vehemently hate AI-generated music or I guess they're like, it's okay. And I'm sure we'll get into this, but how are you leveraging it? Because I use AI every day, you know, for writing and for coding obviously, and I always treat these like tools. So it's still me doing the blog post, but it's doing tweaks and keeping my style and tone from previous stuff. So AI is helping me, but it's not AI doing it for me. You know what I mean? It was still my idea. And I'm curious how you're treating music. Are you doing a mix of both? Are you just like, hey, let's just yolo-create some weird symphony? And also, for context for people, you were a music teacher before, so you have a music background.
[00:09:39.49] - Nick Taylor I can't remember, were you a cellist or upright bass?
[00:09:43.39] - Anthony Campolo String bass?
[00:09:44.17] - Nick Taylor Upright bass, yeah. Sorry. So, like, there is some context there too. It's not like you're just some random guy who doesn't know anything about music creating AI music. So, yeah, I guess I'm just curious about your thoughts on all that.
[00:10:00.32] - Anthony Campolo Yeah, for sure. So I first got interested in the idea of AI music generation a long, long time ago. I remember, oh man, it was probably 2018 or so. There were some really early AI music experiments coming out. And this is long before ChatGPT or anything like that was around. And I remember I was talking to one of my old music professors about it. We were good friends and kind of hung out after I graduated. And, you know, he was very much of the mind that, you know, like, hey, AI can never write music and blah, blah, blah. And there was a song that was generated that was like a Beatles song. They basically took a bunch of Beatles songs in as the training data and then wrote a new Beatles song, and we were playing it for him. And for me, the fact that it was able to create a song at all that sounded like a song was kind of mind-blowing at the time because, you know, back then, yeah, it couldn't do crap.
[00:11:03.48] - Anthony Campolo I remember playing it for Stuart. I'm like, check it out, it sounds like a Beatles song, right? His response was like, yeah, a bad Beatles song. So that was then. Now things have kind of changed. So recently I've been really interested in Suno. Unfortunately, Suno does not have an API, so we can't really integrate that in the way I would like to right now. But Suno definitely has the best AI music model. And I've watched it progress now over the course of six different model iterations. When I first used it, it was model V3.3. Since then there's been 3.5, 4, 4.5, and now 5. And I feel like they reached an inflection point at 4.5 with the instrumentals and music quality, actually, because I just noticed it hit a point where there was no way, even for me, to tell the difference between those instrumentals on an AI song versus a real song. Like, it sounds like a saxophone, sounds like a cello. There's literally no way you'd be able to tell the difference. The one thing where you could still kind of tell was the vocals would have certain tells.
[00:12:21.59] - Anthony Campolo Especially if you listened to, if you generated, you know, hundreds of Suno songs, you'd be able to tell a Suno song at a certain point. That is no longer the case with V5. I think the vocals especially now, I would challenge any musician to do a Pepsi challenge with a set of AI songs versus real songs and try to tell the difference. I think it would be extremely challenging unless you're someone who has spent a lot of time with those models specifically and you kind of know what the music sounds like that they generate. But it's really incredible. Whether you're actually going to want to listen to them is going to kind of depend on whether you find the right styles. Because if you just put in a song and say, write a pop song, then it may be kind of generic. But if you're like, oh, I want a hip-hop song with jazz instrumentation and synthesizers or something like that, then it can do that really well. So for me, I've been trying to figure out what are the styles that I think actually output interesting stuff with Suno.
[00:13:26.22] - Anthony Campolo And then the point with kind of AutoShow, where that comes in, is how you can generate the lyrics. So I think generating just lyrics straight up with AI is kind of like writing short stories or poems or whatever. With AI, it's pretty good, but at the same time it just feels like there's something kind of off about it. But what I found was interesting is that I could create interesting songs if I did the whole AutoShow approach, where I would feed it an hour-long podcast about a certain topic and then that transcript could be used to write the lyrics with an LLM step in the middle. So basically I would have AutoShow take some content and then create the lyrics. And then those lyrics I would just put into Suno, just copy-paste it into their UI. Okay. Now though, I've finally been able to integrate this into the CLI and the app because ElevenLabs, you know, ElevenLabs was first known for text-to-speech stuff. They now have a transcription model and they have a music model. So they are kind of the first third-party API that I feel like is pretty close to Suno.
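The two-stage workflow Anthony outlines, where an LLM first turns long-form content into lyrics and a music model then turns those lyrics into audio, can be sketched with the provider calls injected as plain callables. This is a hypothetical sketch: `write_lyrics` and `compose` stand in for real ElevenLabs or MiniMax client calls, and the prompt wording is invented:

```python
from typing import Callable

def transcript_to_song(
    transcript: str,
    style: str,
    write_lyrics: Callable[[str], str],
    compose: Callable[[str, str], bytes],
) -> bytes:
    """Two-stage pipeline: an LLM step distills long-form content
    into lyrics, then a music model turns lyrics plus a style
    prompt into audio. The service calls are injected so any
    provider can be swapped in."""
    lyric_prompt = (
        "Write song lyrics with verse and chorus sections based on "
        "the themes of this transcript:\n\n" + transcript
    )
    lyrics = write_lyrics(lyric_prompt)
    return compose(lyrics, style)
```

Keeping the intermediate LLM step separate is what lets the same lyrics be re-rendered in several genres or across several music services.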
[00:14:47.17] - Anthony Campolo I also just integrated MiniMax. You know anything about MiniMax?
[00:14:52.23] - Nick Taylor I've heard about it. I can't. I follow Theo and I think he mentioned it, but I haven't used it. Did they add image generation? No, he was talking about Kimi too; they have good image generation now too. But he was mentioning MiniMax as well. So yeah, what's the deal with MiniMax?
[00:15:14.38] - Anthony Campolo They're an interesting company. I think especially if they have Kimi, they probably host open-source models as well. So they have kind of their own models and they have open-source models. They have a good range of all the different modalities. So you can generate image, music, video, all that stuff with them. And so I've been getting more interested in the services that kind of offer that whole gamut. Gemini and OpenAI are pretty much the only other ones that offer that spread, although OpenAI doesn't have a way to generate music yet. MiniMax Music is another third-party music API model that you could use. Now I have ElevenLabs and MiniMax's music stuff built in, and it's integrated with the whole workflow. So you can do it with the CLI or you can do it with the UI, and we'll kind of go over that in a bit. And then a couple other interesting things that I am working on: something that I really did not like was Suno's lyric video generation feature. So they'll create a song. You can download that as an MP3 or a WAV, or you could also download it as a lyric video.
[00:16:36.32] - Anthony Campolo And that's where, you know, the lyrics are displayed and they kind of scroll as it goes. You can, like, follow along with the song. But I just didn't like the design. I didn't like how the lines would be split up and stuff like that. So I ended up building my own kind of custom lyric video generator thing that basically takes in an image, lyrics, and the song, and then it puts the image as the background and displays the lyrics and kind of scrolls them through. So we'll look at that. And then the last cool thing that I'm working on now is, I had this idea where you can put in podcast episodes or YouTube videos and generate a song off of it. But I'm like, you could actually really take this to the next level if you have a longer piece of content. So you could do this with a podcast series. If there's like 10 episodes, you create a song for each one. But I was also thinking you could just take an ePub, any book, and then chop that up into eight equal pieces and then feed each of those to the lyric generator and the music generator.
[00:17:40.33] - Anthony Campolo And then you could create whole concept albums based off of books. And, like, you know, you get your instant Hamilton kind of thing. So now there's a kind of future-of-musicals angle there. Yeah. So there's a couple manual steps to doing that, but I have kind of the whole pipeline basically there. You can, you know, take the ePub, turn that into text files, and then those text files you can use to generate the music. So yeah, a lot of stuff going on, but mostly I'm just finding all this stuff really fun and interesting, and it's stuff that I find, at this point, people still don't know about as much. Everyone knows about ChatGPT, or at least most people do, even if they don't use it. So that part of the app I find is less impressive to people. But if you can just be like, it writes a whole song for you and creates a song, people are like, whoa, that's super cool. You know, so that still kind of has that wow factor, because a lot of people aren't super hip to this AI music generation stuff.
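The ePub-to-concept-album idea reduces to splitting a book's text into roughly equal sections, one per song. A minimal sketch of that splitting step, assuming paragraphs separated by blank lines; the real tool Anthony demos works on ePub chapter boundaries, so this is a simplified stand-in:

```python
def split_into_sections(text: str, n: int) -> list[str]:
    """Split text into n contiguous sections of roughly equal
    paragraph count, each destined for its own generated song.
    If there are fewer paragraphs than n, some sections come
    back empty."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    size, extra = divmod(len(paragraphs), n)
    sections, start = [], 0
    for i in range(n):
        # The first `extra` sections absorb one leftover paragraph each.
        end = start + size + (1 if i < extra else 0)
        sections.append("\n\n".join(paragraphs[start:end]))
        start = end
    return sections
```

Each resulting section would then be fed through the same lyric-plus-music pipeline, yielding one track per slice of the book.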
[00:18:40.58] - Nick Taylor Yeah. Also, just want to say hey to Ari in the chat there on YouTube. He's wondering, well, he's giving you a shout out, but also wondering: will you be playing the cello or your big bass?
[00:18:58.29] - Anthony Campolo Not today, no.
[00:19:00.25] - Nick Taylor Yeah, so I guess, objectively, this is more like experimentation, right? Do you see people leveraging this to try and make hit songs, or is this really just trying to be creative in a different way? Do you think AI can enter mainstream?
[00:19:23.43] - Anthony Campolo Yeah, supposedly there are already AI music artists on Spotify getting millions of plays and stuff. I'm sure there are also lots of playlists with AI-generated songs that people don't even really realize; they're just putting on kind of background stuff. I'm not really aiming it at people who are looking to be serious musical artists. They would want to use the tools directly; they would want to use Suno directly. They would want to figure out all the different ways they can tweak it, and Suno now has a whole studio feature where you can create songs with stems and break up all the instrumentation and stuff like that. I'm aiming this more at people who aren't musicians and want to create stuff for fun, or people who are content creators and want to do it for their own stuff. Like, I think Latent Space did this for a while, where they would create a song for each episode and that would be kind of the lead-in song for the podcast.
[00:20:33.25] - Anthony Campolo So stuff like that is kind of fun. I know some other podcasts that do that. So being able to kind of create more disposable but still interesting or useful in whatever sense that the user finds it to be useful for. So this is going to be one more feature in AutoShow. It'll cost a certain amount of credits to generate a certain length of song and it'll just be another thing you can create based on your content.
[00:21:00.13] - Nick Taylor Cool. Yeah. Henri's saying Zanya Monet is an AI artist with songs on charts right now, which I find wild. But I guess if you were a musician, say you were a singer, I remember reading about the Foo Fighters when Dave Grohl started. He literally recorded all the instruments and then he did the vocals, and then he pieced it all together. Yeah, but I guess nowadays if you use something like Suno, say you are the vocalist or the lead singer, well, the lead singer of a non-existent band. But say you're like, you know, I just need a band. You could kind of get AI session players to do your backing parts.
[00:21:50.27] - Anthony Campolo Yeah, so this is actually part of the Suno app, which I have not really done a lot with. So there are ways to kind of meld human music making and AI music making, to bring them together, and you can flesh out song ideas, add extra instruments, and stuff like that. That's super cool. I'm more interested in generating songs from scratch and having it be end-to-end AI, so no musical input required, because the app right now is just focused on general consumers who want to input their content and get stuff generated. It's not aimed at musicians specifically, so again, I would say they'd want to use the actual Suno app for that. But this could be one way for them to create ideas for lyrics or try out different styles based on similar lyrics. So it's more so aimed at getting you over that hump of, how do I just get a bunch of material that I can start working with?
[00:22:55.24] - Anthony Campolo So you could generate a whole bunch of different sets of lyrics based on a similar piece of content. So if you have a podcast, you create a couple different versions of the lyrics with different styles and different models, and you can generate different versions of the songs with different services. So yeah, I think it's mostly just trying to expand the different things that you can do with AI and make it really, really simple because I think at this point for devs like you and I, there are all these different services and we're out using all this different stuff and it's super cool. Most average consumers who are interacting with AI at all are still just kind of using the ChatGPT chat UI and then they'll generate images sometimes, but they're not doing a whole lot of this other stuff, or creating workflows where they're generating lyrics with an LLM and then creating music with a music service. So I'm trying to kind of pull those together and provide good defaults and pre-built prompts and things. So people can start generating stuff and they don't have to be like, okay, how do I do this whole thing?
[00:24:08.13] - Anthony Campolo They can kind of just like click through and then they get something and they may not like it, but then they can kind of tweak it and they can try something else.
[00:24:15.05] - Nick Taylor Okay. And yeah, I guess this could be interesting for, like, an intro to a live stream or a podcast, like you were saying. I guess in terms of cost: I know when we've done the streams in the past, Whisper you can obviously just download, but for the music stuff you definitely need API keys for certain things, I'm assuming.
[00:24:43.44] - Anthony Campolo Yeah, yeah. I tried out some of the open-source music models and it's kind of similar with the LLMs. There are ones that are okay, but you can't actually run them on your machine. The best local models are going to be these huge things that take gigabytes of memory and storage. So they're kind of caught in that middle ground where it's like you can't really run them locally, and then if you do use them, they're not as good as the hosted ones. So really, the best is going to be Suno, and then ElevenLabs is pretty good as well. With Suno, you buy a subscription and then they give you a certain amount of credits, and it's a really, really good deal. You can generate like over a hundred songs for like 10 bucks or something totally ridiculous like that. With the API for ElevenLabs, I think it's 80 cents per minute of what you're generating, so it's not super cheap. You're gonna be paying like a couple dollars per song. So with that, you're gonna kind of want to experiment with shorter songs first and kind of dial those in.
[00:25:57.04] - Anthony Campolo So yeah, I just wish Suno had an API because their stuff is so good. So it's really frustrating. So I'm kind of building all these features, but I don't feel like they're quite as good as they could be because I'm going with one service versus another one. What are you going to do?
[00:26:16.24] - Nick Taylor Well, maybe it's a business decision. You know what I mean? They really want people using the studio to build things, versus maybe they haven't thought of the programmatic use case of people wanting to build on their platform. I don't know.
[00:26:29.47] - Anthony Campolo Yeah, I'm sure they have a reason why they haven't done it. Yeah, it's definitely their service. It does a lot of stuff. They have a ton of features and they're always expanding it out, so I can see why they would want to kind of keep people in their walled garden.
[00:26:51.38] - Nick Taylor Yeah. So I don't know. You want to show a bit of what you've been working on.
[00:26:56.55] - Anthony Campolo Yeah. So let's.
[00:26:59.05] - Nick Taylor Let me see if you want to just. You should be able to share your screen and then I can just bring it on.
[00:27:05.29] - Anthony Campolo Yeah. Let me figure out which stuff I'm even trying to show here. Cool. All right, let me go ahead and
[00:27:21.44] - Nick Taylor copy that. And I'm just gonna need to resize it, probably. We'll see. Hold on a sec here. Okay. Oh, no, that looks so. Yeah, I'm just gonna shrink it a bit. Hold on a sec. Okay, that should be good. And I'll get rid of. I added you twice in this scene by accident. All right, cool. There we go. Let's go ahead and switch it over. Cool. All right, I see. Show notes. Library.
[00:28:11.13] - Anthony Campolo Yeah. So this is AutoShow V2, so it has a slightly different look right now. So the biggest change I've made is that I originally with the last one, wanted to have this whole background processing thing where basically after each step it would kick off whatever the processing you would need in the background, mostly so that you could get the transcription started while you're reading through all the other sections and picking your prompts and stuff like that. I decided to drop that just because it added so much unnecessary complexity. Now there's just a form that you just go through and you pick a bunch of stuff. We'll go through this slowly, but I just want to show the whole thing. Then you just create the thing at the end and it just hits an endpoint, sends your entire request and then processes the whole thing on the back end. It has a job. You can leave the page, it will keep running and then when it finishes there will eventually be an email notification that you'll get. So that was kind of the big unlock and that's allowed me to kind of build all these other features in way quicker and way more easily.
[00:29:23.02] - Anthony Campolo The thing that is the same. So real quick, just for people who haven't seen AutoShow, this is the one that's live right now. So it starts off the same. You can drop in a local file or you can enter a URL. It now accepts PDF, DOCX, PowerPoint, stuff like that. So I'm going to do the thing I usually test with. It's like a 10-minute clip of me talking about Lambda School on a podcast. Transcription is now separated between cheap versions that just give you the straight line of text with Whisper, and ones that give you speaker labels, so it breaks up the individual speakers. That can be really, really useful if you are keeping the transcription afterwards. And there are a lot more options now as well. This is the one thing that I still have to add in. There are like 60 prompts in AutoShow V1; I just have these ones right now as I've been testing it, because it's a little simpler. This is just for the text output, which we're not really going to be focusing on too much.
[00:30:42.13] - Anthony Campolo So I'm just going to have it create a one-sentence summary. Now for models: OpenAI, Claude, Gemini, and I also added Groq. They're okay, but they're super-duper fast. So if you are generating something where the intelligence of the model is not as important, but you want to churn through a ton of content, then they can be good for that. All that is what was in AutoShow V1. Now this is the new new. So you can generate text-to-speech. You have three options: OpenAI has a text-to-speech model, Groq also has text-to-speech, and then ElevenLabs, which is definitely the best and most high-quality one. And then each of these has a different set of voices you can pick from. And if you want, you can also skip these. So what I'll eventually have is a kind of intermediary step after transcription where you'll decide whether you want text output, text-to-speech output, image output, music output, or video output. You'll just check whichever ones you want instead of having to skip the ones that you don't want.
[00:32:00.08] - Anthony Campolo So I'll be a little bit more explicit in terms of what you're wanting to generate. Then this is the image generation. So right now it's got the ChatGPT image model and the Gemini image model. You can select different dimensions and then a couple of preview prompts. These prompts are not that good right now; I'm just focusing on getting the whole pipeline in, and then I'm going to start honing the prompts, because the stuff these create is fairly generic. So that's one thing that I still have to work on.
[00:32:35.04] - Nick Taylor Okay.
[00:32:36.01] - Anthony Campolo All right, now step seven: Music. This is what we've been really focusing on. So I built in Gemini's Lyria and then learned after the fact that it doesn't actually work with lyrics; it just generates instrumentals. So I'm probably going to pull that out, because it kind of defeats the purpose if it doesn't actually create lyrics. So I'm going to swap that with MiniMax, and then those will be the two.
[00:32:59.52] - Nick Taylor Okay.
[00:33:00.23] - Anthony Campolo Music.
[00:33:00.59] - Nick Taylor There could be a big demand for elevator music maybe.
[00:33:07.56] - Anthony Campolo And then you have six different options for the genre. So it will create lyrics that are tailored to that style and then create the actual music with that style as well. So you can choose rap, rock, pop, country, folk, and jazz. Let's go with rock. And then video generation: Sora 2 from OpenAI and then Veo 3.1 from Gemini are the options. And these are the most expensive ones out of all the different options. Then you can choose your video size and duration. So you have 4, 8, and 12 seconds for Sora and 4, 6, and 8 seconds for Veo. So you see here it switches based on the model, giving you whatever different inputs and options you have. And same thing as with the image ones, these prompts don't really create anything actually interesting right now. The video one is especially going to be challenging, because taking a whole hour-long podcast and turning it into like a four-second clip, it's like, what are you really going to do with that? So this is one of the things I kind of built, but I don't really know exactly what I'm going to be able to generate that's useful with it.
[00:34:30.44] - Anthony Campolo But I just kind of want to get in there because I can start experimenting with stuff. So I think.
[00:34:36.17] - Nick Taylor Yeah, totally. Yeah. Gotcha.
[00:34:39.15] - Anthony Campolo Now you get this nice new progress bar where it goes through each step one by one. And you see here we also have this job URL. So if you went back to the create page, it actually has this saved in your local storage, so it will make sure that you don't generate another one until this one finishes. I may at some point have a queue system, so after you generate this, you could go back and create another one while it's going. But right now it just does one at a time. If we look over here, we can see the logs going. This was the processing options. Tells you.
[00:35:25.07] - Nick Taylor You might want to zoom in like once or twice. Yeah, okay, that's good.
[00:35:30.04] - Anthony Campolo So it tells you the transcription stuff. You picked your text-to-speech, music services, video services, and then it kind of walks you through each step as it's going and tells you the prompts you selected. So here, let's see: music generation. Right now it's running that. So the music generation step also has an LLM built into it, because it generates the lyrics as an intermediary step and then saves your lyrics to your show notes so you can see them after the fact. So now it's actually generating the song with ElevenLabs. There is something called a composition plan with ElevenLabs that we'll get into more with the CLI, but that allows you to give it more information. Instead of just a text file with lyrics, you can tell it which section is a verse and which section is a chorus, and you can have text that gives different vibes or feelings for different sections. So that is a pretty cool feature for ElevenLabs that definitely allows you to hone in your songs a lot more than just giving it the straight lyrics and then having it come up with the song.
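The composition-plan idea — labeling sections as verse or chorus with per-section style notes instead of sending a flat lyrics file — can be pictured with a structure like the one below. The field names and the repeated-stanza heuristic are illustrative assumptions, not the exact ElevenLabs schema:

```typescript
// Illustrative shape of a "composition plan": each section is labeled
// and carries its own style hints, plus genre-level direction globally.

interface PlanSection {
  sectionName: "verse" | "chorus" | "bridge";
  lines: string[];
  styleHints: string[]; // e.g. ["driving drums", "soft vocals"]
}

interface CompositionPlan {
  globalStyles: string[]; // e.g. ["rock", "2000s alternative"]
  sections: PlanSection[];
}

// Split plain lyrics into a plan, treating a repeated stanza as a chorus
// (a crude heuristic, purely for demonstration).
function toPlan(lyrics: string, globalStyles: string[]): CompositionPlan {
  const stanzas = lyrics.split(/\n\s*\n/).filter(Boolean);
  const seen = new Set<string>();
  const sections: PlanSection[] = stanzas.map((stanza) => {
    const isRepeat = seen.has(stanza);
    seen.add(stanza);
    return {
      sectionName: isRepeat ? "chorus" : "verse",
      lines: stanza.split("\n"),
      styleHints: [],
    };
  });
  return { globalStyles, sections };
}
```

The point of the richer structure is exactly what Anthony says: the model gets told where the chorus is and what each section should feel like, rather than inferring all of that from raw lyrics.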
[00:37:03.03] - Anthony Campolo Now it's doing the generate video step using Sora. I think there's also an intermediary step there to create the prompt that is given to the AI. Yeah, I think that's a video scene description, probably.
[00:37:24.53] - Nick Taylor I know you were, you were mentioning about productizing this. Is this live on a site now or are you just local for now still?
[00:37:33.15] - Anthony Campolo Yeah. So this is not live yet. It eventually will be. It will be on auto.show. That'll be the URL. I got auto.show.
[00:37:46.44] - Nick Taylor I didn't know there was a .show domain.
[00:37:50.25] - Anthony Campolo I know, right?
[00:37:52.40] - Nick Taylor I guess at this point there's a domain for everything.
[00:37:56.40] - Anthony Campolo So I got a bug in here for Sora. Let me.
[00:38:02.11] - Nick Taylor Oh, it doesn't support that file size.
[00:38:05.47] - Anthony Campolo It's not the file size; it was the dimensions. So some of the different models don't have all the same dimensions. It has 720 by 1280 and 1280 by 720, and whichever one I picked, let's see, it'll be in here: 1920 by 1080. So I need to update the front-end UI to not expose that, because apparently it will break it.
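The bug here — the UI offering 1920x1080 even though Sora only accepts 720x1280 or 1280x720 — is the kind of thing a per-model allowlist checked before submit would catch. A minimal sketch; the Sora values come from the episode, while the Veo entry is an assumption:

```typescript
// Per-model allowlist of video dimensions, checked before the request
// is sent so the UI never exposes an unsupported combination.

type VideoModel = "sora-2" | "veo-3.1";

const supportedDimensions: Record<VideoModel, ReadonlyArray<string>> = {
  "sora-2": ["720x1280", "1280x720"], // from the episode
  "veo-3.1": ["720x1280", "1280x720"], // assumed; not stated in the episode
};

function isSupportedDimension(model: VideoModel, dims: string): boolean {
  return supportedDimensions[model].includes(dims);
}
```

Driving the dimension dropdown from this same table would keep the form and the validation from drifting apart.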
[00:38:44.43] - Nick Taylor Ari said there's a dot Nicky T domain. I'm impressed if that really is. I can't tell if it's joking or not.
[00:38:53.47] - Anthony Campolo There's a .online. You could do nickyt.online.
[00:38:57.11] - Nick Taylor Yeah, no, I already have that. It goes to my socials and I use that for Bluesky. But I guess they can make... I don't know how it works with top-level domains. There's obviously the country ones and .com and .co, but I wonder how they decide to add new ones. Like, .show makes sense for TV, I guess.
[00:39:24.22] - Anthony Campolo But yeah, I think it has to do with probably anyone being able to make one, but then you have to get certified or something.
[00:39:33.17] - Nick Taylor Okay.
[00:39:34.38] - Anthony Campolo But I have AJCweb.dev, I have that domain, so I like when you can actually get the whole name if it happens to fit into a TLD. That's really nice. And I remember when I first created AutoShow, the AutoShow domain was not available, and it was one of those things where you could make an offer. So I made an offer, like a couple hundred bucks or something like that, and never heard back. That was over a year ago. But I checked back just a week ago and it was available. It's definitely the most expensive domain I've ever bought. It's not that ridiculous, but it is a little ridiculous. But at this point I'm pretty baked into the AutoShow name, so I think it was worth it.
[00:40:28.11] - Nick Taylor Yeah, I never paid a lot for a domain, but I actually sold a domain. I was never into domain squatting or stuff, but Ari will probably like this. I had one in the late 90s or early 2000s, I think. I was like, HTML5 was the new thing, you know? And so I tried to register html5.com, but it was taken. So I was like, I'll register html5.ca. And I was like, I'm just gonna blog about HTML5 stuff. I never did. But then somebody dropped me a message like, we'd like to buy your domain. I ignored it because I just thought it was spam or something, because I'd never sold a domain. I ignored the messages like four times. And then he kept upping the price. For context, I paid 10 bucks for it. And then he's like, I'll give you four grand for it. And so then I was like, okay. And I had to read up on how to do this properly without getting screwed.
[00:41:50.59] - Nick Taylor At the time, I used something called Escrow.com. It seemed to be the popular way to do this. It's not just for domains. I think it's any kind of thing where, you know what I mean, you have to switch through a third party.
[00:42:03.13] - Anthony Campolo Yeah.
[00:42:04.16] - Nick Taylor Yeah. So I sold it and then got my four grand, so pretty good profit. And then I found out later, because I was just curious what the website goes to now, and it went to the Internet Explorer download page. So Microsoft had bought it. Had I known, I could have maybe asked for more. But honestly, it's still pretty good considering I paid 10 bucks for it. Highlight Guy463 on YouTube is saying, hey, there is Suno AI or something, right? To make songs. Yeah, we were talking about Suno before.
[00:42:48.18] - Anthony Campolo Yes, Suno.
[00:42:48.59] - Nick Taylor Oh, yeah, exactly, Ari. Yeah, no way. Yeah, it just goes to nowhere now. But I remember it went to the IE9 download page for the longest time. So anyways, that's how I bought my first Mac. I bought a 2010 MacBook Pro. So it was around 2010 then. I was like, I'm gonna buy a MacBook. I always had, I don't know about you, but up until 2010 I always bought used computers, aside from the first computer I ever bought, a 486, which cost me like 3,600 Canadian at the time, which was bonkers. But after that, while I was in school, I always bought used laptops or desktop PCs off the classifieds. And then when I finally got that, I was like, okay, I'm gonna build iOS apps. And then, you know, I bought myself that, I bought my wife a mini. Was it not the iPlayer? But it was one of the... no, no, it was still Apple, but it was like a...
[00:43:58.23] - Anthony Campolo Right.
[00:43:58.43] - Nick Taylor It's kind of like the iTunes... what the heck was it called? Before they had the phone, or... I think the phone had already come out. Yeah, because the phone was 2008. But they basically had a music player
[00:44:13.16] - Anthony Campolo that looked like the iPod.
[00:44:15.35] - Nick Taylor Yeah, sorry, I don't know why I can't.
[00:44:18.52] - Anthony Campolo The most important device of our entire lifetime: the iPod.
[00:44:23.45] - Nick Taylor Yes. So I got her the iPod, the newer one. It's not the one with the spinning thing. It was the one that looked like a mini phone. And I got the iPod Nano, which I could use like a watch and stuff.
[00:44:37.21] - Anthony Campolo Dude, I love the iPod. I was all about the iPod as a kid.
[00:44:41.59] - Nick Taylor Yeah. So that's how I got into Mac. And then I had that Mac forever until, I think, 2018. My wife used it for the longest time because it got too slow for web development at one point.
[00:44:58.05] - Anthony Campolo You buy a Mac Mini?
[00:45:00.29] - Nick Taylor No, I didn't buy anything else. I eventually got a. I.
[00:45:05.36] - Anthony Campolo No, I'm talking about right now, because of Clawdbot, you know.
[00:45:08.19] - Nick Taylor Oh, no, no, no. Well, one, you don't need that. But I have a mini PC right now because of work, for testing out Pomerium. So I just added Clawdbot on there.
[00:45:18.08] - Anthony Campolo So yeah, I got a mini Linux box for the first time. I haven't set it up yet, but I might throw Claude on that as well.
[00:45:28.13] - Nick Taylor Yeah, no, that's what I have. It's what my coworker recommended. It's called Minisforum, the brand, and I can share the model I got, but it's got like 32 gigs of RAM and half a terabyte of drive. Yeah, Moltbot now. Yeah, exactly, Ari. Yeah, I think Anthropic sent him a cease and desist or something.
[00:45:54.34] - Anthony Campolo Yeah, stupid.
[00:45:56.38] - Nick Taylor But yeah, it's kind of cool. I set up Clawdbot, and, for context for people who don't know, I work at a place called Pomerium, so we're good at securing internal apps. So I was like, oh, I'm going to secure Clawdbot, because beyond the bot you create, like the one I have in Telegram, there's the website for it, kind of like the admin panel. And I went to go secure that with Pomerium and then I ran into some issues with WebSockets, so I opened up an issue. So I have a PR I'm working on for it.
[00:46:28.35] - Anthony Campolo What are you using Clawdbot for? Like, what's your use case?
[00:46:34.11] - Nick Taylor I literally just started using it Friday, just because I kept reading about it, so I was like, I'll try it out. And then the first thing I did, after Peter, the creator and maintainer, said he'd love a pull request, I basically just started working on it. I did some planning with Clawdbot and it basically helped me draft a PR. I still have to review it and stuff. It's a bit of a side note, but that's kind of the way things are going, you know. I don't know how much... well, you've been working with AI, obviously, but
[00:47:20.23] - Anthony Campolo do I run Ralph with Clawdbot?
[00:47:25.02] - Nick Taylor Yeah, you could maybe.
[00:47:27.35] - Anthony Campolo I'm just asking. Like, the two big hyped AI things over the last month have been Ralph and Clawdbot. That's all people talk about.
[00:47:35.11] - Nick Taylor Yeah, I haven't run the Ralph loop myself yet, but it makes sense. It can be super expensive, though, because it's not necessarily efficient. It's just kind of autonomous, so it probably takes more steps than it needs to.
[00:47:51.44] - Anthony Campolo Right?
[00:47:52.06] - Nick Taylor Yeah, but, but still interesting. I mean, I know we're getting off topic here a bit.
[00:47:58.32] - Anthony Campolo Yeah, I gotta do something with this $500 worth of tokens I've got between all my different subscriptions at this point. I'm on ChatGPT Pro and Claude Max and OpenCode Zen, or OpenCode Black.
[00:48:12.07] - Nick Taylor Oh, okay, okay. Yeah, I've used OpenCode as well. I don't have their Black. I had connected the Claude subscription and then obviously that got axed from Claude. So I'll probably connect GitHub Copilot, or they recently added GitHub Copilot.
[00:48:30.05] - Anthony Campolo Yeah, you can add your copilot keys to it.
[00:48:32.10] - Nick Taylor Yeah, yeah, yeah. So they do similar, like Goose does this too, from the people at Block. You can have multiple subscriptions and stuff. But I think now is the time to experiment because there's gonna be a rude awakening when the real prices show up. You know what I mean?
[00:48:53.22] - Anthony Campolo Yeah, we'll see.
[00:48:55.08] - Nick Taylor Yeah.
[00:48:56.01] - Anthony Campolo Okay, so I have audio enabled for this, so people can hear it.
[00:49:01.59] - Nick Taylor Okay.
[00:49:03.16] - Anthony Campolo So hopefully this won't be too loud.
[00:49:08.09] - Nick Taylor Okay, okay, I can hear it. So I'm assuming the crowd can. If the crowd can, let me know if you heard that.
[00:49:20.38] - Anthony Campolo Yeah, I would assume so. You could hear it?
[00:49:23.55] - Nick Taylor I'm pretty sure. Yeah.
[00:49:25.09] - Anthony Campolo Yeah. So the text-to-speech part just takes whatever your text output for steps three and four was, and it will then create audio for you. So that's cool. Nice for accessibility and stuff like that. Then you've got this image here. As I said, the prompts for the images are not really that amazing. So this image is kind of whatever, but it gives you the idea that you can create cover images and stuff like that. These will be refined and have some more interesting stuff to pick from. We probably won't listen to this whole three-minute song, but let's see how this sounds.
[00:50:09.44] - Nick Taylor This is a song about GraphQL.
[00:50:13.15] - Anthony Campolo About my time in Lambda school and Redwood and GraphQL and stuff.
[00:50:17.58] - Nick Taylor Okay, okay. Nobody can hear it. Let me see. I wonder why they can't hear it. It should... audio settings. Let me check here.
[00:50:34.56] - Anthony Campolo I'll keep it going while you're messing with.
[00:50:39.11] - Nick Taylor Sounds like Sum 41.
[00:50:41.25] - Anthony Campolo Yeah, right.
[00:50:43.54] - Nick Taylor Okay, I'm just checking in. Local file control audio via obs. There we go. Okay, play it again. I can't hear it now, but I think the crowd will be able to. You hear it now, Ari? Let me see. Oh, yeah, I gotta do. Oh, guest sharings. Oh, I see it playing here, but let me go to advanced settings. It's the first time. Okay, it's working now. Okay, so you can hear it? Yeah, I'll put the monitor on so I can hear. I just gotta. Just for me. So I can hear it. Advanced settings, guest sharing screen monitor and output. There we go. Okay, I hear it now too.
[00:51:45.10] - Anthony Campolo All right.
[00:51:46.37] - Nick Taylor Express things in a Postgres core, data rivers twisted and wide. It obviously sounds funny because it's talking about tech, but.
[00:52:04.35] - Anthony Campolo It's.
[00:52:05.09] - Nick Taylor Yeah, it sounds like 2000-to-2010 alternative rock. Like a bit of P.O.D. in there, maybe.
[00:52:16.32] - Anthony Campolo Yeah. And I would say this is more comparable to, like, Suno V4, where it's pretty decent, but at the same time there's kind of a genericness to the songwriting and also the vocals. They sound human, but they're also not that expressive. So I'll be interested to see, as ElevenLabs continues to improve their models. I have to imagine they'll eventually hit a similar inflection point to where Suno did, and they may just be a couple months behind or something like that. But yeah, that kind of gives you an idea of what it can do.
[00:52:59.52] - Nick Taylor Yeah. So I can see this being fun, just from a creative standpoint, you know, just having fun.
[00:53:11.19] - Anthony Campolo Yeah, totally. Yeah, it's really cool. And especially with the stuff where you can throw in, like, a book and kind of turn that into songs, I find that that's good. I've been doing that for books that I've been reading, because it kind of helps with retention of information. You hear a song about something from the book and you're like, oh yeah, that did happen. So yeah, I think there's a lot of interesting things that you could do with it.
[00:53:39.14] - Nick Taylor Yeah.
[00:53:39.53] - Anthony Campolo Do you have any questions based on that?
[00:53:43.00] - Nick Taylor Well, I guess so. The one thing is, I know you obviously used AI to help you generate a lot of the code. Have you been revamping that now? You were talking about this at the beginning. Or are you just kind of in YOLO mode still, where the litmus test is it's just working, so it's good enough to ship?
[00:54:03.49] - Anthony Campolo Yeah. So I think at this point I've kind of gone through the whole gamut of vibe-coding an app, seeing the drawbacks to that. And now as I'm working on the V2 of AutoShow, I'm trying to mitigate some of those issues. So I think the most important thing, and Dax has a really good perspective on this, is that when AI writes your code, it's basically brute-forcing the feature, usually to the extent that it's going to get it to work, but it's not going to do it in a way where the code is clean, follows the conventions of the rest of your project, or is maintainable. I find that there's two ways you can approach this. You can have a clear idea of how you want your code to be structured ahead of time. This is especially easier for senior devs who would have coded this stuff out all themselves anyway. They can come up with a whole architecture diagram and data flow and types and schemas and all that stuff. If you're just vibe-coding it though, you can basically set checkpoints along the way so that once you generate a bunch of features, you then go back and use AI to clean the code up.
[00:55:37.25] - Anthony Campolo I have a couple of different prompts now where I'll have it look for unused variables or unused CSS classes or stuff like that. It generates a whole bunch of code, and then it'll rewrite code and leave old dead code around. So there are things like cleaning that up, or checking for conventions around how the types are written. And this is a lot of stuff that I feel like I really got a good handle on from the workshops I've been teaching, because those are all, like I said, based around code maintenance and looking for overly complex code, heavily repetitive code, stuff like that. So I would say for people who are vibe-coding apps, you want to vibe-refactor your vibe-coded apps on some sort of cadence. And then on top of that, you have your normal software practices: you need tests, and you need good observability and logging and metrics and all that stuff. I'm taking my time with this now because I'm not trying to launch this to make a whole bunch of money or anything.
[00:56:54.56] - Anthony Campolo So I've got a much better test suite hooked up for V2 now. I feel that the code itself is a lot cleaner, a lot more maintainable, and eventually I'll add in the user login, registration, and credits and stuff like that. But people who do want to use this, this entire thing right now is in the open-source repo.
[00:57:20.54] - Nick Taylor Yeah.
[00:57:22.11] - Anthony Campolo So AutoShow. This is pretty much up to date with the current Bun app that you're seeing right now. So everything you're seeing here, everyone can use. And then eventually this will actually be hosted on auto.show. So yeah, that's probably where I'm at right now in terms of the app. And then the original AutoShow, which does the text generation based on the prompts, that's still up. People can use that if they want.
[00:57:54.22] - Nick Taylor Yeah. And sometimes it's fine to just vibe something, just to see. Don't care about quality so much; you just really want to see what's possible. And then you can totally scrap it, because it's so quick to do these things now. Then you can be more serious and intentional about reviewing and stuff. I'm always amazed at how fast I can move now. I mean, I hate writing code by hand now to some degree. It doesn't mean I don't look at it, but I know how to write React components; I really just don't want to do it anymore. It's more like I want my ideas and stuff. For example, at work we had an app that was on Create React App. It's been deprecated for like two years, and we had all these dependency updates we had to do. I started fixing a couple of dependency updates and realized it was just going to be a nightmare. So I literally just put together a plan with Claude Code.
[00:59:16.11] - Nick Taylor I just said, like, you know, I want to migrate to Vite and get rid of Create React App. And I gave a little more detail, but basically I said, help me create a plan. It executed it, like 20 minutes later. I reviewed it, obviously, and then I might have spent another hour fixing a couple things it missed. Like there was a duplicate React version getting loaded, so I had to add a dedupe to the Vite config and stuff. But basically I saved a crap ton of time. I know I'm saying the obvious because people do this all the time now, but I could obviously port it from Create React App to Vite myself, and I feel like that probably would have been at least a few days. And I don't know about you, but I'm still new to Kubernetes and I.
[01:00:28.19] - Nick Taylor I gave a talk about Kubernetes with zero trust because of Pomerium, and I had to get a cluster up and running on my mini PC. And at first I was like, because we can use AI at work, they encourage us to use it, and I think I blew 300 bucks US in a day just getting my cluster set up, which sounds wild. But at the same time I didn't have to interrupt anybody at work. So anybody salaried, you know, their time wasn't affected or pulled away. So that 300 bucks was probably way cheaper than me pulling in two, maybe three people that I had questions for. You know what I mean? So it's like, I don't know, interesting times.
[01:01:19.19] - Anthony Campolo So yeah, I mean, I've been vibe coding stuff for, at this point, going on almost two years. So it's been fun for me seeing the slow march: first devs weren't that into it, then you had some who were kind of more into it. And now, I mean, I guess it's hard to say, because Twitter is always going to be kind of its own bubble. I'll be curious to what extent average, everyday enterprise developers who aren't online are using AI versus not, because to me it feels like at this point everyone's coding with AI. Obviously, you know, it's not 100%.
[01:02:04.29] - Nick Taylor Yeah.
[01:02:04.53] - Anthony Campolo But yeah, I'd be curious. I'm sure there are different metrics and studies being done on this, but I think swyx just tweeted in the last day or two basically saying how he went from like 20% vibe code and 80% handwritten code to flipping that: 80% vibe code, 20% handwritten code. And he said that just happened in the last couple of weeks for him. And it feels like a lot of people, with Opus 4.5 and then trying out things like Claude Code or OpenCode, are finding that the models have gotten a lot better and the harnesses around them have gotten a lot better. I kind of shied away from Claude Code for a while. I made that classic mistake where I tried it out when it was brand new, thought, oh, this is kind of slow and crappy, and then didn't try it again for a while. And obviously it's gotten way, way better. So now I'm on the OpenCode train for sure, because it's basically like Claude Code but multi-model. So yeah, because, you know, Copilot in VS Code is still kind of slow and crappy.
[01:03:14.47] - Anthony Campolo That's kind of what I teach for the workshops that I do. But at this point, OpenCode is so much faster and so much nicer. I haven't tried Cursor in over a year. I know Cursor now has a super-fast model, so they can probably do better in terms of that. But yeah, I mean it's definitely the time to be coding with AI.
[01:03:41.59] - Nick Taylor Yeah, I've found I'm less and less in the IDE. And also the GitHub Copilot CLI, it was in preview before. I think it's out of preview now. I got to take it for a spin again, but I got a pre-trial of it when it was through the GitHub Stars. But yeah, I forget what the announcement said because a bunch of the Stars have been posting stuff. I was just sick as a dog. I'm like, there's no way I can try to build something this weekend. My guts are dying. But yeah, it's supposed to be pretty solid. I know Cassidy Williams and Jason Lengstorf are doing a stream, I think maybe in an hour. I'm not sure. I think it's today. But I don't know, it seems kind of weird, but I'm a big fan of VS Code and I'm less and less in the editor. It was kind of neat initially when you saw the code completion. You know, you type something and it goes. And sometimes it was annoying, obviously, but I just find I'm more and more in CLIs for this stuff. And I got to give the Copilot CLI another go now that it's officially out.
[01:05:07.05] - Nick Taylor But I don't know, and it ties into this: it's really more that I don't need to see the code changing. It looked cool when it did that initially, but now I would rather see a diff. My litmus test is: okay, here's my plan, do the thing. Is it doing what I actually set it to do? And then if it's actually working, that's when I start looking at the code to review it, you know, and then pass it. And it's also good at writing tests too and stuff. So I don't know, it's been pretty wild how things are changing. Nicholas Zakas, the creator of ESLint, had a great blog post he put out, I think January 20th, talking about how we went from devs to conductors, and now we're going to be kind of orchestrators of these things. And it makes sense, right? Because I remember when Copilot came out, or ChatGPT, you were just talking in one instance, but now I'll have multiple instances of Claude going, you know, do this, and then while it's still doing that thing, it spins off into worktrees.
[01:06:33.44] - Nick Taylor You know, it'll do sub-agents, and then I have another one open saying, like, do this other thing. And it's like, I don't know, it's just wild times. I feel like even in three months it's gonna be different again. And I guess the only downside is, it's not so much the FOMO, but it's kind of exhausting sometimes to just be
[01:07:00.53] - Anthony Campolo like JS fatigue, you know, but now.
[01:07:05.32] - Nick Taylor Exactly. It's like, oh, another model came out, somebody dropped 20 more skills, and it's just like, okay, okay, okay. Obviously you can't consume it all, but it does feel like the JS fatigue a bit, you know? It's a lot of exciting stuff, but you just can't do it all right now, you know?
[01:07:29.37] - Anthony Campolo Yeah, no, for sure. That's one of the reasons why I've been glad to have AutoShow, because as new stuff comes out, I'll build it into the app and try it out, and try out new services and new models. So I feel like it's been a good playground for trying out all this stuff. And once you get a handle on the main models, it's the same as with frameworks: once you understand React, Vue, and Svelte, when a new one comes out, it doesn't take you that long to understand it and get spun up with it. Stuff like ClaudeBot and RALPH is a little different, and those are interesting. So there are definitely higher-level AI tools that are still pretty confusing, but in terms of just the models and what they can offer, they're mostly just API endpoints that return text, honestly.
[01:08:19.51] - Nick Taylor Yeah, yeah, exactly. You know, so.
[01:08:23.03] - Anthony Campolo Okay, so let's, let's get to the, the lyric video stuff.
[01:08:28.10] - Nick Taylor Cool.
[01:08:29.17] - Anthony Campolo So let me switch back to the entire screen. Looks like I can share my system audio; that's the thing. Okay, so let's take a look. I'm not going to go super deep into how this works. I did have it write a tutorial that I might publish at some point. But the main thing that's happening here is that it's designed to figure out where the lyrics in the song are by using Whisper. It's not perfect: there will be times where, if the song starts with the singer doing some kind of vocal take, it may get confused and start in the wrong place. But for the most part it works out of the box. I'm eventually going to include the ability to use LRC and VTT files, which are subtitle-type files, to give the exact timestamps. But basically, let me actually do this and give you the high-level explanation. Okay, so it runs Whisper to transcribe the song and figure out the word-level timestamps.
[01:10:09.09] - Anthony Campolo It then compares that against the lyric file that you give it. So you'll have a lyric file like this, a cover image like this, and an audio file like that, and it will create a video where the lyrics scroll in time with the song. And it's all built into the CLI. Right now this is just a free-floating repo; it will eventually get upstreamed into the AutoShow CLI. So let me just run this so people can see what's going to happen. Let's go here. It's looking at your Whisper model, language, resolution, font, all that stuff. Then it finds the lyrics and tokens, compares them, and figures out the alignment. This is where things can get kind of hairy if it doesn't just work out of the box. Basically, it looks at the Whisper transcript and the lyrics file and compares them to try to figure out where the timestamps match up with the lyrics as written, because the lyrics that come out of transcription will be slightly misspelled or malformed. Also, did you know there's an ASS file format?
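The alignment step Anthony describes, matching Whisper's word-level timestamps against the written lyric lines, can be sketched roughly like this. This is not AutoShow's actual code: the function names and data shapes are illustrative, and exact token matches after normalization stand in for whatever fuzzier matching the real tool uses (which is why, as he notes, misspelled transcriptions can make things hairy).

```python
import difflib
import re

def normalize(word):
    # Lowercase and strip punctuation so "faces," matches "Faces"
    return re.sub(r"[^a-z0-9']", "", word.lower())

def align_lyrics(whisper_words, lyric_lines):
    """Assign a start timestamp to each lyric line by matching its
    tokens against Whisper's word-level output.

    whisper_words: list of (word, start_seconds) from transcription
    lyric_lines:   list of lyric strings from the lyric file
    """
    transcript = [normalize(w) for w, _ in whisper_words]

    # Flatten the lyric file into tokens, remembering each token's line
    lyrics_flat, line_of_token = [], []
    for i, line in enumerate(lyric_lines):
        for tok in line.split():
            lyrics_flat.append(normalize(tok))
            line_of_token.append(i)

    # Exactly-matching runs of tokens act as anchors; words Whisper
    # misheard simply fail to match and contribute no anchor
    matcher = difflib.SequenceMatcher(a=transcript, b=lyrics_flat, autojunk=False)
    timestamps = {}
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            line_idx = line_of_token[b + k]
            # Keep the earliest matched timestamp per lyric line
            timestamps.setdefault(line_idx, whisper_words[a + k][1])
    return [timestamps.get(i) for i in range(len(lyric_lines))]
```

A lyric line that Whisper garbled end to end gets `None` here, which is where LRC/VTT subtitle files with exact timestamps would take over.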
[01:11:56.49] - Nick Taylor I did not know there was ASS. No.
[01:11:59.28] - Anthony Campolo Yeah, something to do with this process that I do not understand. Okay, here we go. Let's see if this works.
[01:12:18.21] - Nick Taylor This is a song that is used as an example for the repo that needs an example. This example will be used as the example. And it is very short. This is the whole song. This is the hook.
[01:12:30.35] - Anthony Campolo Yeah. So that's just to test it out. As you see, it has the background, the song title up top, and one lyric line per line: you always have the next one in front and the previous one behind it. So this, to me, is just so much better than what you're getting from Suno. It's much easier to follow along with, and it looks aesthetically more pleasing. This is something I just kind of vibe coded in a day for fun, and there's a ton of code here, so I'm probably going to do the whole refactor cleanup on this and then upstream it into the AutoShow CLI.
[01:13:11.50] - Nick Taylor So, yeah, the song had a good hook to it.
[01:13:17.45] - Anthony Campolo So now in the AutoShow CLI it can basically do the same thing that we saw the front end do. Okay, I ran this before the stream, so we can just look at this. The command was this guy right here. You pass in whatever your base file is and the LLM you want, and then you can do MiniMax or ElevenLabs, and you can pass it different genres. You can also add extra text to describe the music style, so you don't just want pop, you want 80s-synth-influenced pop. Then it will generate the lyrics, feed them to the model, and output it as a file here. So this is from MiniMax. Let's see how this one sounds.
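At a high level, the pipeline Anthony just described (source text in, lyrics out of an LLM, lyrics plus a style description into a music service) can be sketched like this. The flag-style parameters, `generate_lyrics`, and `generate_music` are hypothetical stand-ins for the real LLM and MiniMax/ElevenLabs calls, not AutoShow's actual API:

```python
def build_style_prompt(genre, extra=""):
    # "pop" plus "80s synth influenced" becomes "80s synth influenced pop",
    # the style description sent alongside the lyrics to the music model
    return f"{extra} {genre}".strip()

def run_pipeline(transcript_text, genre, extra, generate_lyrics, generate_music):
    # generate_lyrics: an LLM call that turns source text into lyrics
    # generate_music:  a music-service call (MiniMax / ElevenLabs style)
    lyrics = generate_lyrics(transcript_text)
    style = build_style_prompt(genre, extra)
    return generate_music(lyrics, style)
```

The point of the shape is that the music service never sees the podcast transcript; only the LLM-written lyrics and the genre description cross that boundary.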
[01:14:34.50] - Nick Taylor Your call is important to us. Please stay on the line. All our representatives are currently visiting. Street lights flicker on our faces / Counting wishes, counting graces / Another year of broken places. Like, it could be from a mediocre Disney movie, but it's like,
[01:15:15.10] - Anthony Campolo yeah, I mean that, that could be a popular song, you know?
[01:15:19.01] - Nick Taylor Yeah, it's got a nice hook to it, you know.
[01:15:23.22] - Anthony Campolo Totally. Yeah. So that one generated the lyrics from the audio file. You can also just create a song from a base prompt. So if we go to the music command, let's do it with.
[01:15:47.57] - Nick Taylor This one.
[01:15:53.47] - Anthony Campolo So this is going to take a prompt, and the prompt is an upbeat electronic dance track, and it's going to generate that with ElevenLabs. This will not use the whole pipeline; it will just use the music service directly. The CLI is set up this way: I first built the main workflow, which is the text command, and then I added on a bunch more commands like music and image and video. In the UI, I pulled them all together so that you could generate those assets based on the initial transcript. In the CLI, right now, I still have to build a lot of that scaffolding to connect the image and video commands and such to the initial text command. But here.
[01:16:46.03] - Nick Taylor Okay, gotcha. That's it, up in the club.
[01:16:58.46] - Anthony Campolo with that John Ham chip.
[01:17:01.05] - Nick Taylor Yeah, that's actually a good beat. Yeah, like this, I can't tell it's AI.
[01:17:15.59] - Anthony Campolo Yeah, especially the drop.
[01:17:29.06] - Nick Taylor Oh yeah. Like this is clearly based off of existing stuff. And then boom. No, this, this is definitely good electronic.
[01:17:43.45] - Anthony Campolo Yeah, that. That slapped. Actually that was pretty great.
[01:17:46.57] - Nick Taylor That is pretty solid.
[01:17:50.04] - Anthony Campolo Yeah. So that was ElevenLabs. I'm still kind of finding what styles work best with different models and stuff, but that was pretty tight. So that's the music functionality. The very, very last thing I want to show, and then we can start wrapping it up, is the thing I was talking about: creating a full concept album. So I have this ePub command; let me run it on this guy. And you can split it up into however many parts you want. Let's just do split five, so it'll turn it into five text files.
[01:18:48.06] - Nick Taylor Okay.
[01:18:52.16] - Anthony Campolo Goes super duper fast. So you see here: five files created, 22,000 words, 0.1 seconds.
[01:19:00.06] - Nick Taylor And I'm assuming this works only with ePubs that are DRM-free.
[01:19:09.23] - Anthony Campolo I'm pretty sure it'll work on any ePub, because it's using an open-source ePub library, and it's basically just extracting it out as HTML and then stripping out the HTML.
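Since an ePub is essentially a zip archive of (X)HTML files, the extract-and-strip approach Anthony describes can be sketched with the standard library alone. This is a minimal sketch, not the library AutoShow actually uses; real ePubs also carry a manifest that fixes chapter order, which this sorted-filename shortcut ignores, and an encrypted (DRM'd) file would not open this way.

```python
import zipfile
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    # Collects only text nodes, discarding tags entirely
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(markup):
    # Parse the markup and rejoin its text with normalized whitespace
    parser = _TextExtractor()
    parser.feed(markup)
    return " ".join(" ".join(parser.chunks).split())

def epub_to_text(path):
    # An ePub is a zip of (X)HTML documents: read each content file,
    # strip its tags, and concatenate the plain text
    parts = []
    with zipfile.ZipFile(path) as z:
        for name in sorted(z.namelist()):
            if name.endswith((".xhtml", ".html", ".htm")):
                parts.append(strip_html(z.read(name).decode("utf-8", "ignore")))
    return "\n".join(p for p in parts if p)
```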
[01:19:23.48] - Nick Taylor Okay, I didn't mean in terms of pirating, I was just curious. I'm not sure how, because on my Kobo, when I'm reading stuff, I can put on PDFs and stuff through Instapaper. But I buy books from Kobo too, and I also have existing ePub books I've bought that don't have DRM. I bought them off, what the heck's that site called... anyways. All I mean is, I thought with DRM you can't even get in. But I guess with the open-source project, maybe it's somehow able to read it still.
[01:20:09.12] - Anthony Campolo Yeah, I'm not sure. There's a site called Anna's Archive, which is like the Pirate Bay for books, essentially, so you can download.
[01:20:19.22] - Nick Taylor Not condoning that. Not condoning.
[01:20:21.32] - Anthony Campolo Yeah, yeah, yeah. And they're always kind of getting shut down, but they have a whole bunch of different domains out there, so you can still find it. I'm not sure if DRM will influence it at all; I don't think so. But if you actually bought it through the Amazon bookstore, that might be a different case. But we see here now we have these five text files. This is a book that I really, really love: it's 40 short stories, each with a different kind of vision of what the afterlife could be like. Super, super interesting book. Anyway, since I did split five, it just takes the whole book and breaks it up into five pieces. So then the additional scaffolding would be another command that would take each of these individual files and run them through the pipeline I just demonstrated, where it takes a text file, creates the lyrics with an LLM, then feeds those lyrics to ElevenLabs or MiniMax. Eventually this will all be wired together and included in the AutoShow app.
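The split itself is simple to picture: divide the extracted book text into N roughly equal word-count chunks, one per future song. A minimal sketch, assuming even word-count chunking (the real command's flag and chunking strategy may differ, e.g. splitting on actual chapter boundaries):

```python
def split_into_parts(text, n):
    """Split text into n chunks of nearly equal word count,
    preserving word boundaries and overall order."""
    words = text.split()
    size, rem = divmod(len(words), n)
    parts, start = [], 0
    for i in range(n):
        # The first `rem` chunks absorb one extra word each
        end = start + size + (1 if i < rem else 0)
        parts.append(" ".join(words[start:end]))
        start = end
    return parts
```

So a 22,000-word book split five ways yields five ~4,400-word files, each of which then becomes the input text for one lyrics-then-music run.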
[01:21:35.55] - Anthony Campolo So people can kind of generate multiple songs that are of a similar concept or style. And then you won't just be generating songs, you'll be generating entire albums, which I think is pretty cool.
[01:21:51.40] - Nick Taylor Yeah, no, that's awesome, that's pretty cool. There's a lot of... you stopped sharing your screen. Let me go back over here. Sorry, I was too into the conversation. Yeah, that's pretty cool, all the stuff you put together there. Oh yeah, Ari was saying Anna's Archive was shut down.
[01:22:15.12] - Anthony Campolo Actually the .org domain was shut down. You can still get a .pm or .li and some other ones too.
[01:22:26.07] - Nick Taylor Well, whatever. I don't condone it. Live your life.
[01:22:34.34] - Anthony Campolo Yeah, and there's actually an interesting tangent here, which is that the reason the .org got shut down, and why they're going after it now specifically, is because they scraped, I think, the entirety of Spotify and then uploaded it all. They have traditionally been books and scientific publications and things like that, but now I think they also have music. I'm not sure if you can actually download music through Anna's Archive or if it's just available in their torrents, because you can download the sum total of Anna's Archive and mirror it; that's why it's kind of decentralized and all over the place. But that's going to make a very, very big difference for the open-source AI models if they have that data set to train on now. I did see some open-source model drop that people were calling an open-source Suno model. I haven't actually tried it yet; I was going to look at it before the stream but didn't get around to it. So hopefully the open-source music models will be getting better soon, but right now the proprietary ones have gotten pretty dang good.
[01:23:47.28] - Anthony Campolo So that's kind of where I'm at right now.
[01:23:51.14] - Nick Taylor Cool. All right, well it's been fun man. And remember kids, don't pirate. Napster was fun while it lasted.
[01:23:58.58] - Anthony Campolo You wouldn't download a car.
[01:24:01.58] - Nick Taylor Yeah, you sound like the movie trailer guy. In a world where... Anyways, thanks for hanging, man. Always good, man. And yeah, keep me posted with the updates; always great hanging. And everybody else, thanks for hanging today. And Anthony, if you don't mind just staying on for a second.
[01:24:23.18] - Anthony Campolo Thanks everyone. Hey Henri, good to see you man. Thanks for joining later.