
RAG and Vector Databases with Dev Agrawal

Episode Description

Anthony Campolo walks through building RAG search with SQLite vec and Cloudflare Vectorize, while Scott and Dev discuss AI-powered apps and agentic workflows.

Episode Summary

In this livestream, Anthony Campolo is joined by Dev Agrawal and Scott to explore how to implement retrieval-augmented generation (RAG) for AI-powered search in applications. Dev explains his current project building a chatbot over a company's machine learning research papers, which prompts a hands-on walkthrough of Anthony's tutorial project using SQLite vec and Cloudflare Vectorize. Anthony demonstrates how embeddings are generated via OpenAI's API, stored in a vector database, and then queried using cosine similarity to rank document relevance. The conversation covers practical challenges like document chunking strategies, sliding window overlap to preserve context across chunk boundaries, and managing metadata so the system can trace answers back to source documents. Anthony then shows the AutoShow staging app's front-end implementation using Cloudflare Vectorize, which includes built-in chunking via its Auto RAG feature. Scott shares his church sermon app that generates devotionals and study guides, explaining how vector search could help users find relevant content across an archive of past sermons. The group also discusses tooling journeys from Bolt.new to Cursor, the promise of Convex as a backend, agentic RAG loops, and newer developments like Jina embeddings v4 and Vercel's sandboxed compute for running AI-generated code.

Chapters

00:00:00 - Introductions and Project Updates

Anthony welcomes Dev and Scott back to the stream and catches up on what everyone has been working on. Scott shares that he has been building his own app since May, crediting agentic coding flows with dramatically accelerating his progress as a non-traditional developer. He describes reaching an MVP stage and working on additional companion apps.

Dev then explains the project driving his interest in today's topic: building a chatbot that draws on a company's machine learning research papers to answer domain-specific questions. He highlights the challenge of connecting information across multiple documents, not just retrieving from a single one. Anthony frames the "chat with your docs" problem as something many developers have tackled since early 2023 and asks Dev about the scale of his dataset.

00:04:30 - Understanding the Data and When RAG Is Needed

Dev describes his current dataset of roughly 20 research papers ranging from a few pages to 30 pages each, with plans to potentially scale to 100 or 150 documents. Anthony raises the practical question of when you actually need vector search versus simply pasting documents into a large context window, noting that 20 papers might fit directly but 150 would not scale.

Dev emphasizes that the real complexity lies in cross-referencing information scattered across many papers to answer a single question, requiring the AI to understand relationships between documents. Scott adds context about how vector databases differ from traditional databases by storing high-dimensional numerical representations that enable similarity-based lookups rather than exact matches.

00:09:12 - SQLite Vec Tutorial Walkthrough

Anthony shares his screen and walks through a tutorial project demonstrating vector search with SQLite vec. He explains the code structure: creating an embedding table in SQLite, inserting documents with their embeddings, and querying using cosine distance to rank results by similarity. He notes that OpenAI's Ada 002 model generates the embeddings by sending text to an API endpoint.

Running the CLI commands live, Anthony demonstrates creating embeddings for three sample content files and then querying "What is FSJam?" The results show the FSJam-related file scoring highest at 0.85 similarity, with unrelated files scoring lower. He briefly mentions PGVector for Postgres users and explains why he chose lighter-weight options to avoid running a full Postgres instance.

00:17:41 - Document Chunking and Overlap Strategies

Dev asks about the chunking process Anthony mentioned earlier, and Anthony explains the practical reality: OpenAI's embeddings endpoint has input length limits, so large documents must be split into smaller pieces before generating embeddings. He recounts manually splitting files that exceeded the limit during his early AutoShow work.

The discussion covers multiple chunking approaches, including parsing markdown by heading structure and using sliding window techniques with overlapping text to prevent breaking context mid-sentence. Anthony explains the tradeoff with overlap — it preserves meaning at boundaries but introduces duplicate text. Dev also raises the concept of hybrid search, combining multiple search strategies and merging their results for better retrieval accuracy.

00:22:36 - AutoShow's Cloudflare Vectorize Implementation

Anthony demonstrates the AutoShow staging app's front-end implementation of vector search using Cloudflare Vectorize. He shows the UI flow: clicking a button to generate embeddings for show notes, then typing a question into an input box to receive an AI-generated answer with source citations pointing back to specific episodes and transcripts.

He walks through the Cloudflare Worker code that handles vector insertion and querying, noting that Vectorize appears to include built-in chunking capabilities. The conversation briefly addresses a viewer question before Anthony highlights Auto RAG, Cloudflare's feature that automates chunking configuration and overlap settings, making RAG implementation significantly more accessible.

00:28:30 - Speaker Diarization and Voice Cloning Tangent

A viewer named Matt asks about building a pipeline that separates speakers from audio using Whisper and prosody-based techniques for voice cloning. Anthony advises that Whisper is not ideal for speaker diarization and recommends either the Whisper Diarization Python package or paid services like AssemblyAI and Deepgram that handle speaker separation natively.

The group discusses prosody — the rhythmic and acoustic patterns of speech — and how it might be used to distinguish speakers. Anthony suggests that the simplest approach would be to obtain clean audio recordings of individual voices for cloning rather than trying to extract them from mixed audio. Dev jokes about cloning his own voice for meetings, and Anthony mentions tools like ElevenLabs that already offer voice cloning capabilities.

00:33:32 - Scott's Church Sermon App and RAG Use Cases

Scott demonstrates his app that processes church sermon recordings into outlines and weekly devotionals, using Supabase for the backend and Netlify for hosting. He explains his vision for using vector search to let users generate custom study guides by searching across an archive of past sermons for content relevant to specific topics or personal questions.

The group discusses concerns about AI hallucinations with sensitive religious content, and Scott describes his mitigation strategy: strong backend prompting combined with a human review step where pastors can edit AI-generated content before publishing. Dev draws a parallel to code review practices, noting that AI-generated output always requires human oversight regardless of the domain.

00:39:12 - Security, Supabase Policies, and Developer Workflows

Scott dives into lessons learned about Supabase security, particularly around row-level security policies and Postgres roles, which he found AI models struggle to implement correctly. He describes discovering a community member's approach to handling roles through wrapped functions, then feeding that repo into his AI tools as reference context — sparking an idea about using vector databases to enforce coding patterns.

Dev asks about citation and source referencing in AI responses, which Scott sees as especially important for scripture references. He describes plans to integrate Bible APIs so users can select their preferred translation, connecting AI-generated study content back to authoritative source texts that readers can verify.

00:43:16 - Model Selection and Speed Optimization

A viewer asks about which LLM Scott is using, and he reveals he is on GPT-4, which he acknowledges is painfully slow — taking over a minute to generate outlines and devotionals. Anthony confirms this, and they discuss the importance of UX during long generation times. Scott describes building a multi-stage progress interface showing transcription, outline generation, and devotional creation steps so users understand the system is working.

The conversation touches on faster alternatives, with a viewer recommending Groq's infrastructure for Llama 3 inference and noting DeepSeek V3 is cheap but slow. Dev suggests GPT-4.1 mini as a faster option. Anthony relays viewer advice about using V0, Gemini Canvas, and Claude Code as complementary tools for different parts of the development workflow.

00:45:05 - From Bolt.new to Cursor: A Non-Developer's Coding Journey

Scott shares his progression through AI coding tools, starting with Bolt.new in May where he burned through 27 million credits in 30 days. He describes a workflow of bouncing between ChatGPT and Bolt — using ChatGPT to generate implementation plans, then having it write prompts specifically tailored for Bolt to execute. The visual interface of Bolt, with its click-to-edit elements, was initially very appealing for a non-developer.

However, repeated issues with Supabase connectivity and the lack of a proper dev environment in Bolt led Scott to accidentally break production twice. This pushed him to adopt Cursor, where over three weeks he transitioned to doing 90% of his work. He reflects on the journey from non-developer to using CLI tools and a proper IDE, noting that the gradual progression through increasingly powerful tools built his confidence and skills naturally.

00:52:01 - Convex as a Backend Alternative

Dev explains Convex as a reactive database platform where queries and mutations run directly inside the database server rather than on a separate application server connecting over the network. He highlights that instead of writing SQL, developers write JavaScript or TypeScript, and any data changes automatically trigger query reruns that push updates to connected clients in real time.

Scott asks practical questions about compute limits and edge deployment. Dev explains that Convex handles connection pooling and memory management transparently, unlike Supabase where developers must manage Postgres connections themselves. He also mentions Convex's open-source self-hosted option and their new Chef product, built on Bolt, which can scaffold a complete backend with authentication and email integration.

01:00:12 - The Middle Ground in AI Coding Tools and Agentic RAG

Dev raises the question of whether there is a missing middle ground between no-code tools like Bolt.new and developer-heavy tools like Cursor or Claude Code. The group discusses how AI models struggle with less common frameworks, with Dev noting that AI frequently writes React code when he wants Solid. Scott suggests better context engineering and documentation feeding as the solution.

The conversation pivots to agentic RAG — the concept of AI performing multiple retrieval queries in a loop before generating an answer. Dev describes his experiment using Open Code with a SQLite database as persistent memory for an AI agent, where the model writes its own SQL queries to store and retrieve context across sessions. Anthony shows Cloudflare's Auto RAG documentation and mentions Jina embeddings v4 as a promising new multimodal embedding model.

01:10:00 - RAG Limitations, Vercel Sandboxes, and Wrap-Up

Dev asks how well current RAG implementations actually perform and where they break down. Anthony shares his intuition that RAG is most critical for proprietary company data that models cannot access through general training, while for broadly known topics the LLM's existing knowledge may suffice. The group briefly discusses the analogy between AI context windows and computer memory versus storage.

Dev mentions Vercel's newly announced sandbox feature for running untrusted or AI-generated code in secure ephemeral environments, which excites the group as a tool for agent-driven code execution. The stream wraps up with the participants sharing their social media handles and encouraging viewers to embrace AI tools, with Dev promoting the company where he builds AI and GraphQL products.

Transcript

00:00:03 - Anthony Campolo

All right, and we're live. Welcome back, everyone, to a special episode with two guests: AJC and the web dev. We got Dev and we got Scott. What's up, guys?

00:00:17 - Dev Agrawal

What up?

00:00:20 - Scott

Yeah. Sorry.

00:00:25 - Dev Agrawal

Yeah.

00:00:25 - Scott

Okay. There's something that completely—I zoned out. What a great intro, Scott.

00:00:34 - Anthony Campolo

You haven't been on the stream in a while. Dev's on, like, every week now. So what have you been up to?

00:00:39 - Scott

Oh, I think lots of things, really. So I was doing some consulting stuff around growth and sales and things like that. Then I decided to build my own app, and I've been really focusing on that since May, basically. So the last couple months, I have gotten so far. I mean, it's crazy how much agentic flows have opened up doors for people, including myself. I've tried building things before and only gotten so far, and with agentic flows, it's like you are able to do so much more in such a little time. It keeps my attention more, so I can get a lot further.

I've actually got the MVP, basically, for this app. The base functionality works, but there's always tweaking that can be done.

[00:01:56] So anyway, that's basically what I've been doing full time right now: working on this app and a couple other apps that maybe work in conjunction with this one.

00:02:07 - Anthony Campolo

Super exciting. Yeah. And I know part of the functionality you want that app to have will include what we're going to talk about today. So, Dev, you hit me up and you were asking about RAG and AI-enabling search for an app using AI. It's something that I've kind of built for AutoShow. I originally did it for the CLI and then worked out a way to do it on the front end. I also built this thing called Ryan GPT, where I took all of Ryan's transcripts and turned those into a chat interface.

So you already knew I kind of had some experience with it, but why are you interested in this problem?

00:02:51 - Scott

Yeah.

00:02:52 - Dev Agrawal

Right now, one of my main projects is building a chatbot that uses a company's knowledge base and answers questions about machine learning and stuff. This particular company has a bunch of their own research, and they want to use all the surrounding research to answer those questions properly, with updated knowledge, correct knowledge, and good grounding. They want to continually grow the knowledge base over time by adding more papers or documentation, or tweaking how the AI looks at them and what the connections are between them.

It's a very interesting problem. Basically, as I was looking into it, I'm like, wait, this is basically like 50 other products that exist today, including AutoShow and NotebookLM. Everyone's trying to do something like this. So yeah, I should be talking to more people about how this is done.

00:04:02 - Anthony Campolo

Yeah, totally. The "chat with your docs" thing was really big in early 2023 when everyone was freaking out about ChatGPT. That was one of the first things people would build, like chat with this PDF, you know? It's definitely a problem a lot of people have thought about and tried to implement solutions for. How many files, text, or data are you working with? How much information does the company have?

00:04:30 - Dev Agrawal

Right now I have about 20 papers, but what I'm trying to build a POC on top of is just those 20 research papers. There's almost definitely going to be more. It's probably not going to be super search-engine-scale content because it's just all the documents around an idea and a specific company. That's not too much, and we can still rely on the general pre-trained knowledge that models have about machine learning and stuff. So yeah, 20 right now, maybe 100 or 150 if we include all the documentation they might want in the future.

00:05:19 - Anthony Campolo

And how many pages is a paper, roughly?

00:05:23 - Dev Agrawal

That's a good question. I converted all of them to text because AI could not look at PDFs. I just used PDF to text, and some of them had like 4,000 lines. It goes anywhere from 700 lines to 4,000 or 5,000 lines. So maybe five pages to 20 or 30 pages.

00:05:45 - Anthony Campolo

Okay. Yeah. So 150 of those would be quite a lot.

00:05:52 - Dev Agrawal

Go ahead.

00:05:53 - Anthony Campolo

Yeah. That was one of the things you were asking before the stream, at what point do you actually need this.

00:05:58 - Dev Agrawal

Mhm.

00:06:01 - Anthony Campolo

Yeah, because I think, especially with context lengths getting longer, if you have 20 papers you could probably just copy and paste all those and throw them into ChatGPT and it'll figure it out. But at a certain point that doesn't really scale.

00:06:16 - Dev Agrawal

Yeah. And the trickiest part is that a lot of questions require parts from a lot of different papers or documentations to really answer them. It's never like you need this one paper to answer this one question. You need to provide an AI a way to understand not just the papers and what they mean, but how they connect with each other. If you get asked a question about a specific thing, you have to know that there are ten different papers that talk about it somewhere, and you have to know how they relate to each other so you can provide a better response.

00:06:58 - Anthony Campolo

Okay, cool. Yeah. I think the example I have does do that. Is there anything else you want to talk about before we start getting into the code? Have you tried anything yet? Dev, have you played around with any vector databases or systems like it?

00:07:18 - Dev Agrawal

Yeah, I did. I built a little demo with SolidStart and DataStax in December when they were doing this 12 Days of Code. I also used it as a way to experiment with Solid Socket, which is something I was building at the time, a way to build real-time apps really easily. So I was trying to put a RAG search within a real-time app. That was kind of the extent of it. What I got out of that is that vectors should be treated as a way to sort results. If you have a list of things, you can compare all of them with a vector and get the similarity. It's like a similarity sort. It's like text search. That's the way I've been thinking about it recently. That's my experience with it.

00:08:12 - Scott

Yeah. I remember when I was looking into doing vector stuff, the way the vector stores data is a little different than your traditional database. It creates these high-dimensional vectors, an array of numbers representing the data's features or characteristics. It can find similarities that way, because of how it's stored, a lot better than a traditional database could. So yeah.

00:09:12 - Anthony Campolo

Cool. I'm going to go ahead and share my screen. I created a simple little project here that does the same thing with SQLite vec, their vector thing, and Cloudflare's Vectorize. Let me pull those up real quick.

So SQLite-vec: this is their vector search. This is the very first thing that I ever used when I was first trying to do this with AutoShow, like six months ago or something. It works pretty well. It has a Node package you can install to do stuff. The biggest issue I ran into, and it looks like they may have fixed it, was that I was trying to do this with Node's built-in SQLite and it couldn't do it originally. So that might have changed now. That's sweet. That means the biggest blocker I had is probably no longer here, because what I had to do was use better-sqlite3, which it recommends here if you need to.

[00:10:27] So I'm not sure what the advantage would be of doing that right now, but this is cool, especially for SQLite. You don't want to have to get a whole Postgres database or something like that. PGVector is connected to Postgres. I think Supabase created this, and it's a full vector implementation for Postgres. If you're already running Postgres, this is obviously what you would use. I ended up not going with this because I wanted to avoid having to run a whole Postgres database.

Right now my app is still just using S3 for literally everything. I'm going to eventually switch over to D1, probably with Cloudflare. What's interesting is Cloudflare has their own Vectorize. They have their own way to do this, which isn't connected to their D1 product. It seems to be a separate thing, so I'm not really sure what they are using under the hood.

[00:11:32] It says here that it can reference R2 or KV or D1, so you can use it as a way to vector search across a bunch of your data sources, which is pretty nice. I probably landed on Cloudflare Vectorize. The current implementation on the AutoShow staging environment is using this. I'm not really sure what the cost implications are versus SQLite or PGVector, but I don't think it'll be that much. So far I haven't gotten charged yet, so we'll see about that.

So I have a simple little tutorial I created where we have three content files, very short, and some are in text, some are in markdown. It can work with both of those. Let me step through the files for that SQLite one. I create a simple CLI, like I usually like to do. I've got some text, some types for your document and your query results.

Then here we got SQLite. This did use better-sqlite3. When I coded this I think I did it because I didn't think it was going to work without it. But it does a couple of things. The first thing it does is it creates an embedding table. That is some SQL that creates the embedding and then sets your columns. The actual embedding is a column itself, the blob column. And then you will see it creates an index.

Then there's insert documents. This will use a SQL insert command. It includes the file name, the content, and the embeddings. You'll see stuff like this vec_f32 because there are different kinds of distance metrics. Cosine similarity is one type. There's a lot of math terms in this, so I'm not 100% certain on some of the terminology.

But then here is the query part. This is what we were just talking about in terms of how it actually does this. It looks for whatever has the closest cosine distance, and then it orders the results by how close they are. So it ranks them by the similarity distance. I think right now it's giving you the top five results by default, but you could configure that and get more depending on how large the things are that you're giving it and how you chunked it. There's a whole part of this which I'm not going to touch on, really, which is taking the initial text and chunking it into small pieces before you create the embeddings. That's pretty important, but this is just a quick introduction, so we're not really going to get into that.
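The ranking step described here can be sketched in plain TypeScript. In the actual tutorial, SQLite-vec computes the distance inside SQL; this is just the math it performs, with illustrative names that are not from the repo:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Higher means more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Doc { filename: string; embedding: number[]; }

// Score every document against the query embedding and return the top k,
// mirroring the "order by similarity distance, limit 5" behavior described.
function rankBySimilarity(query: number[], docs: Doc[], k = 5) {
  return docs
    .map((d) => ({ filename: d.filename, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

This is why the FSJam query later in the walkthrough surfaces the FSJam file first: its embedding simply has the highest cosine score against the question's embedding.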

And then this is just a factory function because I've built it to work with three different things.

[00:15:03] This is a util for actually reading the files. This part's important: what's generating the embeddings is OpenAI. OpenAI has embedding models, or specific endpoints. This is using their older embeddings model, Ada 002. You send your text to the endpoint, it creates the embeddings, and gives those back to you.

So this actually creates the embeddings. It reads the files, generates the embeddings, then inserts the documents into the DB and then queries it with this query embeddings. And that just uses the query embeddings DB thing we already saw. And this is the actual command itself: a create embeddings command and a query embeddings command.

So let's just run this, run those two commands. We'll see what happens. First we're going to clear this up. This feeds in the files, so create embeddings with SQLite from content, creating the vector database, initializing the database. That embeddings DB creates the table. It says it found three files, which makes sense. You got three files here. That loads them in, generates the embeddings, and then inserts three documents, and then they're all created and ready to go.

Then you'll have the query command with a prompt that includes the query you want to ask. So I just asked, "What is FSJam?" because that's what it talks about in here. And then here we see we got the rank scores. We have an FSJam file which talks about FSJam that has the highest similarity, 0.85. We then have these files that talk about Cloudflare and Vector DB, and those are smaller similarity, only 0.7 or 0.67.

So you can see it does have a quick gut check. It seems to be aware of that. It wants to answer a question that I'm asking about FSJam, so it needs to find the file that is most relevant to FSJam. So yeah, that's pretty much the whole flow.

[00:17:41] So, questions based on that?

00:17:48 - Dev Agrawal

Yeah. So you mentioned that you first want to split up the initial documents you have into chunks. You said you don't want to get into that. Is that a complicated process?

00:18:02 - Anthony Campolo

The reason I say I don't want to get into that is I don't really have a whole lot of good information for you in that regard. When I first built this, OpenAI's embeddings endpoint itself has a limit on what it can accept. If you try and give it a really long file, it will just break. I ran into this with Ryan's show notes because he'll do a five-hour stream. That turns into half a book of text by the time you're done with that. So when I first did it, I would just run it, it would fail on a bunch of files, and I would then manually create a new file, cut it in half, and put it in. Both files would be half as long.

So really what you're going to be doing is writing some code that's going to go through all your documents and slice them up in some way so they're smaller. Then when you feed it to the embeddings, it'll fit in whatever length limits there are for your embeddings models.

[00:19:08] So how long they need to be will depend on how you generate your embeddings. Once you've done that, the question I still have, and I don't have the answer to, is how do you make sure it understands which chunks came from the same documents versus not? Maybe that's not even an issue. Maybe if it can just search through all the text, it can look for the information it needs and cross-reference it, and it doesn't matter where it originally came from. Or maybe you have to point it some way, give it some metadata so it understands which file it came from. That's one of the things I still need to figure out.

When I first generated this tutorial, I had it generate more work that included smart document chunking. I didn't have time to actually put this in. But if you see here, let's see, chunk markdown by sections. It looks like it kind of goes through and chops it by different markdown headings. So it's going to look for heading twos and heading threes.

00:20:15 - Dev Agrawal

So literally parsing the markdown AST. Nice.
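Heading-based splitting like the generated code describes can be done without a full AST parse. A minimal sketch (this is an editorial illustration, not Anthony's actual implementation) that breaks a markdown string into chunks at `##` and `###` headings:

```typescript
interface Chunk { heading: string; text: string; }

// Split a markdown document into chunks at ## and ### headings.
// Any text before the first heading becomes its own untitled chunk.
function chunkByHeadings(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk = { heading: "", text: "" };
  for (const line of markdown.split("\n")) {
    if (/^#{2,3}\s/.test(line)) {
      // Close out the previous chunk before starting a new section.
      if (current.text.trim() || current.heading) chunks.push(current);
      current = { heading: line.replace(/^#{2,3}\s*/, ""), text: "" };
    } else {
      current.text += line + "\n";
    }
  }
  if (current.text.trim() || current.heading) chunks.push(current);
  return chunks;
}
```

A real implementation parsing the markdown AST (as Dev notes) handles edge cases like headings inside code fences, which this line-based sketch does not.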

00:20:19 - Anthony Campolo

Yeah, right. And you could also just turn the markdown file into text and chop it by number of lines, chunk by sliding window. There's something called sliding window that I've heard people mention before. So what this has to do with is having overlapping words. If you chunk this in a way where you do it naively and it ends in the middle of a sentence, and then the next one starts in the middle of a sentence, that can break the context. If it's searching and it can only find one or the other, then both of those sentences might make no sense without the other half. So you can do this where you have some overlap. That way, if it's reading a sentence, it won't run out before it can get to the end of the sentence. And when it starts, it won't start in the middle of a sentence.

Then it's a question of how much overlap you want, because every time you do, you're putting in duplicate text. So you may run into issues if you have a lot of text and you make too big of an overlap window. But that's something.
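The sliding window with overlap that Anthony describes can be sketched as a word-based windower. The window and overlap sizes here are illustrative; later in the stream a 500-token window with 100-token overlap is mentioned as a common setting:

```typescript
// Split text into overlapping word windows so a sentence cut at one
// chunk boundary still appears whole in the neighboring chunk.
function chunkSlidingWindow(text: string, windowSize: number, overlap: number): string[] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than window");
  const words = text.split(/\s+/).filter(Boolean);
  const step = windowSize - overlap; // advance by window minus overlap
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + windowSize).join(" "));
    if (start + windowSize >= words.length) break; // final window reached the end
  }
  return chunks;
}
```

The duplicate-text tradeoff is visible directly: each chunk repeats its last `overlap` words at the start of the next chunk, so total stored text grows as the overlap grows.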

So I think that might have been it. And then, you know, it updates the commands and stuff. So yeah. So that's probably the thing about chunking. It doesn't really sound that complicated, honestly.

And then let's see what else I had here. There was a section for hybrid search module.

00:22:03 - Dev Agrawal

And so you have two different searches and you combine the results.

00:22:09 - Dev Agrawal

Yeah.

00:22:10 - Anthony Campolo

Yeah, I think that's what's going on here. Sorry, I'm just gonna scroll through here real quick.
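One common way to merge two ranked result lists, such as a keyword search and a vector search, is reciprocal rank fusion. This is a sketch of that general technique, not necessarily what the generated hybrid search module in Anthony's tutorial does:

```typescript
// Merge ranked result lists (e.g. keyword search and vector search) with
// reciprocal rank fusion: score(id) = sum over lists of 1 / (k + rank).
// Items appearing high in multiple lists accumulate the largest scores.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

The constant `k` (60 is a conventional default) damps the influence of top ranks so that a document ranked moderately in both lists can beat one ranked first in only one list.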

00:22:25 - Anthony Campolo

Yeah. And then that had a part where I actually create like a chat thing. Okay. So if we don't have more questions about that, I can show the AutoShow version of this.

00:22:34 - Dev Agrawal

Yeah, let's do that.

00:22:36 - Anthony Campolo

Okay. So let's go. Host. All right, so this is the new and improved AutoShow. I got these couple show notes here, basically just episode summaries of some FSJam episodes. And then we got your AI-powered search. Let me get this up so we can see what's happening. Great. So we're going to create the embeddings, and this is using the Cloudflare Vectorize thing.

So you see here, generating embeddings for show notes. It looks like Vectorize has chunking sort of built in. You can see here it only created one chunk for the transcript because these are just test episodes that are only a minute or two long. Then it generates multiple vectors based on those. It'll tell you the text length and everything. And then once it's done, it'll say it generated eight vectors for show notes.

[00:23:50] The reason why I think this is for [unclear] is because right now that transcript creates a show note, and then the show note creates a show note. That's something I need to fix. Then you have a Workers endpoint that you're referencing, so it all goes through a Cloudflare Worker. After that, it generates that.

Then you can ask it a question, and it will fetch content from the show notes, some context, 5,000 characters, and then it gives you this. It gives you the answer, and then the sources of where it came from. There's only two show notes right now, but it's pointing to each of them and the transcripts so you can reference the answer back here. So yeah, that's pretty much the whole thing.

I need to make some updates, like have it only generate new embeddings if there are new show notes since the last time it generated embeddings.

[00:24:50] But yeah, that's kind of the simplest chat interface thing: just a button to generate the embeddings, then an input to include your question. If you think about the CLI, there was the create embeddings command, which is like clicking this, and then the prompt text that was in the command is like the input box text.

00:25:19 - Dev Agrawal

Nice.

00:25:22 - Anthony Campolo

And then here, I can show you, there's this worker which handles inserts. So we got these vectors, insert the vector, and then handle the query. So this will...

00:25:49 - Dev Agrawal

Yes, this is live.

00:25:52 - Anthony Campolo

So it looks like this is giving the top ten examples here. So it looks like someone in the chat...

00:26:01 - Dev Agrawal

Yeah. Someone's asking, "Is this live?" What's up?

00:26:05 - Anthony Campolo

It is.

00:26:06 - Dev Agrawal

Yeah.

00:26:07 - Anthony Campolo

Yeah. The newest is at the dev AutoShow app. What I'm showing right now is not going to work because I need to fix something with Cloudflare not knowing whether it's in local or production. You can use the rest of the app, though, but you should go to the dev app, which has the newest version. If you just go to the AutoShow app, it still has the old app, which really should not be used right now. I'm going to be merging it in hopefully later tonight.

But yeah, sorry. Who's in the chat? Matt [unclear]? He has a specific question. Hit us with it, man.

00:26:48 - Dev Agrawal

Scott's been putting tips in the private chat, like ChatGPT.

00:26:54 - Anthony Campolo

Okay, so when you attach metadata to each chunk, it's usually stored alongside the embeddings. You give it information so it can reference where the source document is from. Like we were talking about, when you retrieve a matching chunk, you can reconstruct where it came from. Group multiple results by document ID to assemble full context. Bonus: fixed-length sliding window, 500 tokens with 100-token overlap.

We talked about semantic chunking, breaking at logical semantic boundaries like paragraphs or headers. We kind of looked at that in the chunking implementation that was written for it, and then chunk overlap to preserve context across boundaries.
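The fixed-length sliding window mentioned above (500 tokens with a 100-token overlap) can be sketched as below. For simplicity this version counts words rather than model tokens, and the metadata fields are illustrative; a real implementation would use the embedding model's tokenizer:

```typescript
// Sliding-window chunker: fixed-size windows with overlap so context
// isn't lost at chunk boundaries. Counted in words here for simplicity;
// a real implementation would count model tokens via a tokenizer.

interface Chunk {
  text: string;
  documentId: string; // metadata stored alongside the embedding,
  index: number;      // so a retrieved chunk traces back to its source
}

function chunkDocument(
  text: string,
  documentId: string,
  windowSize = 500,
  overlap = 100,
): Chunk[] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than window");
  const words = text.split(/\s+/).filter(Boolean);
  const step = windowSize - overlap; // how far the window advances each time
  const chunks: Chunk[] = [];
  for (let start = 0, i = 0; start < words.length; start += step, i++) {
    chunks.push({
      text: words.slice(start, start + windowSize).join(" "),
      documentId,
      index: i,
    });
    if (start + windowSize >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

Because consecutive chunks share `overlap` words, a sentence that straddles a boundary still appears whole in at least one chunk, and the `documentId` and `index` metadata are what let you group matches by document and reassemble fuller context at query time.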

Yeah, cool. That's good stuff. Okay, so here we got: "I'm making a program basically, which has a pipeline that takes in some audio using DLP downloads, content demux for isolating the voices and separating speakers." Yeah, that's the challenging part. "Then I'm going to be using Whisper and [unclear] to get the latent space of..." But that last part I don't understand. I read this paper about [unclear].

My first thought is that if you really need to separate speakers, you shouldn't use Whisper. You should use a transcription service that provides that to you, like Deepgram or AssemblyAI. There is a Whisper diarization Python package that I would recommend trying out if you're trying to implement Whisper speaker diarization yourself, because the people who've tried to do it seem to think it's pretty hard. No, I mean, prosody is a term that has to do with how someone's voice sounds, I think.

So let me pull a couple of things here. Let me go back to sharing my screen. So this here, Whisper diarization, my first piece of advice would be try this. If this works for you, it's going to totally solve your problem. I ended up pulling this out of my project because it was really, really slow. It would take over an hour or two to do a three-minute thing because it had to do this whole crazy Python dependency build startup kind of thing every time you ran it. I just had no idea what to do about that.

But it did work, though. It eventually would give me the output with the speaker labels. So this might be what you want to look at. And then, like I mentioned, AssemblyAI and Deepgram both give you that; almost every paid transcription service will come with speaker diarization. They also have something called multi-channel, which is what you're talking about when you have multiple audio channels.

Let me look up, I think, the quality or sound of the voice. Prosody: the study of all elements of language that contribute toward acoustic and rhythmic effects, chiefly in poetry but also in prose. Other rules determine the length or shortness of a syllable, syllabic quantity, various combinations of short and long syllables, and the study of meter and its use. Okay. Interesting. Yeah. I'm not sure exactly how it would relate to a transcription service. I guess it would be that certain people talk with certain rhythms, or some people are talking in a more sing-songy kind of way. Let's see what we got here.

00:31:06 - Dev Agrawal

Yeah. Maybe you can use the style of speaking as a way to separate them by voice.

00:31:13 - Anthony Campolo

I want to vectorize prosody. I found the podcast a while ago. Interesting. Which podcast are you talking about? It's for voice cloning? Okay, so do you not have an audio recording of just the single voice, and you're trying to extract that out?

00:31:45 - Short split/interjection

Go back now. Wait. Hold on.

00:31:48 - Anthony Campolo

Voice cloning, using the prosody to call vector DB calls. Interesting. Okay. I mean, the easiest thing to do is just try to get a clean recording of each individual person and then use that for the voice cloning. That's basically it.

00:32:07 - Short split/interjection

[unclear]

00:32:08 - Anthony Campolo

Okay. Yeah. If you could get clean recordings of both, then you could create your voice clone of each and then you could just have each one kind of labeled. This is totally outside the bounds of what I have done or worked on, so I don't have a ton of great advice for you beyond trying to find a way to do this without having to build it all yourself, because it sounds like a kind of insane project for an AI call center salesperson. Okay, then they should be able to just sit people down, have them record like a 32-second snippet, and then you could identify the voice that way. I just don't really have enough information about your current setup and issue to give super targeted, specific advice here. What about you, Dev?

00:33:07 - Dev Agrawal

I would love to clone my voice and have it attend meetings for me so I don't have to.

00:33:14 - Anthony Campolo

Easy to do, actually. There's services like ElevenLabs that have voice cloning. There's open source voice cloning. I've gone very deep into text to speech. Yeah, you're doing fine, man. I don't know, if you want to DM me with some more details, feel free.

00:33:32 - Dev Agrawal

Thanks. Scott, what have you been doing recently with RAG? You've also been diving into AI.

00:33:47 - Scott

Yeah, I'm building an app right now. Actually, I'm on a different computer right now, and I was trying to pull it up. I pulled the repo in, and I'm putting my environment variables in right now so I could show you real quick. But the live site's broken because I'm in the middle of working on some stuff on it. The dev environment is working, so I was trying to get that pulled up. Give me a second. Let's see. Okay. So, yeah, basically I've been building an app. Hold on, I gotta get this stupid reCAPTCHA.

00:34:29 - Anthony Campolo

I've been saying that it's AutoShow for churches.

00:34:33 - Scott

Yeah, basically what you can do is take a sermon or whatever someone's preaching. A lot of churches record their stuff, right? You can upload it and it breaks it down and gives you an outline, kind of like AutoShow. And then it gives you something a lot of people who go to church want: devotionals for the whole week. So it would increase engagement throughout the whole week for the churches and for those wanting to dig more into the sermon they heard on Sunday. They can learn more about it throughout the week and continue to grow. That's what it does. Right now, that actually does work. My stack is Supabase for the back end, with the front end hosted on Netlify.

[00:35:32] And then Matt chimes in.

00:35:34 - Anthony Campolo

Says AI theology, respect.

00:35:38 - Dev Agrawal

Yeah. I'm gonna replace pastors with AI. That's fine.

00:35:42 - Scott

No, not at all.

00:35:45 - Anthony Campolo

AI can only replace God. He can't replace the pastors.

00:35:48 - Scott

That's terrible. It can only enhance things, right? It's just a learning tool to help people learn as well. That's what it does currently. I'm also making it so it'll create study guides and stuff, because a lot of people in church, if they're involved, want to have small groups, where you have a group of 5 or 6 families or something like that come together and do a study on a certain topic for six weeks. That's something you could also do. It could go through all the old sermons, which is kind of what we're wanting to do with the vector thing: it has all these old sermons, we throw them in and vectorize them, and then enable someone to type in a request and get a custom-built study guide.

[00:36:44] Right? Like, "Hey, I'm having an issue right now," or, "I want to learn more about this certain topic." And so they type that in, and then it goes through all these old sermons and determines what pieces from these sermons are going to be good for that particular topic they're wanting to learn more about. So that's one of the ways I'd like to use vectorizing and things like that. But yeah, it's pretty cool. That's awesome.
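Mechanically, the "find the relevant pieces across all the old sermons" step Scott describes is a vector search followed by grouping the matching chunks by their source document, so the study guide can cite whole sermons rather than disconnected fragments. A hedged sketch; the field names are hypothetical, not Scott's actual schema:

```typescript
// Group retrieved chunks by their source sermon so a study guide can
// cite whole sermons. Field names here are illustrative only.

interface Match {
  sermonId: string;
  title: string;
  text: string;
  score: number; // similarity to the user's topic query
}

function groupBySermon(matches: Match[]): Map<string, Match[]> {
  const groups = new Map<string, Match[]>();
  for (const m of matches) {
    const list = groups.get(m.sermonId) ?? [];
    list.push(m);
    groups.set(m.sermonId, list);
  }
  // Keep each sermon's excerpts in relevance order.
  for (const list of groups.values()) list.sort((a, b) => b.score - a.score);
  return groups;
}

// Build the context block handed to the LLM, labeled per sermon so the
// generated study guide can reference its sources.
function buildContext(groups: Map<string, Match[]>): string {
  return [...groups.entries()]
    .map(([id, list]) =>
      `[${list[0].title} (${id})]\n` + list.map(m => m.text).join("\n"))
    .join("\n\n");
}
```

The labels in the assembled context are what let the model's output say which sermon each point came from, which matters for the skepticism Dev raises later about tracing answers back to sources.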

00:37:19 - Dev Agrawal

Yeah, I was considering building something like that for folks back in India because they're really into astrology, things like that.

00:37:31 - Scott

Sure.

00:37:32 - Dev Agrawal

Build an AI assistant for that kind of thing. It's just like, yeah, personally, I'm far from it, but it's definitely a way to bring a lot more people into the AI world.

00:37:45 - Scott

Yeah. And honestly, I've thought there'd be a lot of pushback just because it's AI and you're dealing with...

00:37:57 - Dev Agrawal

Sensitive topics.

00:37:58 - Scott

Yeah. Sensitive things, right? And so a lot of people are kind of wary of it. "What about hallucinations?" I've heard that one. And it's like, well, hallucination has actually gotten a lot better.

00:38:13 - Anthony Campolo

Less of an issue when you start from a source thing and tell it to stick to that.

00:38:16 - Scott

Yeah, exactly. It's going back to the sermon.

00:38:19 - Anthony Campolo

I had the same thing.

00:38:19 - Scott

People show up. And if you have really good prompting on the back end, then you can usually avoid hallucinating for the most part. I think that's important. It is very important. The other thing is my system has it set up so the person who does the upload, whether it's a team member or the pastor themselves or whoever, has the ability to see the outcome. Then they can click on edit and change a sentence or word, like, "Oh, I wouldn't say that, so I'm going to change it to say this," or, "I wouldn't put that out to the members." They can do all that before clicking publish and then making these actual documents to be able to send out and stuff like that. So I have that set up as well.

00:39:12 - Dev Agrawal

Yeah, it's like the same argument people make that AI writes bad code, terrible code, and we cannot push it to production. And it's like, we have code reviews. If you use AI to generate code, it's still your responsibility to make sure it works before you submit a PR and another human reviews it and merges it.

00:39:33 - Scott

Right, make sure security is up to par. I've learned a ton about security and rules and all that for the Postgres stuff inside of Supabase, policies, and everything. The other thing is AI, ChatGPT, whatever model, it doesn't matter, they actually really aren't that great with rules and understanding that whole policy piece. But the cool thing is I was inside of Discord talking with someone from Supabase, and they had done a couple of things with roles, and some functions involving roles, so that you can kind of work around the role checks while still being secure. Using these different methods and wrapping them in things like select, that's a Supabase thing specifically.

[00:40:35] Right. But really enhancing the performance of things too, as well as the security, and coming out the other side without breaking anything, because I kept running into issues with infinite recursion and stuff like that, so I had to learn how to avoid all that. And it's really cool, though. The AI didn't know the method, but I was able to take this guy's repo and say, all right, here's what he did, look at this repo, and shove it in there. And then it's like, oh, that's a method, I'm going to use this method. I have to remind it every now and again to go back to it. But that would be a really cool idea that I'm thinking about now: create a vector for your own workflow, throw in someone's repo, and so every time it goes to create something, it can check over this vector DB to make sure it's using the different methods that you need and have to use in your process.

[00:41:34] That would be kind of cool, actually.

00:41:38 - Dev Agrawal

Cool.

00:41:40 - Dev Agrawal

Do you guys have, when the AI provides a response, do you have it provide references of what document or what piece of text it used in line, or maybe later as a citation like these are the five things I looked at, these are the five parts, or the specific parts of the text that I was referencing? Because that seems like it would be an important thing to include there for someone who might be skeptical of the response. Right, okay, this is exactly where it came from. You can go and read it from the source.

00:42:17 - Scott

Yeah.

00:42:18 - Dev Agrawal

How do you achieve that?

00:42:19 - Scott

I guess specifically with what I wanted to do with the sermon stuff, that's going to be huge for scripture referencing stuff. Exactly right.

00:42:31 - Dev Agrawal

They already do that, where, in between talking, someone will just reference a random Bible phrase and then continue talking.

00:42:38 - Scott

Right. So they already do that. They kind of reference it themselves, and I'm able to pull that out. Actually, yes, in my outline it shows referenced scriptures. And then I want to make it so you can actually choose the version of the Bible that you want to see, because there are a lot of different versions out there. The Catholics have their own version, and a lot of non-denominational churches use different versions, so I want people to be able to select the version. And there are actually some really great Bible APIs out there with all the different versions in them, so you can just pull from that. I want to be able to make that a piece of it too.

00:43:16 - Anthony Campolo

What model are you using? Or I had a question in the chat about that.

00:43:21 - Scott

Me particularly? I'm sorry.

00:43:22 - Anthony Campolo

Yeah.

00:43:23 - Scott

Yeah. As far as the model for the LLM I'm using, and actually this needs to be improved, I was talking with Anthony about this a couple of days ago.

00:43:33 - Anthony Campolo

Yeah. You're using GPT-4.

00:43:34 - Scott

Yeah. GPT-4. And it's so freaking slow, it's terrible. It takes a whole minute to create an outline and some devotionals. I'm talking to Anthony like, man, I really need to improve this speed-wise, because I did make the UI very friendly, the UX for the user. I wanted them to not just have to sit there for three minutes twiddling their thumbs, wondering if it's broken or if things are happening. So I don't just have a progress bar. I have a progress bar for the loading, and I've got a whole other screen that pops up showing the different stages it's going through, like, okay, it's transcribing, okay, the outline is being generated, okay, the devotionals are being generated, and it walks them through that process so they can see step by step that something is happening.

[00:44:19] So even if it is a minute or two minutes, that's a long time for people nowadays, right? They want instant gratification. So I think it's important to have those UI things in place as well, not just everything set up that way too. I do have that in the process. It needs to be fixed, though. It's not completely working as it should. As always, there's some sort of bug here or there, right? We gotta work through things. But I'm learning to enjoy that process more and more, of debugging things. Usually I can debug something pretty quick, obviously with the help of ChatGPT and stuff like that.

00:45:05 - Dev Agrawal

Like Cursor and stuff yet?

00:45:07 - Scott

Oh, yeah. So actually, when I started this journey back in May, deciding I'm going to build this app, I'm going to do it, someone mentioned Bolt.new to me. This was before the huge marketing thing they did for the last 30 days or whatever, the hackathon, which was like $1 million or something. I was even going to sign up for it, but I was like, ah, I'm just going to focus on what I need to do right now. But anyway, I signed up for Bolt and used their paid plan; over the last 30 days, I used like 27 million credits. So yeah, I was using the hell out of it. And one thing I found was that Bolt obviously did have limitations.

[00:45:55] Right. And so I would actually work between two different things, mainly. I have my own ChatGPT account, obviously, and I would use GPT-4 or something like that to help me if I ever had an issue that Bolt was not solving for me. I took the problem, threw it in, and explained it, and I had an ongoing project inside of there so it could have reference points to understand what the hell I was talking about, with the context and everything. And sometimes it would give me a much better answer than Bolt. So I would take that answer and, inside of ChatGPT, say, okay, you gave me this perfect plan, now create a prompt for me to have Bolt.new implement it. Then it would create the prompt for Bolt, I would throw it in Bolt, and it would solve my problem.

[00:46:56] Right. So I did that a lot. And then it was a lot of working back and forth between different LLMs and different sources, different things, which is fun. But another thing was having a dev environment inside of Bolt; that is almost impossible. It's really hard to make that a reality there. They are improving some things, like you can open up a new branch from within there, so you can create a dev branch or whatever you want, which is cool. And then I started to get comfortable enough because I loved the visual piece of Bolt.new, the ability to click inside of the web browser there on a specific element. It just was so simple and intuitive to be able to click on something and say, actually, hey, change this.

[00:47:48] And then I would tell it, add a button for this or change this, without breaking the current whatever. And then it would be like, okay, well, here's the plan. Do you want to implement it? Yes. All right. Go. And I really like doing that, to be able to build visually. But I kept running into issues with Supabase, and one time, two times I broke the crap out of stuff and I thought, oh my God, it's over. Like, I just did a month and a half of this for nothing. Like it's done. And I'm like, I really need a dev environment, right? Because I was messing with production most of the time. So I went ahead and hacked my way into doing some dev environment stuff, and then I said, you know what?

[00:48:33] I'm going to go and mess with Cursor some. It's changed since I last looked at it a long time ago. When it first came out, it was more geared towards developers versus someone who's a non-coder. And I'm like, you know what? I'm going to try and dive into this a little more now. And so I already actually had paid for Cursor for the year. I just wasn't touching it. And so I'm like, yeah, I know it's hilarious. But I was like, you know what? I'm gonna do it. So I downloaded Cursor and started messing around in there. And now for the last three weeks, actually, I've pretty much been using Cursor 90% and like 10% Bolt. And now actually Bolt is having such an issue with connecting to Supabase and some of my stuff, like it won't even connect anymore. And so it doesn't have that knowledge with it. And so now I know how to use the CLI in Supabase and Cursor.

[00:49:25] And so I'm just using that like a normal developer would. And it's crazy. I went from non-developer, kind of knowing some things, you guys know I know a little bit, and using this new method to gradually growing into using full-blown Cursor. I don't do a lot yet.

00:49:49 - Anthony Campolo

Pause real quick because there's been a whole bunch of comments. So Matt was saying, "I started using Groq. Llama 3. It has 300 calls per second. Inference is immediate." Yeah. Groq, with a Q, not a K, has the fastest inference by far. He was using DeepSeek V3 because it's cheap, but Llama is surprisingly good for retrieval and answering. DeepSeek is so slow. I know this is one of the reasons why they can make it so cheap, because they take forever to give you a response, right? Scott, lock in.

00:50:18 - Dev Agrawal

V0 Gemini canvas.

00:50:20 - Anthony Campolo

Is all you need.

00:50:21 - Dev Agrawal

Y'all want me to use Vercel? What is going on?

00:50:24 - Anthony Campolo

And Claude Code to connect up all of it. Lock in. Also: don't use Supabase, just use Convex locally and self-host. Backend as a service sucks, but if you do it yourself, you're cooking, Convex brother.

00:50:39 - Dev Agrawal

You're probably right. Convex.

00:50:40 - Anthony Campolo

Hold on. Let me get through all that. We'll talk. Or V0 or Gemini Canvas. Make React components. Specifically outline it to have connections for API calls you need.

00:50:49 - Scott

Yeah. Absolutely. So that's one thing too. As I improve the app and move forward with things, I am refactoring things. And actually Cursor is really good at refactoring. If you make it come up with a plan and follow that plan and go through without breaking stuff, it's really impressive how much that thing can refactor in your codebase. And sometimes it'll miss a little bit of something, but you can make it go back in and fix it. So it's pretty cool.

As far as V0 goes, I see some of those people in there. Maybe you're just huge V0 Vercel people. I don't know, I'm not. Obviously, I'm not using Vercel right now. And the only reason why I was using Netlify, to be honest with you, is because Bolt.new has a direct integration with it, just like they did with Supabase. And it was very easy to use. I don't have to pay too much for it.

So the only thing I'm paying for is Supabase. Actually, I'm paying for the Pro plan because of certain features that I needed. But going back to the comment, I don't know which one it was, the one that talked about using it and hosting it yourself or whatever. What was that about?

00:51:56 - Anthony Campolo

Yeah, he was saying use Convex, and also Dev had some thoughts on Convex.

00:52:01 - Scott

Yeah, I saw Dev. Dev was like, yeah. So speak on that, maybe, Dev, and we'll go from there.

00:52:06 - Dev Agrawal

Yeah, I mean, I know Convex is really, really nice. I've been using it for a couple of different projects. I definitely don't miss the other sorts of databases I've worked with. It's really nice to just write a query and never have to worry about refreshing or revalidating, because everything just updates in real time. I mean, it's basically a sync engine, and it comes with all the benefits that sync engines come with, or most of them at least.
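A deliberately tiny model of the sync-engine behavior Dev describes: queries are registered once and rerun automatically whenever a mutation changes the data. This is not the Convex API, just an illustration of the reactivity pattern:

```typescript
// A toy reactive store: subscribers register a query once, and every
// mutation reruns the registered queries and pushes fresh results.
// Not the Convex API, just the mental model behind a sync engine.

type Listener<T> = (result: T) => void;

class ReactiveStore<Row> {
  private rows: Row[] = [];
  private subscriptions: Array<{
    query: (rows: Row[]) => unknown;
    notify: Listener<unknown>;
  }> = [];

  // Like a reactive query: subscribe once, get pushed updates forever.
  subscribe<T>(query: (rows: Row[]) => T, notify: Listener<T>): void {
    this.subscriptions.push({ query, notify: notify as Listener<unknown> });
    notify(query(this.rows)); // deliver the initial result immediately
  }

  // Like a mutation: change the data, then rerun every registered query.
  mutate(fn: (rows: Row[]) => Row[]): void {
    this.rows = fn(this.rows);
    for (const sub of this.subscriptions) sub.notify(sub.query(this.rows));
  }
}
```

A client that subscribes to `rows => rows.length` sees the new count the moment a mutation runs, with no polling or cache invalidation. Convex achieves this with its own runtime and dependency tracking so only affected queries rerun, where this toy version naively reruns everything.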

00:52:33 - Anthony Campolo

Which is funny, because Convex was originally gonna be the topic for this stream, so we'll definitely have to talk about it when we do. So is it kind of like a database, but it includes, I'm not saying this is literally what it is, but in my head it makes sense as a database that also has kind of an SDK attached to it that gives you nice syntax and also handles things like real-time updates.

00:52:56 - Dev Agrawal

Yeah, it's not necessarily an SDK. It's more like their own runtime. You can almost think of it as writing stored procedures in SQL, where those are queries that run in your database: they don't run on a server that connects to the database over the network, they literally run in the database server itself. That's what Convex queries and mutations do. They literally run in their database. But instead of writing SQL, you just write JavaScript or TypeScript, regular code to access the data. And if any part of the data changes, it knows which queries to rerun and then sends the updated data immediately to the clients.

00:53:37 - Scott

Interesting. So when it comes to functions, because I use a lot of functions inside Supabase for doing certain things, there are sometimes memory issues where you run over the limit. So how does that work with compute in Convex and all that?

00:53:58 - Dev Agrawal

I mean, they have their own pricing for how much compute or data you're looking at. But basically, they wrap it: the APIs they provide to access data manage the connection for you. With something like Supabase functions, where you're connecting to a separate Postgres database, you need a stateful connection: every time you have a function, you first open a connection, then you run the query, and then you have to close the connection. If Supabase doesn't manage that for you, you have to work on it yourself; you have to make sure there's PgBouncer or some connection pooling somewhere to remove those issues. Convex just kind of works out of the box.

[00:54:52] I've never really worried about memory issues with it.

00:54:56 - Scott

Okay, interesting. Because I have run into memory issues based on the functions and things like that. But another question then is, when you've got all this in there, it almost sounds like compute at the edge or something like that. Yeah. Yeah, yeah. So is it like CDN, where you've got this in multiple regions, or is it in one region only? How does it work with that?

00:55:22 - Dev Agrawal

Yeah. I mean, it's one region only. They probably have some sort of replication built in so you can scale across regions if you want. But basically they have an open source version as well that you can self-host. So instead of running a Postgres container, you're running a Convex container, and you can also run your code inside their database server. And there are probably ways to replicate for better scale, so if you have lots of queries, you can have multiple servers that handle that. They don't do anything on the edge yet. But because they're open source, you can probably just deploy them to Fly.io and have it work on the edge. But then you'll have to figure out, if you have data in multiple places, how it gets stored and synced. So that's where their platform comes in. You can scale as much as you want, but you can also just use the open source self-hosted version, which they didn't have for a while.

[00:56:24] So I'm really happy that they decided to make that a thing.

00:56:28 - Scott

Interesting. Yeah, that sounds like something to look into. I just really like how. Yeah.

00:56:39 - Dev Agrawal

The one thing that is now pushing me away from Convex a little bit is that I almost want the AI to write the queries at some point and not me. And in that case, it's probably better to just use SQL or something that the AI already understands very well. But that's something we can fix with context engineering, right? Just make sure it knows how to write queries and then...

00:57:07 - Scott

Right, hook up the docs to it, and all that good stuff. Yeah, that would be the difficult thing, though, because that's one of the big problems I had with Supabase, creating these policies with rules, and it just didn't know that stuff well. It knew Postgres real well; it just didn't know the rules and policy stuff.

00:57:26 - Dev Agrawal

So I've been dealing with this a bunch because every time I try to get AI to write Solid, it just ends up writing React.

00:57:35 - Scott

It's like, no, I'm going to switch it.

00:57:36 - Anthony Campolo

That hasn't been an issue for me at all.

00:57:40 - Scott

Uh.

00:57:41 - Dev Agrawal

I probably need to get better at giving it context on documentation and specifically teaching it how Solid is different from React. I think that's going to be a big part of providing it context.

00:57:56 - Anthony Campolo

I mean.

00:57:57 - Scott

Just put Ryan's name in every prompt. It'll be like, oh yes, I know that.

00:58:06 - Dev Agrawal

But yeah, the thing you mentioned, I was also doing that with Cursor, where I first used ChatGPT to talk about what I want to build, come up with a plan, and then ask it, hey, I'm going to build this with Cursor, give me a prompt that I can just go and plug into Cursor. That's been really helpful. I use Cursor a decent bit. Convex has their own new thing called Chef, which is actually really good, because Convex offers all the backend pieces on its own, and you only have to write code to create a new function, a new cron job, or a new database schema. So Chef can write all of that for you and create a fully functioning backend.

00:58:46 - Scott

That's why I loved Supabase so much because Bolt is connected right to it. But now that it's having such issues, and one thing that Bolt still has problems with, when I'm coding, sometimes it'll be like, okay, I'm going to do that, and then it acts like it is still in conversation mode instead of [unclear] mode. Yeah. And it starts writing code out. I'm like, no, you're wasting my time here. What the heck are you doing? Yeah, just push it. But that's cool with the Convex thing. So wait, they have their own version called Chef?

00:59:20 - Anthony Campolo

Yes, sir. It's powered by Bolt, so they're adding their own Convex stuff on top. It says it has one-click deploy and live previews and even built-in email with Resend. This sounds awesome.

00:59:32 - Dev Agrawal

And auth as well.

00:59:34 - Scott

Okay, interesting. So I wonder if it will understand more how to create that back end because they probably feed their baby into it or something.

00:59:46 - Dev Agrawal

Definitely. Yeah.

00:59:47 - Scott

Yeah.

00:59:47 - Dev Agrawal

And they also don't have RLS stuff. They just have queries that run on the server, which means you are basically in charge of writing the auth rules in your logic, in your query logic.

00:59:59 - Scott

Oh, and we know Dev loves to roll his own auth, so. Excuse me. This is your dream come true. So I mean.

01:00:12 - Dev Agrawal

Yeah. So the other thing I've been wondering about is that Bolt.new, as you mentioned, is more for non-programmers and non-developers to build things, and then Cursor or Claude Code are really developer-heavy tools. I feel like there's probably something in the middle that is valuable for most people, but we haven't yet figured out what that middle ground looks like: something that can understand your specific stack and build out a big chunk, but also give you fine-grained controls the way Cursor does. I don't know, maybe it's just a black and white thing and we can't really have something in the middle. But that's something I've been thinking about a little bit.

01:01:10 - Scott

Yeah. No, that's cool. And maybe there's definitely a $1 billion idea there, but who knows at this point. Obviously, I've gone on both sides, right? And I am definitely enjoying Cursor now that I have learned more about utilizing the Supabase CLI and stuff like that, and yeah, maybe there's a middle ground there that's good for everybody.

01:01:43 - Dev Agrawal

But yeah, since we started talking about agents, I want to try to connect it back to the initial RAG stuff. Yeah. So basically, I've been hearing a little bit about agentic RAG. It's a fancy word, but it just sounds like we're doing multiple vector searches or multiple queries in a loop, right? Is that something you guys have done, like a problem where we don't just want to do one search and provide the context, but maybe the AI decides that, oh, I need more information, go back to the database and do it multiple times before coming up with a result. Has that been something you've done?

01:02:25 - Anthony Campolo

No, just because I'm not doing any agent stuff within AutoShow, and the agent stuff I've done hasn't been working off of a big enough data source to need that. So sounds interesting though.

01:02:38 - Scott

That is interesting. I'll tell you this much: whenever I do some sort of audit or refactoring within Cursor, I see that it does a search through a file more than once sometimes, and that's kind of interesting. It's kind of doing what you're saying. So yeah, maybe Cursor already does something like that.

01:03:01 - Dev Agrawal

Yeah. I mean, I think Cursor probably does it because it tries to minimize the context. So if a file is like 500 lines long, it'll look at the top 100, then it'll realize that, oh, what I'm looking for is actually down here, go there, and then get small chunks multiple times. Basically the chunking idea. So it just does it on its own.
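A rough sketch of the windowed reading Dev describes. Cursor's actual mechanism is internal; the function name and 100-line window here are made up for illustration, operating on a file's lines.

```python
def find_in_windows(lines, needle, window=100):
    """Scan a file's lines one window at a time, the way Dev describes
    Cursor narrowing in on the relevant chunk instead of loading it all
    into context at once."""
    for start in range(0, len(lines), window):
        chunk = "".join(lines[start:start + window])
        if needle in chunk:
            return start, chunk  # window offset and its text
    return None  # needle not found in any window
```

Each window costs far fewer context tokens than the whole file, which is the point: the agent pays for 100 lines per look instead of 500.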

01:03:23 - Scott

Interesting. Yep.

01:03:26 - Dev Agrawal

Yeah. But I'm wondering, like, okay, maybe let's say you find a sermon, or maybe one or two that relate to the question, and then one of them references some Bible phrases that are not included in the sermon itself. So maybe the AI decides that, oh, I should also get these Bible phrases and where they came from.

01:03:51 - Scott

Totally want to do something like that, especially for the study guides and things like that, where it can pull in other scriptures and reference other things that are going to really help to expand or make it better for the end user. Right? Yeah. That's good.

01:04:08 - Dev Agrawal

Yeah. So something I've been doing locally: I've been using opencode, which is the Claude Code competitor that Dax and crew are building.

01:04:19 - Scott

Uh, yeah, I heard about it.

01:04:20 - Dev Agrawal

Yeah. What I did is I gave it a SQLite file, a SQL database, and I gave it complete access: do whatever you want with it, change the schema however you want, write whatever queries you want. But this is basically your memory. So now, when I ask it to do things, it's responsible for keeping track of everything in that database. And because it's SQLite, it can just run a SQL query from the shell, on the terminal. So it writes a bunch of different queries to interact with the database. Sometimes I tell it, hey, this is something I was working on a few days ago, I want to continue that. And it can just write a query, get the results from SQL, realize, okay, this is the thing we want to work on, and maybe fetch more context from the database.
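A minimal sketch of this setup. The `notes` schema and helper names here are hypothetical; in Dev's case the agent itself decides the schema and writes the SQL, but the shape of the queries looks something like this:

```python
import sqlite3

def open_memory(path=":memory:"):
    """Open the agent's memory database and ensure a notes table exists.
    (In Dev's setup the agent owns the schema; this one is illustrative.)"""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS notes (
        id INTEGER PRIMARY KEY,
        topic TEXT,
        body TEXT,
        created TEXT DEFAULT CURRENT_TIMESTAMP)""")
    return db

def remember(db, topic, body):
    """Record something the agent should be able to find later."""
    db.execute("INSERT INTO notes (topic, body) VALUES (?, ?)", (topic, body))
    db.commit()

def recall(db, topic):
    """Fetch prior notes on a topic, e.g. 'the thing from a few days ago'."""
    rows = db.execute(
        "SELECT body FROM notes WHERE topic LIKE ? ORDER BY created",
        (f"%{topic}%",))
    return [r[0] for r in rows]
```

Because it's a plain SQLite file, the agent can reach it with nothing more than the `sqlite3` CLI on the terminal, which is exactly why this works without any special integration.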

[01:05:12] So this is kind of like the.

01:05:14 - Scott

Oh.

01:05:15 - Dev Agrawal

Yeah. I should definitely add vectors to this at some point. But right now, first, my data set is not too big; it's probably 20 rows. I just started this project. But also, you can do vector search in memory, right? You don't need a vector database. At scale it's going to be super freaking slow and expensive on memory, but if you don't have a large data set, you can just do it locally, in memory. And if every database now has built-in vector capabilities anyway, then why bother?
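Brute-force in-memory vector search really is just cosine similarity over every stored vector, which is why it works fine for 20 rows and degrades as the corpus grows. A minimal sketch, with made-up function names:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, docs, top_k=3):
    """Brute-force search: score every document against the query.
    O(n) per query, which is the cost a vector database's index avoids."""
    scored = [(cosine(query_vec, vec), doc) for doc, vec in docs.items()]
    return sorted(scored, reverse=True)[:top_k]
```

For a handful of rows this is both simpler and faster to ship than standing up a dedicated vector store; the crossover point is where scanning every vector per query stops being acceptable.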

01:05:53 - Anthony Campolo

Yeah. So Matt's asking: Dev, are you rich, using opencode with all these APIs? So, is opencode expensive?

01:06:01 - Dev Agrawal

Opencode itself is free, and it can talk to basically any model you want. I've been using it with GPT-4.1 mini a decent bit. I was going to mention that earlier when you said the generation has been slow, because something like 4.1 mini is really fast and should be sufficient for a job like that. There can be issues with some models, but if you use Claude with it, it's going to be perfectly fine. It's really good at tool calls.

01:06:36 - Scott

Don't use Whisper. Ever. It's the worst thing in the world. Because that's what I used initially to do transcription at the very beginning for everything.

01:06:45 - Anthony Campolo

Very slow. Yeah.

01:06:46 - Scott

And it was. No, it wasn't even that. It's the fact that there was a 25 megabyte limit on it. So then I had to chunk it, and then I was having...

01:06:52 - Anthony Campolo

Oh, you're talking about the API. Gotcha.

01:06:54 - Scott

Yeah. Sorry. The API, not direct, but yeah. Either way.

01:07:00 - Anthony Campolo

Comment in chat. This is referring to something Dev had been talking about, saying that you may need some sort of index with time or specific topics, such as a tree. Yeah, we did hit on this during the stream.

There's something related to this that I wanted to show real quick. I did see that there is automatic chunking for Cloudflare's Vectorize. This is one of the reasons why Vectorize, I think, is actually pretty sweet, because it has this thing called Auto RAG built in that takes some of the harder parts of implementing RAG and just does it for you.

So you can control the chunking. You can control the overlap. We were talking about the overlap having something show up in two different sections. And then here you can kind of see the limits for Vectorize. You have a certain limit of indexes per account, stuff like that. This is getting started with Auto RAG.
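The overlap idea Anthony points at, where the same passage shows up in two adjacent sections, is a sliding window over the text. Vectorize's Auto RAG exposes this as configuration; a generic word-level version, with made-up defaults, looks like this:

```python
def chunk(words, size=200, overlap=50):
    """Sliding-window chunking: each chunk repeats the last `overlap`
    words of the previous one, so a sentence straddling a boundary
    survives intact in at least one chunk. Assumes size > overlap."""
    step = size - overlap
    return [words[i:i + size]
            for i in range(0, max(len(words) - overlap, 1), step)]
```

The trade-off is duplication: more overlap means fewer lost boundary sentences but more redundant vectors to store and search.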

[01:07:56] So this is something that I'll need to go look into more, because it sounds like this is already being used under the hood, which I wasn't aware of.

01:08:03 - Scott

A lot of those limits, though, are really just vector limits. There are limits on chunking too, right? Like the dimension size, that 1536 deal up there.

01:08:19 - Anthony Campolo

Yeah, yeah.

01:08:21 - Scott

Yeah.

01:08:22 - Anthony Campolo

And then there's one other thing. Matt mentioned Jina embeddings v4. This is kind of interesting. It's billed as universal embeddings for multimodal, multilingual retrieval, so it can understand images and PDFs, not just pure text. It also sounds like it has a much larger context window and is kind of the hottest, newest embedding model. It just came out June 24th, 2025, so this seems pretty interesting. I might have to look into it a bit.

Anyway, Matt, if you have more information on that, feel free to drop it in the chat. Cool. Thanks for hanging, man. Appreciate the talk. Wish you all the best. Cool. Question from the chat: how big can the memory of such an agent be, using chunking?

01:09:24 - Dev Agrawal

I guess as big as the context size.

01:09:29 - Anthony Campolo

Yeah. I mean, I know some people, if they're using an agent, they'll use some sort of summarizing mechanism. So once you get too far through the context window, it will summarize and shorten it up. So that can be.
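The summarizing mechanism Anthony mentions can be sketched as folding the oldest messages into one summary turn once the history gets long. Everything here is a hypothetical shape; `summarize` is a placeholder for an LLM call.

```python
def compact(history, max_items=20, keep_recent=5, summarize=None):
    """When the conversation history exceeds max_items, replace the
    oldest messages with a single summary and keep only the most
    recent turns verbatim, freeing up context-window space."""
    if len(history) <= max_items:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    # In a real agent, `summarize` would be an LLM call over `old`.
    summary = summarize(old) if summarize else f"[summary of {len(old)} messages]"
    return [summary] + recent
```

This is why "memory is as big as the context size" is only half the story: summarization trades fidelity on old turns for room to keep going.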

01:09:44 - Dev Agrawal

It does it automatically these days. Nice.

01:09:50 - Anthony Campolo

Cool. Actually, before I go, I put $50 into Jina. Use it yourself. You'll never run out of credits. Yeah. No, I'm going to check that out for sure.

01:09:58 - Scott

Mhm.

01:10:00 - Dev Agrawal

I guess the next question I had in my mind is: with whatever you guys have built so far, how well does it actually do at answering questions? What are some places where it still doesn't quite understand what it's talking about? Things like agentic RAG and GraphRAG have kind of caught my attention. But I'm wondering how far you can get with the current approach, and where it starts to fall apart and you'd have to look into more complicated techniques.

01:10:38 - Anthony Campolo

I mean, I don't really see where it would fall apart unless you were getting to the point where you were just hitting limits on the database itself because you have so much stuff. But it's already designed to handle larger amounts of text than the LLM itself could handle, so I feel like you'd be unlikely to hit that kind of limit in terms of how well it works.

I haven't really done enough A/B testing with and without it to get a good sense yet, because this was always something I was building in the background just for fun, and now it's being brought into AutoShow. My intuition, my theory, is that for some things the LLM will just have enough world knowledge to handle them. But for something like what you're doing, for a specific company, it's going to be really important. If the model literally cannot get that information anywhere on the internet or in a training data set, then this is the only solution.

[01:11:41] You know, also, I don't understand your question here. If the memory is the same as the context, what is the case? If you want to kind of reword that or expand on that...

01:11:52 - Scott

I think that's something about how long, how much memory. And then I guess as much as the context is, or something like that. I don't know, I can't remember. Yeah.

01:12:02 - Dev Agrawal

I didn't quite understand what memory means in this context.

01:12:06 - Scott

Yeah.

01:12:08 - Anthony Campolo

You can get two terabytes [unclear]. The database has information from the entire database, right? Which means better indexes.

01:12:15 - Scott

Yeah.

01:12:15 - Dev Agrawal

I mean, in that sense you can give it infinite memory because you just let it query whatever database you have. If you let it scrape the internet, then its memory is the internet, right? Firecrawl is very interesting in that space. Firecrawl, browser-based tools, all that. There's a bunch of people trying to make sure that the AI, that the AIs, can look at any web page and interact with any web page.

So if you give it certain tools to extend its memory, then you can keep extending its memory to whatever you want. But how we usually think about memory is that it's a low-latency space where everything is in immediate context and you don't need to wait to get an answer. So if that's how we're thinking of memory, then it's the context; it's not a database. What's in context?

[01:13:17] Memory.

01:13:17 - Anthony Campolo

The storage.

01:13:18 - Dev Agrawal

Yeah, yeah. It's [unclear]. Basically the RAM size.

01:13:26 - Short split/interjection

Yeah.

01:13:27 - Anthony Campolo

If you got any more questions in the chat, feel free to drop them now. We're going to start winding down in a couple of minutes.

01:13:33 - Dev Agrawal

Yeah. Soon there will be scams for downloading more context size.

01:13:38 - Anthony Campolo

Download more RAM. I remember that.

01:13:47 - Dev Agrawal

Thanks. Cool. Yeah, this has been helpful. This is a fun project to be working on. And we're also trying to see if it can call tools that do other things, like run a machine learning job on SageMaker or something. That should be fun, like a no-code machine learning app.

01:14:11 - Anthony Campolo

I've never messed around with the actual ML part, like, you know, training a model or, you know.

01:14:18 - Dev Agrawal

Yeah, same. But it seems like ChatGPT put a lot of effort into sandboxing, and now Vercel offers that built in. Did you guys see the recent Vercel announcements? Twitter has been blowing up, especially after the NuxtLabs acquisition.

01:14:35 - Anthony Campolo

I saw the NuxtLabs one. I hadn't seen anything else.

01:14:39 - Dev Agrawal

Yeah. Basically, at the Vercel Ship event, they launched sandboxes. You can give Vercel any file, any Python or JavaScript file, and it will run it in a secure sandbox on their own compute.

So now you can even have AI generate code and just have it run very easily. And ChatGPT has been doing that for a while, where it will run Python code to calculate how many R's are in 'strawberry' instead of trying to answer it itself.
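The "strawberry" example Dev mentions is exactly the kind of computation a sandbox handles: instead of having the model guess from token-level intuition, it runs a one-liner.

```python
def count_letter(word, letter):
    """Trivial computation a model offloads to sandboxed code
    rather than answering from intuition."""
    return word.count(letter)

print(count_letter("strawberry", "r"))  # 3
```

The model is famously unreliable at this exact question when answering directly, while the code is trivially correct, which is the whole argument for letting models execute code.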

01:15:12 - Anthony Campolo

Right? Yeah.

01:15:14 - Short split/interjection

It's pretty.

01:15:15 - Anthony Campolo

Sweet. I wonder if Peter knows about this, because this is one of the things he was talking about wanting to kind of do, I think. Yeah. I'll have to check this out. Vercel Sandbox: ephemeral compute primitive designed to safely run untrusted or user-generated code on Vercel. It supports dynamic real-time workloads for AI agents, code generation, and developer experimentation.

01:15:39 - Scott

It's like running up a VM so you can blow it up.

01:15:44 - Dev Agrawal

Yeah, exactly. Yeah.

01:15:46 - Anthony Campolo

Very insightful discussion. Cool. Well, thanks, man. Thanks for kicking it with us. It sounds like the viewers enjoyed the show. Awesome. Any links or places you guys want to promote or put out there? I know, Scott, your thing's not really out yet, so nothing to share there. But you want to share your Twitter?

01:16:03 - Scott

There it is at the bottom of the screen as Scott. Yeah. Cool. Right there.

01:16:17 - Dev Agrawal

[unclear], and I saw your finger the other way.

01:16:20 - Scott

Right?

01:16:21 - Anthony Campolo

That was on

01:16:22 - Scott

purpose. And there.

01:16:26 - Anthony Campolo

And same with Dev. He's got his handle on the screen there.

01:16:31 - Scott

Yeah.

01:16:31 - Dev Agrawal

It's either here or there.

01:16:33 - Scott

Or.

01:16:33 - Dev Agrawal

Yeah, my handle. And this is the company that I work for that lets me build cool AI and GraphQL stuff.

01:16:44 - Scott

So.

01:16:46 - Anthony Campolo

Yeah, you've been working on super cool stuff there. It's nice.

01:16:49 - Scott

That's awesome, dude. Awesome. So, hey, if you want to do more stuff, you should. Yeah. Yep.

01:17:01 - Dev Agrawal

And start using AI. AI skeptic people are nuts.

01:17:07 - Anthony Campolo

Yeah. They'll just get more nuts as the days go by. All right. Thanks everyone for watching. We'll wrap it up here, and we will catch you next time.
