
Pipedream with Dylan Pierce
Episode Description
Dylan Pierce from Pipedream demos AI-powered web scraping with Puppeteer, showing how developers can automate workflows and extract data without writing boilerplate code.
Episode Summary
In this episode of JavaScript Jam, hosts Scott Steinlage and Anthony Campolo welcome Dylan Pierce from Pipedream, a developer-focused workflow automation platform. Dylan shares his journey from ten years of software engineering into developer relations, then dives into a live demo of Pipedream's capabilities. He starts by explaining the platform's basics — projects, workflows, triggers, and serverless configuration options like memory limits, concurrency, and provisioned warm containers to eliminate cold starts. The conversation moves into Pipedream's newly released Puppeteer and Playwright support, which lets developers run browser automation inside serverless containers now that AWS increased Lambda size limits. Dylan demonstrates taking screenshots of websites, uploading them to Imgur, and then showcases the platform's AI-powered code generation, which uses GPT-4 with vector-embedded Pipedream documentation to write Puppeteer code from natural language prompts. A live attempt to scrape Twitter fails due to platform-level blocking, but scraping Reddit comments succeeds, illustrating how AI can help developers quickly target specific data without memorizing query selectors. The episode wraps with discussion of upcoming features like branching and looping in workflows, Pipedream's competitive positioning as a developer-first tool compared to no-code alternatives, and broader thoughts on how AI and serverless abstraction are reshaping integration work.
Chapters
00:00:00 - Introductions and Dylan's Path to DevRel
Scott and Anthony introduce the show and welcome Dylan Pierce, who works in developer relations at Pipedream. Dylan describes his background as a software engineer for about ten years before transitioning into DevRel, explaining that his love for meetups, documentation, and developer experience made the role a natural fit.
Anthony and Dylan discuss the differences between coming into DevRel from an engineering background versus a non-traditional path. They touch on how tools like Pipedream reduce the glue code that trips up less experienced developers, and how AI tools like ChatGPT are changing the game — though knowing how to prompt effectively remains essential. Dylan emphasizes that Pipedream's AI features are optional enhancements to an already fast workflow-building process.
00:05:24 - Pipedream Basics: Projects, Workflows, and Triggers
Dylan begins his screen share and walks through Pipedream's core concepts. He explains that projects function like repositories and can be connected to GitHub for version control, while workflows are essentially scripts composed of sequential steps — comparable to Zapier automations but built for developers.
The discussion covers the various trigger types available, including webhooks and cron-based schedules. Anthony and Dylan break down what a webhook actually is in plain terms, demystifying a concept that often confuses newcomers. Dylan also highlights configuration options like container memory, timeout settings, concurrency controls, and the ability to eliminate cold starts with provisioned warm workers — features that are especially important for latency-sensitive integrations like Discord and Slack bots.
00:11:17 - Serverless Configuration, Pricing, and Use Cases
Dylan explains Pipedream's VPC support for connecting to firewalled databases using static IP addresses and walks through the platform's pricing model, noting that the free tier is generous and only limits the number of active workflows and connected accounts. Enterprise features like VPC are reserved for paid plans.
The conversation shifts to practical use cases, with Dylan describing how Pipedream excels at handling the integration side-work that businesses need — like piping Zendesk tickets to Slack or syncing CRM data — without consuming developer time on the core product. He shares a fun personal project called "Dictionary," a Pictionary-style game using DALL-E inside Slack, built entirely on Pipedream, which illustrates how the platform shines when you only need backend logic and can rely on Slack or Discord for the UI layer.
00:17:14 - Puppeteer and Playwright Support on Pipedream
Dylan introduces Pipedream's recently launched support for Puppeteer and Playwright, explaining that these browser automation tools let developers programmatically control a Chromium instance to take screenshots, generate PDFs, fill forms, and scrape JavaScript-heavy single-page applications that regular HTTP requests can't render. He notes that AWS's increase of Lambda size limits from two to eight gigabytes made bundling Chrome feasible.
He demonstrates a simple workflow that navigates to his personal website, captures a screenshot, and uploads it to Imgur — all without writing any code, using only pre-built actions. The hosts discuss how this capability opens up possibilities like generating PR preview screenshots for GitHub comments, uploading to S3 or Cloudinary for image manipulation, and creating PDF reports from HTML pages instead of wrestling with PDF generation libraries.
00:24:29 - AI-Powered Code Generation with Puppeteer
Dylan showcases Pipedream's AI code generation feature, which uses GPT-4 with prompt engineering and vector-embedded Pipedream documentation to generate Puppeteer code from natural language descriptions. He demonstrates asking the AI to grab the H1 element from a webpage, and the system produces working Puppeteer code without the user needing to consult any documentation.
The hosts discuss how the AI knows the context is Puppeteer because the documentation has been crawled and stored in a vector database, ensuring accurate and relevant code generation. Dylan shares how he used this approach to build a Reddit scraper after the API became prohibitively expensive, simply pasting HTML snippets into the prompt and letting the AI figure out the correct query selectors for extracting comments, upvotes, and usernames.
00:32:36 - Live Scraping Attempts: Twitter and Reddit
Dylan attempts a live, unrehearsed scraping experiment. He first tries to extract tweets from Anthony's public Twitter profile by inspecting the DOM for data-test-id attributes, but the attempt fails — Twitter appears to block requests from AWS IP addresses entirely, returning zero content to the headless browser.
Undeterred, the team pivots to Reddit, where the HTML structure on old.reddit.com proves far more scrape-friendly, with descriptive data attributes for comments, upvotes, and reply counts. The AI successfully generates working code that extracts comment text from the JavaScript subreddit. The contrast between Twitter's aggressive blocking and Reddit's accessible structure highlights how platform policies dramatically affect what's possible with browser automation tools.
00:42:28 - Natural Language Prompting and GPT-4 Vision Ideas
The hosts experiment with whether non-developer language can drive the AI code generation, finding that without technical specificity about HTML structure, the AI falls back to incorrect approaches like using Axios instead of Puppeteer. This leads to a discussion about how developer context remains essential for reliable results.
The conversation takes an imaginative turn as Scott describes using GPT-4's image processing capabilities — taking screenshots or PDFs of web pages and feeding them directly to the model for analysis. Dylan realizes this could be automated as a Pipedream workflow: Puppeteer captures a screenshot, uploads it to OpenAI's vision API, and the model interprets the page content. The idea sparks excitement about combining browser automation with multimodal AI in novel ways.
00:52:27 - Upcoming Features, Competition, and Closing Thoughts
Dylan discusses Pipedream's roadmap, highlighting branching and looping as the most requested features. Currently, developers must use conditional logic in Node.js or chain separate workflows to handle iteration over large datasets within Lambda's timeout limits. Native UI support for these patterns is in active development and will significantly simplify complex automation flows.
The conversation closes with Dylan positioning Pipedream against competitors, noting that while many workflow tools target no-code users and some developer tools offer serverless execution, few combine OAuth management, thousands of pre-built integrations, and a code-first experience the way Pipedream does after six years of refinement. He directs listeners to pipedream.com and the community Slack, and the hosts encourage developers to try the platform for their next integration task instead of spinning up yet another standalone Node script.
Transcript
00:00:02 - Scott Steinlage
Hey everybody, what's up? Welcome to JavaScript Jam. Today we have an awesome guest here, but first, I'm Scott Steinlage. If it's your first time listening to this, I am one of the co-hosts of this podcast. And right over here I have Anthony Campolo.
00:00:21 - Anthony Campolo
And I'm a developer advocate at Edgio and also a co-host.
00:00:26 - Scott Steinlage
Awesome. Dylan, how are you doing?
00:00:27 - Dylan Pierce
Great, thanks for asking.
00:00:30 - Scott Steinlage
Yeah, man. So why don't you tell people who you are?
00:00:34 - Dylan Pierce
Yeah. So I'm Dylan Pierce, but I go by Pierce at Pipedream because there's another Dylan, and I work in developer relations over at Pipedream. I am a software engineer turned developer relations person, so I can tell you about the code at Pipedream and how to use it.
00:00:52 - Anthony Campolo
So how long were you engineering before you started doing DevRel?
00:00:56 - Dylan Pierce
I mean, I started tinkering when I was a young kid, like 10 or 11, but professionally, I made my first SaaS product when I was 20.
00:01:10 - Anthony Campolo
So how old were you when you got into DevRel?
00:01:14 - Dylan Pierce
Just this past year. So ten years of software engineering and then one year of DevRel.
00:01:19 - Anthony Campolo
So what inspired the shift? I find this really interesting because I'm someone who does DevRel and I never did engineering beforehand. I went to a bootcamp, but I got into tech through doing DevRel roles. And I know for a fact there's a lot of things that people who do this don't know about how to actually do software engineering. It's like we do our best, give good information, and hopefully learn the tool enough to explain it. But there's a breadth of experience you get from 10 years of engineering that you can't get from your bootcamp. Like, you just can't.
00:01:51 - Dylan Pierce
That's true. But I would also say what I love about DevRel specifically is developers are really tool-based. Your choice of tools is based on documentation and how big the community is. And as a developer, you can make this choice based on where you think it's going and the health of it. I also went to meetups naturally because I love talking to other developers. And when I realized you could do both these things and get paid for it, it's like, what a great gig. The networking part of it and the opportunity to write content about really cool tools and help make it easier to get integrated, which is half the battle. I call it Internet plumbing. We're all Internet plumbers, in a way. We're just connecting services or even lower-level APIs together. But to do that effectively, you need solid documentation and an easy developer experience, which I'm very passionate about.
00:02:51 - Anthony Campolo
Yeah. And that really is what makes it a lot more accessible for people, because you can give people tools that let them build things without niche knowledge. We can make software less complicated. It's a thing that we can do. And with tools like Pipedream, you're connecting services that would require so much glue code. That's where less experienced developers can struggle because you only really get better at doing that through iteration and doing it a bunch of times. Then you learn the different idiosyncrasies. Now ChatGPT changes a lot of this because you can start with generated code, but even still you need the knowledge to know how to ask the right questions so you can get the correct code.
00:03:32 - Dylan Pierce
Exactly. Yes, I totally agree. Knowing how to prompt is half the battle.
00:03:41 - Anthony Campolo
Yeah, yeah. And so you're going to show something to do with AI today. This is such a cliche at this point. We bring someone on and they're like, "So what's your current AI thing that you're doing?" We just had Wasp on to show their app generator. Before that, we had OpenSauced on to show us their AI analyzer thing. So you're the third person to come on and be like, "Here's our AI thing." And I think it's great. I think everyone should find 10 ways to try and put AI in their apps.
00:04:10 - Scott Steinlage
Yes.
00:04:10 - Anthony Campolo
And then if nine of them fail, that's fine. You just found one that's going to change the game totally.
00:04:15 - Dylan Pierce
Yeah. And just to make it clear, this is not a forced path in the app. You could totally use it without AI. This is just a kind of quick start to an already really fast process.
00:04:29 - Anthony Campolo
To build based on understanding a site to scrape it, I think, is what you're kind of describing. And that to me sounds really useful because this is something that ChatGPT lacked for a while. They gave browser support for a certain amount of time and it worked really poorly, and then they took the feature away for a while. So this is something they've really struggled with. So I'll be curious to see how well this works, like looking at a webpage and understanding it.
00:04:55 - Dylan Pierce
Oh, okay. So that is a bit deeper than what it can do, but I have found a way around it.
00:05:03 - Anthony Campolo
You said you can fill out forms, though.
00:05:04 - Dylan Pierce
They can fill out forms.
00:05:06 - Anthony Campolo
So that means, to a certain extent, it understands that there are inputs that take data on this website. It knows something about that website.
00:05:14 - Dylan Pierce
That's true. Yeah. Yeah. So you can describe the HTML content in the prompt, and it does a really great job of filling that out.
00:05:24 - Anthony Campolo
Bump up your font a little bit.
00:05:26 - Dylan Pierce
Oh, sure. So you can see my screen. Is that good, or should I show a little bit more?
00:05:31 - Anthony Campolo
I think one more.
00:05:33 - Dylan Pierce
Sure.
00:05:34 - Scott Steinlage
Do it.
00:05:34 - Dylan Pierce
Yeah.
00:05:34 - Anthony Campolo
One more.
00:05:35 - Dylan Pierce
Yeah.
00:05:35 - Anthony Campolo
So it's good to just be a little bigger.
00:05:38 - Scott Steinlage
Perfect.
00:05:40 - Dylan Pierce
Good.
00:05:40 - Scott Steinlage
This looks great. Yeah.
00:05:41 - Dylan Pierce
Cool. So, just to take a step back for anyone that's never even seen Pipedream before, in Pipedream, you create a project first. A project you can think of as equivalent to a repo, and that's because you can connect it to a repo. You could serialize all your workflows within one GitHub repo. It almost makes a monorepo of sorts. So just for speed, I won't turn this on, but just know that you can connect your GitHub account and you'll get commits, deploys, version-control rollbacks, and stuff.
00:06:15 - Anthony Campolo
Yeah.
00:06:17 - Dylan Pierce
So we'll call this one JavaScript Jamstack. Jamstack. Cool. Now we have the ability to make folders and workflows. And in this case, what is a workflow? That's a great question. A workflow is basically a script. If you're coming from a developer background, you can think of a workflow as basically a script. It's a series of steps. If you're coming from a no-code background, it's kind of like a Zapier automation or just an automation of sorts.
00:06:53 - Anthony Campolo
One of the reasons we got in touch in the first place is because we had referenced you in relation to Val Town, which we've been doing an episode on. And I was thinking, this reminds me of Pipedream a lot. And so if people check out that episode, you'll also see kind of a different take on this.
00:07:11 - Dylan Pierce
Exactly. And what's funny about that is an automation, like a workflow of ours pinged us when you mentioned us and that's how we joined.
00:07:19 - Anthony Campolo
Exactly. You use Pipedream to get notified about someone on the Internet. It's like the Bat-Signal.
00:07:26 - Dylan Pierce
Exactly. So let's start with the simplest use case, like grabbing a screenshot of a website. Let's say you make a PR for your Redwood site, and you want to automate this workflow to grab a screenshot of your change and then post it as a GitHub comment, just as an example. So we'll just call this screenshotter. Here you have more detailed controls for the underlying serverless container that runs this workflow. So with this tool you're not worrying about DevOps things at all. You're just setting container memory and timeout. You can even control concurrency and execution, which is pretty cool. And that way you can respect API rate limits without...
00:08:15 - Anthony Campolo
You make it super explicit about setting things like the timeout and the memory, because this is stuff that is so hard to get to when you use a lot of these services that wrap things on top of Lambda.
00:08:26 - Dylan Pierce
Yes. Yeah. So I am firmly in the camp too that Lambda is very unapproachable for the average developer, but I would argue this abstracts that away and makes it approachable and makes you more productive because you're not worrying about, like, wait, what's that param again I need to set up for my bash script to deploy this thing? You don't have to worry about that. It's all in the UI. You can even eliminate cold starts too.
00:08:58 - Anthony Campolo
How does it eliminate cold starts? So you have provisioned concurrency. Is that what they call it?
00:09:03 - Dylan Pierce
Exactly. So you can set up a number of workers, aka containers, that will stay warm. For those who don't know about AWS Lambdas or any serverless platform, you save on cost because it's on-demand, but as soon as the function isn't called within like five minutes, it'll go into a resting state because it's expensive for AWS to keep that code living anywhere in its us-east-1 region. So they have to basically take your code and put it back into cold storage and then warm it back up again.
00:09:39 - Anthony Campolo
Yeah, the funny thing about cold starts is it's the type of thing that if you have an app where people can wait a couple seconds, it's not really a big deal. But if you have an app that starts up on a Lambda, you're screwed because you've just caused your entire app to always load slowly every single time. And this is a huge problem, actually, for Redwood that required us to kind of move away from making it Lambda-only.
00:10:09 - Dylan Pierce
Oh yeah, yeah, yeah. So I had that same problem with Next.js and my solution was actually to create a workflow in Pipedream that just takes an array of URLs and pings them with a query param called warming. So I keep my critical routes warm, such as install, change a setting, and subscribe. Like, heaven forbid your customers can't subscribe because they're waiting five minutes for the worker to start up.
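A minimal sketch of the warming workflow Dylan describes here: take a list of critical routes, append a `warming` query parameter, and ping each one. Only the `warming` parameter name comes from the conversation; the helper names, the example routes, and the parameter value `1` are assumptions for illustration. It also assumes a runtime with a global `fetch` (Node 18 or later).

```javascript
// Hypothetical helper: build one URL per critical route, each tagged with
// a `warming` query parameter so the app can recognize warm-up pings.
function buildWarmingUrls(baseUrl, routes) {
  return routes.map((route) => {
    const url = new URL(route, baseUrl);
    url.searchParams.set("warming", "1"); // the value "1" is an assumption
    return url.toString();
  });
}

// Ping every URL in parallel and report which ones answered successfully.
// `fetchImpl` defaults to the global fetch and can be swapped out in tests.
async function warmRoutes(urls, fetchImpl = fetch) {
  const results = await Promise.allSettled(urls.map((url) => fetchImpl(url)));
  return results.map((result, i) => ({
    url: urls[i],
    ok: result.status === "fulfilled" && result.value.ok === true,
  }));
}
```

Run on a schedule shorter than the idle window, a call might look like `warmRoutes(buildWarmingUrls("https://example.com", ["/install", "/settings", "/subscribe"]))`, where the domain and paths are placeholders.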
00:10:36 - Anthony Campolo
Yeah.
00:10:37 - Dylan Pierce
And it's not instant.
00:10:38 - Anthony Campolo
That was the original solution when it was first discovered and now today it is still the solution and will continue to be the solution for as long as the technology exists. Because it works the way it works and it's not going to change. That's kind of my opinion on it.
00:10:50 - Dylan Pierce
Yeah, yeah. So there are definitely limitations to serverless. It's not the end-all, be-all.
00:10:57 - Anthony Campolo
But you let the dev click a button that fixes the limitation, which is nice.
00:11:02 - Dylan Pierce
Exactly, exactly. So this is critical for folks that are using like Discord bots or Slack because they have a timeout window. You have to respond within, I think, like 10 seconds or 5 seconds. And this will allow you to keep your workflows warm too.
00:11:17 - Anthony Campolo
And why would you want to run it in a VPC?
00:11:20 - Dylan Pierce
So this is fancy for saying outbound HTTP requests. So say you're connecting to a private database that only allows specific IP addresses to connect for security concerns. Or maybe you're running an in-house API that needs to be firewalled. So all outbound HTTP requests from this workflow will use a specific IP address. That's what it does.
00:11:52 - Anthony Campolo
Sweet.
00:11:54 - Dylan Pierce
So I won't use these fancy features because we're not enterprisey, and we don't care about that for this demo. It's not that expensive. I'm shocked at how inexpensive this solution is, actually.
00:12:06 - Anthony Campolo
So actually, is there a free plan on Pipedream or do you start with a certain monthly limit?
00:12:12 - Dylan Pierce
No, it's free. The only limitations are the number of workflows that you can have active and the number of connected accounts, such as Slack and Google Sheets, out of the thousand-plus services that we integrate with. That's the limitation. And you won't get some of these more fancy features that are more enterprise-y, like the VPC one. Like, if you're worried about VPCs, then you probably have someone to pay for that. Yeah, like you have a budget essentially.
00:12:43 - Anthony Campolo
Exactly. Cool. Yeah, I like that, because the type of dev who wants to spin something up, like getting one service to automate one task, can be really useful and valuable if it's the right task.
00:12:56 - Dylan Pierce
Exactly. I think there's two types of work at any business. There's the product work that makes your business unique and actually adds value to the customer. And there's this side integration work that marketing or customer service needs. They're like, I need to pipe my Zendesk tickets to Slack and know when things are finished, or marketing needs customers added to a CRM. That's still important, but it shouldn't take up developers' time. And if it does, it should be really quick and easy to change and flexible, which is where Pipedream really shines.
00:13:39 - Anthony Campolo
Do you have an integration that you feel makes you really high leverage, that you enjoy?
00:13:46 - Dylan Pierce
I use Pipedream quite a bit. I love using it with Slack. I've created an app called Dictionary, which is like Pictionary but uses DALL-E. So you get asked a prompt like, "Please describe key-value data store," and you try to create the image in DALL-E. It's a game. It'll share it within a Slack channel, and whoever guesses it is the next person to go. I built that all in Pipedream, and it really shines because you don't have to worry about the UI layer. For Discord and Slack, they give you the tools to create a UI, so you just worry about the back end. That's where Pipedream is really, really nice. You don't have to worry about the backend or frontend at all. Someone else is handling it for you. I'm just going to try this again. I'm not sure why.
00:14:40 - Anthony Campolo
Mine timed out, we were talking so long.
00:14:43 - Dylan Pierce
Probably. It probably is a Lambda that died. Okay, all right. So we finally made it to the workflow start screen. This is where you define the trigger. So we have many different types of triggers. The most popular one is creating a webhook or an HTTP endpoint. So this will be a unique URL, automatically generated, that you can use to trigger this workflow. AKA, it's just an API endpoint. You can even generate a test event with a Postman-like thing to test your workflow from this initial setup.
00:15:21 - Scott Steinlage
That's pretty cool. Yeah, that's what I use the webhook for, to get information from input on a form on ClickFunnels over to another process. But anyway.
00:15:34 - Dylan Pierce
Oh, cool. Cool. Yeah, that's the bread and butter.
00:15:38 - Anthony Campolo
But real quick, when I was first learning to code, people used to always talk about webhooks, and it confused the crap out of me because nobody ever actually sat down and showed just...
00:15:48 - Dylan Pierce
Like, what it was.
00:15:49 - Anthony Campolo
...what it was. People would talk about webhooks, like all the things they would do with them, and it was always very abstract for a long time. Like, what is a webhook? Then I would see examples like this and think, okay, now I get it. But when I just heard people talking about it, it didn't make any sense to me.
00:16:05 - Dylan Pierce
Right. It's just like, why don't they just call it a reverse API call? We all understand what that means. It's an API call coming to you rather than you going out and asking some service. But somebody called it a webhook, and that just kind of stuck for some reason.
00:16:20 - Scott Steinlage
Because you hook it and you pull it in.
00:16:24 - Dylan Pierce
Yeah, I guess.
00:16:25 - Scott Steinlage
I don't know.
00:16:27 - Dylan Pierce
It's kind of like a web push.
00:16:28 - Scott Steinlage
Yeah, push.
00:16:29 - Dylan Pierce
Yeah, yeah, push. Yeah. So this is one type of trigger. I mean, we have many, many, many. You can set a custom interval, so this is more like a cron. And you can even set a cron expression.
00:16:41 - Anthony Campolo
Do you mean just like a Node script, right? Like JavaScript. Kind of like an HTTP webhook, but just with a hunk of Node code.
00:16:51 - Dylan Pierce
Yes. So that's called a source. I don't know if...
00:16:54 - Anthony Campolo
This is the first thing I tried to do when I used this.
00:16:56 - Dylan Pierce
Oh, okay. We don't have a built-in code editor in here. That is something we're talking about, actually, though you can publish custom sources or custom triggers. We have a bunch of documentation on how to do that. It's just called a source.
00:17:14 - Anthony Campolo
Yeah, we don't have to go into that then, because I know your other example is going to be pretty involved. So what are the services you're going to use to create this AI thing?
00:17:25 - Dylan Pierce
So we're going to take a screenshot of a site. We're just using my personal site. We just released support for Puppeteer and Playwright out of the box. Pipedream supports Node.js code, Python code, and what's neat about it is...
00:17:46 - Scott Steinlage
Right, exactly.
00:17:48 - Dylan Pierce
Oh, I thought you meant the trigger. My bad. Yeah, in a step. Yeah.
00:17:51 - Anthony Campolo
Just like this. This is like a Node editor right here, and you're using Axios to make the fetches.
00:17:57 - Dylan Pierce
Yeah, you can import any NPM package. There's no package.json to worry about. You just literally test the code, and it will execute this code in a test Lambda and voila, it works. But the problem we had for the longest time was getting certain NPM libraries, namely Puppeteer and Playwright, to work. Because Puppeteer and Playwright rely on Chrome, and Chrome needs to be built for Lambda, and it's also huge. But AWS recently increased the Lambda size from like 2 gigs to 8 gigs, which solved the size problem for us. The problem was getting the right versions of Puppeteer and Playwright to work.
00:18:39 - Anthony Campolo
Can you actually explain what Puppeteer and Playwright are for people who've never used them?
00:18:43 - Dylan Pierce
Yeah, good call. So Puppeteer and Playwright are browser automation tools. They are a thin wrapper, like a Node.js wrapper, around Chromium, which is the open-source version of Chrome, and they allow you to programmatically click and type. Any action you could take in a web browser, you can do using Puppeteer and Playwright. What's different about Puppeteer and Playwright versus just using a regular HTTP GET action and getting the HTML is that they render the whole site in a Chrome instance. So think JavaScript-heavy apps like SPAs that wouldn't work with a regular HTTP request because it's returning a JavaScript blob. You need to render it in order to see the actual hydrated content. So Chrome, Puppeteer, and Playwright render JavaScript, and you can even invoke JavaScript inside of them. They are really useful browser automation tools and can perform things like taking screenshots, generating PDFs, clicking buttons, navigating the page, and filling out forms. It's really useful for browser testing too. It's a long-winded explanation, but hopefully that helps.
00:20:06 - Anthony Campolo
No, no, I think that's great, because this is another tool that some people use all the time, and some people don't really have a reason to get into. But I think something like getting a PDF, if you look at something like Claude, you can give it a PDF that's very, very large and it can summarize things. So you can create pipelines now where you do this whole thing where you create stuff, and then you suck it in, then you put it into this other thing. These tools are becoming more and more useful, more and more high leverage, as we have more and more ways to integrate these different data sources by just piping things to models.
00:20:43 - Dylan Pierce
Exactly. You can use get PDF to generate the PDF. There are libraries we could use to generate PDFs from code, but they're notoriously difficult to work with. So if you know HTML and CSS, instead of learning yet another library that generates PDFs, you can just put up a webpage and use this get PDF action to generate a PDF instead. So you just leverage technologies you already know. Yeah, so that's kind of the gist of Puppeteer and Playwright and what they can be used for. And how this works under the hood is we published a special package that has the built-in Puppeteer version with Chrome bundled. So you import this special NPM library and then create a brand-new browser. And from a browser, you create a new page. It's familiar, just like opening Chrome on your local computer. Open a new page, go to the URL, and then you can perform actions on the page, get the content, get the title, and it will do that for you. So that's kind of the basics of how it works.
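The flow described above (browser, new page, go to the URL, read the title and content) maps onto a handful of standard Puppeteer calls. The following is a minimal sketch, assuming `browser` is an already launched Puppeteer `Browser`. How it gets launched depends on the special package Pipedream publishes, which the conversation does not name, so that step is left out. The function name is hypothetical.

```javascript
// Hypothetical helper: open a page, read its title and rendered HTML,
// and always close the page afterwards.
// `browser` is assumed to be a launched Puppeteer Browser instance.
async function getTitleAndContent(browser, url) {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    const title = await page.title(); // text of the <title> element
    const content = await page.content(); // full HTML after rendering
    return { title, content };
  } finally {
    await page.close();
  }
}
```

Passing the browser in as an argument keeps the helper independent of where Chrome comes from, so the same code should work locally and inside a workflow step.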
00:21:57 - Dylan Pierce
But what's really, really neat that I want to show you guys is, yeah, you can see it grab the title, grab the content.
00:22:05 - Anthony Campolo
Your camera just clicked off.
00:22:08 - Dylan Pierce
You know what, this camera, I can't adjust the sleep settings, and it'll always die at 25 minutes. I'm in the market for a new camera.
00:22:19 - Anthony Campolo
Oh my God.
00:22:19 - Scott Steinlage
Interesting.
00:22:20 - Dylan Pierce
Yeah. Do not get the Mark I Canon M50. That's...
00:22:25 - Scott Steinlage
Oh, got it.
00:22:26 - Dylan Pierce
Yeah. Mark II, they fixed it.
00:22:28 - Scott Steinlage
Yep.
00:22:30 - Dylan Pierce
But to continue on with Pipedream, you can use pre-built actions that leverage Node.js code and kind of make a nice little wrapper on top of it. Or you can use Node.js like I just showed you. So for the simplest flow, we can use my site and just take a URL. There's a bunch of different options where you can adjust the size, change the path, etc. Then it will spin up a browser, navigate to that page, and perform a screenshot. So we're going to wait a second for the screenshot to finish. I mean, it's running Chrome. Here we have it: it's a Base64-encoded screenshot, and then it's as simple as uploading it somewhere else to view it. Imgur is kind of my favorite because it's just so easy to use. Just upload an image action. I chose my Imgur account and pasted the path from that past step into the image action, and it will upload to Imgur. So we just connected a scraper plus Imgur for uploading, and we really didn't write any code. So go over to Imgur and check out my pictures.
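For readers who prefer code to the pre-built action, here is a rough equivalent of the screenshot step written against the standard Puppeteer API. This is a sketch, not Pipedream's implementation: the function name and the default viewport size are assumptions, and `browser` is again assumed to be an already launched Puppeteer `Browser`.

```javascript
// Hypothetical helper: navigate to a URL and return a Base64-encoded
// screenshot string that can be handed to an upload step.
async function screenshotAsBase64(browser, url, viewport = { width: 1280, height: 800 }) {
  const page = await browser.newPage();
  try {
    await page.setViewport(viewport); // viewport size is an assumed default
    // Wait until network activity settles so JavaScript-rendered content is visible.
    await page.goto(url, { waitUntil: "networkidle2" });
    // With encoding "base64", Puppeteer returns a string instead of a binary buffer.
    return await page.screenshot({ encoding: "base64" });
  } finally {
    await page.close();
  }
}
```

The returned string is the kind of value that would then be passed to an upload step, such as the Imgur action used in the demo.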
00:23:56 - Dylan Pierce
Hopefully we see it. Yeah, there's my homepage. So you can imagine that, instead of using Imgur, you could use GitHub and create a comment on a PR. You can use S3, upload it to S3, and pipe it anywhere. You can even pipe it to an image-manipulation API, like Cloudinary, and manipulate the image. It's pretty mind-blowing what you can do by piping together these steps. So that's kind of the basic use of this library. For the AI stuff, we did screenshotting, but you can also use AI with Puppeteer. And this is kind of a sample of the AI code gen that's available in any app that we support. So instead of having to look up Puppeteer's documentation and figure out how to grab the H1 from a website, let's just get the H1. We can ask the GPT-4-trained AI this question.
00:25:08 - Anthony Campolo
So this is accessing OpenAI's API directly.
00:25:12 - Dylan Pierce
Yes. And we have a little bit of prompt magic to guide it to know that it's talking about Puppeteer. It knows that the context is Puppeteer, so it will include that in the answer.
00:25:25 - Scott Steinlage
Created an agent?
00:25:28 - Anthony Campolo
Yes.
00:25:29 - Dylan Pierce
Yes.
00:25:30 - Anthony Campolo
Yeah, it's not really embedding. It's more like prompt engineering. Because even ChatGPT comes with its own prompts hidden underneath. They don't like you knowing how to see it, but that's just kind of normal for how we set up these bots. Because an agent would be if you had it do a series of tasks, right? When people use the term agent, they're talking about a thing that can actually make decisions. You tell it to do a thing and it does multiple steps. It'll go from one to the other. That's what an agent is in the current modern usage of the term.
00:26:09 - Dylan Pierce
Yeah, you can kind of think of it as this particular prompt being hyper-focused on Puppeteer and its documentation. It's more like fine-tuning specifically to this app. So when I ask it, like, please grab the H1 from the webpage, I don't need to tell it that we're talking about Pipedream code here. We're talking about Puppeteer, please use Puppeteer methods. It's aware that it's in Pipedream and it's using Puppeteer. So it'll generate code that uses that NPM package, launch the browser, go to my webpage, and grab the H1. So it's pretty wild.
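The kind of code such a prompt tends to produce can be approximated like this. It is a sketch rather than the actual generated output from the demo: it assumes a Puppeteer `page` has already been created, and the function name is made up for illustration.

```javascript
// Hypothetical helper: visit a page and return the text of its first <h1>.
async function getFirstHeading(page, url) {
  await page.goto(url);
  // The callback passed to $eval runs inside the browser page, not in
  // Node.js, which is why DOM properties such as textContent are available.
  // $eval rejects if no element matches the selector.
  return page.$eval("h1", (el) => el.textContent.trim());
}
```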
00:27:02 - Scott Steinlage
But in your prompt, I'm sure, on the back end, you're telling it to use that certain package because it wouldn't know that otherwise. It's back to 2021, right?
00:27:14 - Anthony Campolo
So when did Pipedream add Puppeteer?
00:27:18 - Scott Steinlage
It was just recent, though.
00:27:20 - Dylan Pierce
Correct. We had to embed all of Pipedream...
00:27:23 - Scott Steinlage
...documentation inside of that.
00:27:24 - Dylan Pierce
Yeah, inside of this. Yeah, exactly.
00:27:27 - Scott Steinlage
Did you guys use a vector DB for that, or is it just solely prompts?
00:27:32 - Dylan Pierce
No, there is vector storage for all of it. We do crawl all of the Pipedream documentation.
00:27:39 - Anthony Campolo
That's the only way to ensure it's actually going to know what the heck to do.
00:27:41 - Scott Steinlage
That's what I was wondering.
00:27:42 - Dylan Pierce
Yeah.
00:27:42 - Anthony Campolo
Yeah, that's really fascinating. How much data does that take up?
00:27:47 - Dylan Pierce
You know, that's a really good question. I don't know. I would have to take a look. I wonder if I'm just typing this wrong. So for some reason during the demo, it did not pull in the correct NPM package. It tried to grab our custom Axios package, which is incorrect, but I believe the Puppeteer code itself is correct. And for those who want to interact with the Pipedream AI without making an account, you can go to our Slack and talk to our AI agent there. It's also trained. So, cool, it grabbed the H1 and returned it. I use this feature extensively to create a Reddit scraper because the Reddit API has been locked off. You have to pay a ridiculous amount of money to use it. So now I have a scraper, and I didn't have to go in and remember all the query selectors. I gave it a snippet of HTML in this prompt and said, "Please grab the Reddit comments from this HTML," and gave it an example, and it was smart enough to pull out the individual upvotes, name, comment, etc.
00:29:00 - Anthony Campolo
Yeah, this is super cool, actually. I remember when I was first learning to code and I was looking at videos and thinking, you know, what could you do? And web scraping as a skill was something that came up a lot, and a lot of people were using Python scripts and stuff like that. So it was very unapproachable for me at the time. But this is an extremely approachable way to scrape a website and actually pull a specific piece of data off it. That's the hard part. If you're just making a call, getting a huge chunk of HTML, and all this manual stuff, that's hard to even know what to do with. Whereas this is just giving you, "I just want this part of the website," so you can target the exact data you need really easily with very little code.
00:29:43 - Dylan Pierce
Exactly. Yeah. There's a whole generation of web-scraping apps that are built on AI for that purpose, for someone to easily create a scraper from a text prompt. And here you get it for free, which is kind of cool. But again, you have to be a developer to know how to use this stuff.
00:30:02 - Scott Steinlage
I would say, however, learning to scrape is beneficial because you really have to dig around on the webpage inside of the code to determine what you're trying to pull, what you're trying to scrape. So it really helps you learn more about that.
00:30:34 - Dylan Pierce
Yeah.
00:30:34 - Scott Steinlage
At least personally. So. Yeah.
00:30:36 - Dylan Pierce
Yeah.
00:30:36 - Scott Steinlage
But this is super cool. It makes it easy, though, if you don't want to go through that process. Jump in here and go have a go.
00:30:42 - Dylan Pierce
It's awesome, right? It works great, especially if you know the element is going to exist. Any good webpage is going to have an H1. Another good example for SEO or marketing devs is pulling all the meta information.
00:31:01 - Anthony Campolo
So yeah, like description and OG tags.
00:31:05 - Dylan Pierce
Right, like, pull all meta tags from Pipedream. Hopefully it's smart enough to realize I'm talking about an array. Yeah, so it's grabbing all the tags, getting attributes, returning them as an object, and using the right NPM package. Cool, use the code. So for a known structure ahead of time, of any generic webpage, fantastic. There is a little bit of elbow work to get it to play nicely with custom elements and such.
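A rough equivalent of the meta-tag extraction described here is sketched below. It assumes the `page` has already navigated to the target site; the function name and the choice to key each tag by its `name` or `property` attribute are illustrative assumptions, not the code generated in the demo.

```javascript
// Hypothetical helper: collect <meta> tags into a plain object such as
// { description: "...", "og:title": "..." }.
async function getMetaTags(page) {
  // $$eval runs the callback in the browser against every matching element
  // and returns the (serializable) result to Node.js.
  const pairs = await page.$$eval("meta", (els) =>
    els.map((el) => [
      el.getAttribute("name") || el.getAttribute("property"),
      el.getAttribute("content"),
    ])
  );
  // Drop tags with no usable key or no content (for example <meta charset>).
  return Object.fromEntries(pairs.filter(([key, value]) => key && value !== null));
}
```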
00:31:45 - Scott Steinlage
Yeah, I guess you would still need to dig into that.
00:31:48 - Dylan Pierce
Yeah, yeah, exactly. But I find it a lot easier to do that and just copy HTML and paste it into the AI prompts than it is to look up the documentation. Yeah, like, what's Puppeteer? How do you spell Puppeteer?
00:32:07 - Anthony Campolo
Yeah, I'm constantly just grabbing huge hunks of code and throwing them into ChatGPT and asking questions like, how do I do this?
00:32:14 - Dylan Pierce
Yeah, exactly. And here you are. I'm still old school. I still like Google more than I use AI. But yeah, you're going through Stack Overflow, and which of these answers are updated?
00:32:23 - Scott Steinlage
Like, yeah.
00:32:24 - Dylan Pierce
Wasting so much time. But this just worked, I mean, with a static known structure of a page.
00:32:34 - Anthony Campolo
Great.
00:32:36 - Dylan Pierce
Awesome. When it comes to more involved, like, we can try Reddit. Let's see if we can get Reddit to work.
00:32:45 - Anthony Campolo
Twitter, actually.
00:32:46 - Dylan Pierce
Twitter, I thought with Twitter you had to be...
00:32:51 - Anthony Campolo
They require an account now. I think that was the case. I'm not sure if that's still the case. Let me see. I can see myself in incognito.
00:33:00 - Dylan Pierce
Yeah. See if you can share a link and we can try to scrape it. That'd be fun to do.
00:33:05 - Anthony Campolo
Yeah, you can just do my Twitter. It's just AJCWebDev.
00:33:09 - Dylan Pierce
AJCWebDev. I have this weird audio dyslexia. I have to talk it out loud. Oh, it is public. Look at that. All right, cool. So we're going to take a look at the elements here. We're going to try to...
00:33:27 - Scott Steinlage
Do you want to share that screen, or is it up to you?
00:33:30 - Dylan Pierce
Oh, yeah, I got you.
00:33:32 - Anthony Campolo
Yes. We're only seeing this one screen.
00:33:34 - Dylan Pierce
I did it in incognito, so I'll just go through here because I know it's the same. So the way I look at it here, I can see they're doing some, like...
00:33:43 - Anthony Campolo
We're still not seeing your screen right now.
00:33:45 - Scott Steinlage
Yeah, no, we're just seeing the page. I think you're only sharing this page.
00:33:51 - Dylan Pierce
Let's see if I can, instead of the browser, I guess. Share screen, share screen... oh, window.
00:33:59 - Anthony Campolo
When you're presenting, one option will only show a single tab, versus one that shows your whole desktop.
00:34:03 - Scott Steinlage
Right, gotcha.
00:34:05 - Dylan Pierce
Is that better?
00:34:07 - Scott Steinlage
Yeah, there we go.
00:34:10 - Dylan Pierce
So we're going to try.
00:34:12 - Anthony Campolo
Go ahead and increase your font a couple times again though.
00:34:17 - Scott Steinlage
Cool.
00:34:17 - Anthony Campolo
Yeah.
00:34:18 - Dylan Pierce
All right, so we have the Twitter HTML open in Chrome DevTools. We're looking for some kind of pattern that we can guide AI to.
00:34:31 - Scott Steinlage
Right.
00:34:31 - Anthony Campolo
Just try and grab a tweet.
00:34:33 - Dylan Pierce
Yeah, so there's something nice here: data-testid equals tweet. That's cool. I would probably use that because there's so much auto-generated CSS. These class names are definitely unique and generated at build time. So we know we can use this data-testid, and we know that within data-testid there's content, eventually a span. The span has nothing else interesting.
00:35:04 - Anthony Campolo
It's a span inside a div, inside a div. This is what people talk about with div soup and how it makes it almost impossible to find any structure within the HTML.
00:35:14 - Dylan Pierce
But here's something cool: data-testid equals tweetText. We could probably just grab this instead.
00:35:22 - Scott Steinlage
Right?
00:35:22 - Dylan Pierce
I'll go back here. This is totally untested. I've never tried to scrape Twitter before.
00:35:32 - Scott Steinlage
That's what makes this so cool.
00:35:34 - Dylan Pierce
Please visit your Twitter and extract the inner HTML, or not inner HTML. We care about the span contents within the div with the attributes. And check my Raycast here, see if that's smart enough to realize that we have an array of these test attributes.
00:36:17 - Scott Steinlage
If it'll continue.
00:36:19 - Dylan Pierce
Yeah. So we're hoping... yeah, it is finding the div, it's finding the span within, and then it's going over each of them and trying to retrieve the text content of the spans.
00:36:29 - Scott Steinlage
It looped it.
00:36:31 - Dylan Pierce
Yeah. This will be crazy if it just works out of the box.
00:36:40 - Anthony Campolo
There's so much analysis you can do on tweets, and yeah, I think they locked down their API recently.
00:36:49 - Scott Steinlage
I was going to say more expensive. Oh yeah. But Twitter's going to be like, "Oh, someone's scraping us now."
00:36:56 - Dylan Pierce
Well, the nice thing about Pipedream, like I mentioned before, is it spins up a Lambda anywhere in us-east-1 by default. So the traditional tools are kind of hard to... you can't detect Pipedream's IP address.
00:37:11 - Scott Steinlage
Right.
00:37:11 - Dylan Pierce
It didn't quite work this time. It's too bad.
00:37:14 - Scott Steinlage
Yeah, I bet if we tweaked it, we could figure something out.
00:37:16 - Dylan Pierce
But yeah, test ID. Let's just try to see if it can even find that tweet text. I wonder if it's screwing up on innerText. I'm not sure if that's valid JavaScript.
00:37:35 - Scott Steinlage
Sometimes it'll hallucinate that stuff.
00:37:37 - Dylan Pierce
Yeah, yeah.
00:37:37 - Anthony Campolo
Well, yeah. Is it running in Node? Because if it tries to run a browser API, it's like a DOM thing it's trying to run.
00:37:47 - Scott Steinlage
What's the temperature you guys have on this? Can you recall off the top of your head?
00:37:51 - Dylan Pierce
Oh, sorry, I do not know that. But we can return this and we'll just log out what the element is. Can it even find an element? Let's see. I have a feeling innerHTML is going to do better. I think innerHTML is a valid attribute, but we'll see. But this whole thing kind of highlights how easy it is to make tweaks and then retest something. I'm not bouncing between my terminal, the web, and VS Code. I'm just clicking test and rerunning things, and I can see the logs. I have a feeling there's a bug in the codegen that's running the old code on the first run, and you're seeing it live. But I'll test again and find out. That's kind of the nice thing about using this system. And it's not finding any elements at all.
00:38:58 - Anthony Campolo
It's just incredible how much you can do without needing to even be in a code editor.
00:39:03 - Dylan Pierce
Yeah.
00:39:03 - Scott Steinlage
Yeah.
00:39:06 - Dylan Pierce
Did I pick the wrong thing? I wonder if we have to wait for this thing to exist yet. If this is an SPA, then we have to wait.
00:39:14 - Scott Steinlage
Oh, yeah, till it loads.
00:39:17 - Dylan Pierce
Yeah, and wait for an instance of it to appear.
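Waiting for an element in a single-page app usually means calling Puppeteer's `waitForSelector` before reading anything. The sketch below assumes an already-navigated `page`; the function name and the 10-second default timeout are arbitrary choices for illustration. The selector in the usage note is the `data-testid` value spotted in the demo, and as the rest of the attempt shows, waiting alone does not help if the site returns no content to the scraper.

```javascript
// Hypothetical helper: wait until at least one element matching `selector`
// exists, then return the trimmed text of every match.
async function getTextAfterRender(page, selector, timeoutMs = 10000) {
  // Rejects with a timeout error if nothing matching appears in time.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // Runs in the browser context, so DOM properties are available here.
  return page.$$eval(selector, (els) => els.map((el) => el.textContent.trim()));
}
```

A call for the demo's case would look like `getTextAfterRender(page, '[data-testid="tweetText"]')`.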
00:39:28 - Scott Steinlage
Let's see what it puts in the code. I'm curious. This is cool.
00:39:32 - Dylan Pierce
I don't remember how to wait for stuff. So.
00:39:36 - Scott Steinlage
Yeah.
00:39:36 - Anthony Campolo
You're like, don't do it.
00:39:37 - Scott Steinlage
And it's gonna.
00:39:38 - Dylan Pierce
Hopefully do it.
00:39:39 - Scott Steinlage
Yeah, await page... what's it say? I can't really read it.
00:39:48 - Dylan Pierce
That can't be it.
00:39:50 - Anthony Campolo
So it might actually be that... I'm trying to curl it and you can't do that. So it might be that you have to actually open it in a real browser, and they're able to tell that you're not doing that, maybe.
00:40:04 - Scott Steinlage
Yeah.
00:40:04 - Dylan Pierce
Maybe we hit some kind of security thing. So let's just return the content. Like the whole page content.
00:40:12 - Anthony Campolo
If you're getting back anything.
00:40:14 - Dylan Pierce
Yeah. And then we'll just export that real...
00:40:16 - Scott Steinlage
...quick and then do it from there.
00:40:20 - Dylan Pierce
Yeah, and see what it looks like. I have a feeling that Twitter is smarter. Elon Musk and his dev team are probably on the lookout for this kind of stuff.
00:40:35 - Scott Steinlage
I'm sure.
00:40:37 - Dylan Pierce
...if you're being charged $42,000 a month for entry-level API access.
00:40:42 - Scott Steinlage
Right. You better be blocking all the different ways.
00:40:44 - Dylan Pierce
Yeah. So it's zero. They might even be blocking all of AWS us-east-1 IP addresses. You're not getting any content whatsoever. So I'm sorry, viewers. You just watched us hit something, but it was fun.
00:41:00 - Scott Steinlage
Yeah, it was cool.
00:41:02 - Anthony Campolo
It was interesting because it shows how every platform is different and that's why having all these integrations gives you so much leverage.
00:41:09 - Scott Steinlage
Yeah.
00:41:09 - Dylan Pierce
You can use this with Reddit.
00:41:11 - Scott Steinlage
Yeah, let's do it.
00:41:12 - Dylan Pierce
Yeah, r/javascript, let's say. So we care about...
00:41:19 - Scott Steinlage
The problem is they're not charging enough, I guess.
00:41:23 - Dylan Pierce
Well, also, I noticed how easy it is. Have you ever looked at the Reddit HTML source before? This is pretty wild. It's like it's built to be scraped.
00:41:34 - Scott Steinlage
Probably because they did it themselves for some reason.
00:41:38 - Dylan Pierce
Yeah, they probably, like, for example, some...
00:41:41 - Scott Steinlage
...sort of analytics or something.
00:41:45 - Dylan Pierce
It gives you the ID of the actual comment. It tells you how many replies there are, data.replies.
00:41:52 - Scott Steinlage
See, they're using like Mixpanel, and they're like, let's just make this really easy.
00:41:55 - Dylan Pierce
Yeah, it gives you the data subreddit, that kind of stuff. So we'll tell it to go here. Let's make a new one. We're going to scrape Reddit instead because they're a lot more friendly to folks like us. So, Puppeteer, use AI. Please visit this page and grab all... what do we care about? Maybe the comments, like the actual comments themselves.
00:42:28 - Anthony Campolo
Yeah, you can like grab those and run sentiment analysis and be like, do people like my thing?
00:42:33 - Dylan Pierce
Exactly, yeah. Do people like my thing? So within the comment there's data-type equals comment. Cool, that's pretty descriptive. And within those comments, there's a paragraph. So there's a p tag within the parent. I think that's a good way to do it. Or a div class md with a p tag within that. Or we could just say that's the markdown. Oh, you think so?
00:43:22 - Scott Steinlage
Maybe.
00:43:28 - Dylan Pierce
So we'll tell it to... I misspelled divs there. And within these comments, extract the value of the p tag within the div. I mean, that's pretty terrible English, but I think the AI is smart enough to figure that out. Go to the page, extract all the comments, and within the comments, select the... yeah, that might work.
00:44:07 - Anthony Campolo
And you should make sure to point out that you need to use old.reddit. This is because there are multiple versions of Reddit.
00:44:15 - Dylan Pierce
Oh yeah. So for those who haven't been using Reddit for 20 years, old.reddit was the original Reddit. reddit.com is the new version, which tries...
00:44:25 - Anthony Campolo
...to redirect you to the app and has, like, age restrictions. There are a lot of big differences.
00:44:30 - Dylan Pierce
Yeah. And I think from my deep dive before, old.reddit.com just has an easier HTML structure to scrape. The new one is using the same kind of Twitter dynamic class names we saw, if you guys choose to include that clip. But this will hopefully extract the text. There's also cool stuff you can do inside the HTML, like grab upvotes. There's a straight-up upvotes span that gives you both downvotes and upvotes. So dislikes is zero, upvotes is one, or score is one. Yeah, that's pretty cool.
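Based on the structure described here (elements with `data-type="comment"`, each containing a `div` with class `md` and `p` tags inside), a comment scraper could be sketched as below. The function name is hypothetical and this is not the code generated during the episode. One caveat worth flagging: if replies are nested inside their parent comment's element, their paragraphs would match the inner selector too.

```javascript
// Hypothetical helper: return the text of each comment on an old.reddit.com
// thread, one string per comment, with paragraphs joined by newlines.
async function getComments(page, url) {
  await page.goto(url);
  return page.$$eval('[data-type="comment"]', (nodes) =>
    nodes.map((node) => {
      // ".md p" selects the paragraphs inside the comment's body.
      // Caveat: paragraphs of replies nested under this node match as well.
      const paragraphs = Array.from(node.querySelectorAll(".md p"));
      return paragraphs.map((p) => p.textContent.trim()).join("\n");
    })
  );
}
```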
00:45:12 - Anthony Campolo
And then you just train more models with all your Reddit.
00:45:18 - Dylan Pierce
Yeah, exactly, exactly.
00:45:21 - Anthony Campolo
I've heard that Reddit data has been used to train things like GPT because it's such a massive source of unique human text on the widest possible breadth of subjects you could imagine.
00:45:33 - Dylan Pierce
Yeah, so it actually worked, guys.
00:45:35 - Scott Steinlage
That's awesome.
00:45:38 - Dylan Pierce
Yeah, so we didn't look up the Puppeteer documentation once, and we just sat here and played with prompts, looked at HTML, and grabbed comments. And we could play with it more and grab the author name, upvotes, reply counts. There's all kinds of data to play with. But yeah, you can scrape Reddit in this way.
00:45:59 - Scott Steinlage
Super cool.
00:46:01 - Anthony Campolo
Yeah. And I imagine Hacker News would be very similar because they're basically the same.
00:46:06 - Scott Steinlage
I wonder if you, you know, so you used a lot of language that would be known by a developer, but if you use the language that was more common just to the English language rather than developer language, would it still work?
00:46:21 - Dylan Pierce
Yeah, let's find out.
00:46:22 - Scott Steinlage
So I think ChatGPT would be smart enough to figure it out.
00:46:25 - Anthony Campolo
But that's the thing that really blows my mind about ChatGPT. I literally explain exactly what I want to do, and it goes from that to figure out what it wants. I find that sometimes I don't even need to specify anything code-related. It's just purely English. That's what I'm saying, just describing it.
00:46:45 - Scott Steinlage
Don't say, like, you know, we had...
00:46:48 - Anthony Campolo
...told it, like, "Grab the tweets in this." You can actually get through the website that way. But no, something like that instead of trying to tell it...
00:46:55 - Scott Steinlage
So say it here.
00:46:56 - Anthony Campolo
I need to grab this HTML thing, right?
00:46:58 - Scott Steinlage
Yeah, grab the comments on this page here, or something like that. And I don't even know if you would... let's try without the class.
00:47:12 - Anthony Campolo
Yeah, just straight to the link. Yeah, yeah, basically. Can normal people do this?
00:47:16 - Scott Steinlage
That's what I'm saying.
00:47:19 - Dylan Pierce
I think because it does not have the context of the md, where it's at... yeah, it's describing the body. It's assuming that there is a... it's not even using the right package. It's using Axios.
00:47:34 - Scott Steinlage
Yeah, weird. So to make this even better for people who wouldn't be dev people, you could add into your prompts, or have... because you have, what do you guys call those? The pre-written processes or tasks that you can tell it to do without having to write any code. You just click it and it... I can't remember what you guys call them.
00:48:06 - Dylan Pierce
Actions.
00:48:06 - Scott Steinlage
Actions, yeah. So you have these pre-written actions or whatever, right? So maybe have one that's for... that would be a lot, though, because there'd be so many. Every site's different, right? So yeah, never mind.
00:48:18 - Anthony Campolo
I would imagine if you wanted something more like how you had the custom training from Puppeteer, you'd want to kind of train it on what it would get back from a certain website and how to understand it. You could probably do that. That'd be a lot of interesting stuff to do.
00:48:32 - Scott Steinlage
But it would be. That's what I was saying. Yeah.
00:48:35 - Dylan Pierce
So I mean, the way I envision this working is making this AI aware of data between steps. So let's pretend we're getting all the content in one step, and then the next step we would say, you know, we could just use regular Node and say...
00:48:55 - Scott Steinlage
But the other thing is, who's not a developer that would know, "Oh, let's go into Node and start generating"? You know what I'm saying? Like, this is for developers, right?
00:49:04 - Dylan Pierce
Like for developers. That's how we're different from the competition.
00:49:07 - Scott Steinlage
Exactly.
00:49:07 - Dylan Pierce
Like Zapier or others, we are...
00:49:11 - Scott Steinlage
That's your avatar.
00:49:12 - Dylan Pierce
Yeah. Like we are developer first. So we're going to 10x your productivity.
00:49:17 - Scott Steinlage
Right.
00:49:17 - Dylan Pierce
We're going to give you pre-built actions too if you want to use them.
00:49:22 - Anthony Campolo
But what might actually be the missing bridge here is now that GPT-4 can just look at a picture. If you were to show it a picture of the website...
00:49:32 - Scott Steinlage
That's what I was doing.
00:49:34 - Anthony Campolo
That might be all you need.
00:49:36 - Scott Steinlage
I did that with some things which is interesting.
00:49:38 - Dylan Pierce
It was able to infer the CSS classes and stuff, or find markup? Oh, you mean from the actual image. Just take a screenshot, yeah.
00:49:48 - Anthony Campolo
If it could look at the screenshot of the image and then you also let it call the endpoint to get the code, it could then correlate those two.
00:49:54 - Scott Steinlage
Yeah. And the other thing is, if you take a PDF of it... let's say it's a super long page, because Reddit can go really long, right? And so you take the whole page...
00:50:05 - Anthony Campolo
Right.
00:50:05 - Scott Steinlage
Or at least as much as you can get. And so that could be like 10 pages.
00:50:09 - Dylan Pierce
Right.
00:50:10 - Scott Steinlage
Or something. And then you take that PDF and export it to a PNG, and then upload it to ChatGPT. It'll actually scroll through all that, and then you can ask it to do anything from it.
00:50:24 - Dylan Pierce
That kind of obsoletes web scraping, yeah.
00:50:27 - Scott Steinlage
From an image standpoint. But there's...
00:50:29 - Anthony Campolo
I...
00:50:29 - Scott Steinlage
There's definitely fallbacks, I'm sure, but I'm just not quite sure what that would be yet. I haven't used it as much as...
00:50:35 - Dylan Pierce
You'd probably have to take the manual effort to give it at least one highlight and say, "This is a comment." Maybe not. It might be smart enough.
00:50:42 - Scott Steinlage
I think it could. Yeah, I think you could read it.
00:50:45 - Dylan Pierce
That's crazy. I never thought about the implications of image processing like that. I just assumed website generation, like, "Here's my crappy hand drawing, please..."
00:50:57 - Scott Steinlage
No, I used it in the way we just spoke of, actually, very similarly, at least for a project I did.
00:51:02 - Dylan Pierce
That's cool, huh? I need to get access to that. I don't have access to it yet.
00:51:07 - Anthony Campolo
Yeah, I've only had it for like two weeks, and the first week it was pretty much broken. It's a whole different way of thinking. You don't tell it, you just show it. What was that?
00:51:19 - Scott Steinlage
I was saying to Dylan, are you sure you don't have access to it? Because if you just log into ChatGPT and you have Pro.
00:51:24 - Dylan Pierce
Right.
00:51:24 - Scott Steinlage
You pay 20 bucks a month or whatever. There's a little image icon at the bottom.
00:51:28 - Anthony Campolo
That's the question.
00:51:29 - Scott Steinlage
There's a little image on the bottom of the personal account.
00:51:31 - Dylan Pierce
No, no.
00:51:32 - Scott Steinlage
Okay.
00:51:32 - Dylan Pierce
I end up using... the nice thing about Pipedream is I end up using our paid account because it's so easy to just select the account that should be used.
00:51:42 - Anthony Campolo
It should be able to do images, then.
00:51:45 - Dylan Pierce
Yeah. I just use the company account this way.
00:51:47 - Scott Steinlage
And it should be able to use Pipedream to select an image and then put one in there. Yeah, I don't know exactly.
00:51:55 - Dylan Pierce
So we could automate that flow you just talked about. Use Puppeteer to make a screenshot.
00:52:00 - Scott Steinlage
Yes, exactly.
00:52:01 - Dylan Pierce
Use an OpenAI action, or upload the image, and then prompt it.
00:52:07 - Scott Steinlage
Ask it to do whatever.
00:52:08 - Dylan Pierce
Yep. And then expose it as an API and charge 20 bucks a month.
00:52:13 - Scott Steinlage
Oh my gosh.
00:52:13 - Dylan Pierce
Right?
00:52:15 - Scott Steinlage
And then everybody does it. And then people...
00:52:19 - Dylan Pierce
Yeah.
00:52:19 - Scott Steinlage
Fail. But yeah...
00:52:23 - Dylan Pierce
That's unique, yeah.
00:52:25 - Scott Steinlage
No, open source it. Come on.
00:52:27 - Dylan Pierce
Of course, of course. Because now you can, using, like I said with projects, connect this project to GitHub.
00:52:33 - Scott Steinlage
GitHub.
00:52:34 - Dylan Pierce
And open-source it. You can just...
00:52:35 - Scott Steinlage
Boom.
00:52:36 - Dylan Pierce
Yep. And this will serialize the code into GitHub.
00:52:40 - Scott Steinlage
It is Hacktober month, you know, so...
00:52:43 - Dylan Pierce
I'm still rocking my old Hacktoberfest T-shirt from before the whole debacle with... man, that was crazy. I forget, do they still give out T-shirts, or is that over now?
00:52:54 - Scott Steinlage
I don't know.
00:52:55 - Anthony Campolo
They stopped, and they said it helped a lot in cutting down on PR spam. People weren't just trying to get people to merge PRs that weren't actually anything so you could get a T-shirt. So I think it was probably the right move.
00:53:07 - Dylan Pierce
People go to far lengths to get free swag. That's the lesson.
00:53:11 - Anthony Campolo
Yeah. So obnoxious.
00:53:13 - Dylan Pierce
Yeah. It was such a comfy T-shirt. Yeah, it was a great T-shirt. It was very comfy. I still use it.
00:53:20 - Scott Steinlage
I love it when people do swag. Right. Because then I will use that shirt for a long time.
00:53:24 - Dylan Pierce
Right, right.
00:53:25 - Anthony Campolo
Yep.
00:53:27 - Dylan Pierce
There's the next contest. You have to use Pipedream to...
00:53:31 - Scott Steinlage
Create a T-shirt.
00:53:32 - Dylan Pierce
Order the T-shirt, order the T-shirt design, and order the T-shirt. We'll give you an API endpoint, and you have to train it to do that.
00:53:39 - Scott Steinlage
You know what? You guys should have, like, some competitions or something for, like, doing silly things. Like, whoever can do this the most unique way or something. I don't know, you know?
00:53:49 - Dylan Pierce
Yeah, yeah, we've been talking about it. It'll definitely generate some...
00:53:52 - Scott Steinlage
Some. Yeah, or whatever. Yeah, it would generate some stuff, I'm sure.
00:53:56 - Dylan Pierce
Yeah, we've thought about doing an AI competition or hackathon.
00:54:01 - Scott Steinlage
Yeah.
00:54:01 - Dylan Pierce
And there would be a Rube Goldberg kind of competition: who can make the most ridiculous workflow that does basically nothing, or calls itself, or whatever. Yeah, but that's definitely on our radar. So that's kind of just the overview of the Puppeteer-plus-AI combination, which I think is a really interesting combination, and this is just one of many features that you can do.
00:54:31 - Scott Steinlage
That's awesome, man. Thanks so much for sharing with us. This is really cool. I love seeing what Pipedream has because you guys always have some really cool stuff that's so usable. And not just usable, but good for your daily life as a developer, or just getting into doing things. If you haven't checked out Pipedream, you all should go check it out. It's free to start with, and I'd highly suggest getting in there and messing with it. If you've ever messed with Zapier, however you want to pronounce it, and you're a developer, just go to Pipedream.
00:55:09 - Dylan Pierce
It's...
00:55:10 - Scott Steinlage
It's the same thing, but better.
00:55:11 - Dylan Pierce
For so many one-off tasks where you're just like, "I really don't want to open up yet another Node script," do it in Pipedream. Just trust me.
00:55:22 - Scott Steinlage
Yeah, totally, absolutely.
00:55:24 - Anthony Campolo
Are there any exciting features on the horizon that people should look forward to?
00:55:31 - Dylan Pierce
What can I talk about? What can I talk about?
00:55:33 - Anthony Campolo
And if you don't have anything specific, I'll be curious how you see AI continuing to expand within these kinds of tools.
00:55:41 - Scott Steinlage
We love AI, so.
00:55:43 - Dylan Pierce
Yeah, yeah. So the AI, we're constantly improving the AI. I would say we're brainstorming ways to make the AI more aware of your workflow, so you can do things like base the prompting off of data in other parts of your steps and train it better with embeddings based on the API docs of the service that you're asking about. Yeah, that's one of the things we've been researching, because right now it just uses the app name itself. So Puppeteer has had great documentation for a long time, but we need to feed it the updated documentation through embeddings.
00:56:30 - Scott Steinlage
Yeah, that's interesting. That's very interesting that you said that, because that was something when I was messing with the...
00:56:44 - Dylan Pierce
AI?
00:56:44 - Scott Steinlage
...piece that you guys had...
00:56:48 - Dylan Pierce
About Pipedream? The bot thing?
00:56:50 - Scott Steinlage
No, not on Slack, but inside of here. When you first launched the AI piece of this that used ChatGPT inside of the Node.js...
00:57:03 - Dylan Pierce
Okay, code generation?
00:57:04 - Scott Steinlage
Yeah, the code-generation piece. Yeah, yeah, yeah, yeah. It was not getting the latest documentation. For example, I was trying to do something with Twitter before they changed everything, and it was getting the old documentation from 2021, and they had just changed several things. So anyway, it was interesting. But yeah, having that would be huge, a game changer for sure.
00:57:31 - Dylan Pierce
Yeah, yeah, definitely. Oh, the big thing that we are working toward, this is our big rock, is to introduce branching and looping, which sounds funny. I can't believe this product's been around for six years now and doesn't have it, because I just haven't needed to use it that frequently. But an example of how this could be useful is for Stripe webhooks. You know how you can make one endpoint and subscribe to many events? If you could branch that one webhook URL into many workflows...
00:58:05 - Scott Steinlage
Oh, cool. So if this happens, do all these different things?
00:58:08 - Dylan Pierce
Yeah, almost like a router. You have one endpoint, and then you can use a branch that'll trigger...
00:58:14 - Scott Steinlage
so many different things.
00:58:15 - Dylan Pierce
Right.
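The router idea Dylan describes can be approximated with a plain lookup inside a code step. The following is a minimal sketch, not Pipedream's branching feature: the `WebhookEvent` shape, the `routes` table, the placeholder URLs, and the `resolveRoute` and `routeEvent` helpers are all hypothetical names introduced for illustration. The two event types are used only as examples of what a single Stripe endpoint might receive.

```typescript
// Hypothetical payload shape: only `type` is used to pick a branch.
interface WebhookEvent {
  type: string;
  data: unknown;
}

// Hypothetical routing table from event type to a downstream workflow URL.
// The URLs are placeholders, not real endpoints.
const routes = new Map<string, string>([
  ["invoice.paid", "https://example.com/workflows/invoice-paid"],
  ["customer.subscription.deleted", "https://example.com/workflows/subscription-deleted"],
]);

// Return the downstream URL for an event, or null when no branch matches.
function resolveRoute(event: WebhookEvent): string | null {
  return routes.get(event.type) ?? null;
}

// The sender is injected so the routing logic can be exercised without a network call.
type Sender = (url: string, body: unknown) => Promise<void>;

// Forward the event to its branch. Resolves to false when the event type is unrouted.
async function routeEvent(event: WebhookEvent, send: Sender): Promise<boolean> {
  const url = resolveRoute(event);
  if (url === null) {
    return false;
  }
  await send(url, event);
  return true;
}
```

Keeping the table separate from the forwarding step means a new branch is one more `Map` entry rather than another `if` statement.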
00:58:16 - Scott Steinlage
And that doesn't happen right now?
00:58:17 - Anthony Campolo
Really?
00:58:17 - Scott Steinlage
I didn't realize that.
00:58:18 - Dylan Pierce
Yeah. You have to use conditionals in Node.js or Python in order to act on things conditionally. So we are working toward that. I'm actually going to release some documentation next week on how to trigger a workflow programmatically. But the UI for having a branch is yet to come. And looping, because right now, if you have to loop over a large amount of records, say you pull from Supabase or Airtable or whatever, you have this large amount of records and you're trying to do an async op on each one, you're hitting the 750-second Lambda limit. The way around it is to loop within Node.js and hit another workflow. So that way, you have a processing workflow and an iteration workflow. And we're working on getting that native so you don't have to worry about it. Under the hood, it'll make new workflows and kind of hide that from you. So those are the two big things that customers have been asking for for a long time. I've run into them myself, and I have workarounds like I just described.
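The looping workaround described here, an iteration workflow that hands work to a separate processing workflow, might look roughly like the sketch below. It assumes the processing workflow is reachable at an HTTP URL and accepts a JSON body with a `records` array. The `chunk` and `fanOut` helpers, the `RecordRow` shape, and the batch size are hypothetical and not part of any Pipedream API.

```typescript
// Hypothetical record shape, e.g. rows pulled from a database or spreadsheet.
interface RecordRow {
  id: string;
}

// The sender is injected so the loop can be tested without a network call.
// In a real code step it would wrap an HTTP POST to the processing workflow.
type Sender = (url: string, body: unknown) => Promise<void>;

// Split items into batches of at most `size` elements, preserving order.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) {
    throw new Error("size must be at least 1");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Iteration workflow: send each batch to the processing workflow in turn.
// Each batch then runs as its own execution with its own time budget.
// Resolves to the number of batches sent.
async function fanOut(
  records: RecordRow[],
  processingUrl: string,
  batchSize: number,
  send: Sender,
): Promise<number> {
  let sent = 0;
  for (const batch of chunk(records, batchSize)) {
    await send(processingUrl, { records: batch });
    sent += 1;
  }
  return sent;
}
```

Sending batches rather than single records keeps the number of HTTP calls down. The right batch size depends on how long each record takes to process, which this sketch does not try to guess.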
00:59:25 - Scott Steinlage
Right.
00:59:26 - Dylan Pierce
But I'm really excited about that.
00:59:28 - Anthony Campolo
Pinging it.
00:59:28 - Dylan Pierce
Much easier.
00:59:29 - Scott Steinlage
Yeah, that's cool. Awesome, man. Awesome. What about any other... do you guys ever go to conferences or anything like that? Is that something you guys do, or do you make a physical appearance, or no? I don't know.
00:59:43 - Dylan Pierce
Not yet.
00:59:45 - Scott Steinlage
I just didn't know.
00:59:46 - Dylan Pierce
Yeah, yeah, yeah, yeah.
00:59:47 - Anthony Campolo
You think you will?
00:59:49 - Dylan Pierce
I would like to one day. We are still an incredibly small team. I'm shocked at how big this product is, how wide it is, and how complicated it is. And there's really only like 10 of us, so it's still new. We're still in bootstrap mode.
01:00:09 - Anthony Campolo
You got a lot of users though.
01:00:11 - Dylan Pierce
Yeah, a lot of users. We have a lot of users. We have a lot of competition as well.
01:00:17 - Anthony Campolo
I'll be curious. I know this is always difficult to talk about, but who do you see as your main competitors?
01:00:25 - Dylan Pierce
It's kind of funny. The way I see it, there really aren't any right now, because if you look at existing workflow tools, they're all tailored to no-coders, and if they are tailored to coders, they don't have the integrations that we do. We have essentially a combination of Nango. Do you guys know Nango? It's like an OAuth authentication manager.
01:00:57 - Anthony Campolo
Anything about OAuth I've managed to avoid, thankfully.
01:01:00 - Dylan Pierce
Exactly. Like no one wants to do it.
01:01:02 - Anthony Campolo
Yeah, but you said you had a lot of competitors, though, so it sounds like there are a lot of people going at it in the space, even if they do it differently from you.
01:01:11 - Dylan Pierce
They do it differently, yeah. We're definitely unique in that we handle OAuth, we handle account connection, we have the code-first Git integration, and we're developer-focused. Whereas others only have one or two of those things at a time.
01:01:29 - Anthony Campolo
Yeah, the dev focus is what really stands out, I think.
01:01:34 - Dylan Pierce
Yeah, there are other developer-focused, easy serverless tools, but they don't have the ease of use of connecting accounts or the library of literally thousands of pre-built actions and triggers.
01:01:46 - Anthony Campolo
Yeah, there's, like, "You can run JavaScript code." It's like, okay, cool. I could have done that with a Node server also.
01:01:53 - Dylan Pierce
Exactly. You can spin up a Lambda easily under the hood, but we have a layer on top that I think is so much more fine-tuned because it's been around much longer. The newer ones you mentioned, like Val Town, are less than two years old. We're six years old.
01:02:10 - Anthony Campolo
So yeah, they're more like giving you these sandboxes to run code, and then they're starting to build out libraries to connect things. So I think that's also interesting. And everyone has different ideas of what they want to stitch together and uses different tools. Some people will want Sheets. They'll want to bring in a spreadsheet. And some people want to build Discord bots. And I think that's great, because this stuff is so complicated. The more companies that provide tools to make it easier, the better.
01:02:40 - Dylan Pierce
Yeah, exactly. It's kind of like the unbundling of Rails, where now you have systems that do auth for you. Clerk.
01:02:50 - Anthony Campolo
Yeah.
01:02:51 - Dylan Pierce
Background job services like QStash, Redis, Upstash. This is your integrations code. You don't have to worry about writing it yourself anymore, or at least not a lot of it. And it's hosted for you. I'm a big believer that DevOps should be hidden from you. Just be productive. You don't need to worry about DevOps. Just deliver value. Deliver solutions, not code.
01:03:20 - Anthony Campolo
Yeah, fire your platform team, you all.
01:03:23 - Scott Steinlage
All.
01:03:24 - Dylan Pierce
Yeah, there will be very highly paid platform people, kind of like how AWS sucked up all the infrastructure engineers that had to set up rack servers and stuff.
01:03:38 - Anthony Campolo
So where should people go online to find out more about Pipedream or to find you?
01:03:43 - Dylan Pierce
Yeah, definitely head over to pipedream.com. We have a really awesome community, and of course, there's an AI bot in there. It helps you generate code as well, and we're also happy to answer questions there. You can find me on X or Twitter, whatever you call it, at Control Alt Dylan. I post frequently to our blog, and yeah, thanks for having me on, guys. It was really fun, and I'm sorry that Twitter decided to block us. That would have been really cool, but I'm glad that Reddit's still pretty open. That's pretty neat.
01:04:17 - Scott Steinlage
Yeah, no, that was cool. Awesome. Well, thank you so much, Dylan. Greatly appreciate your time today. Really glad that you came on with us and accepted the invite. So yeah, remember, if you haven't checked out Pipedream yet, go check it out and hit up Dylan if you have any questions or whatever. Yeah, awesome. Thank you so much. Greatly appreciate it. Anthony, anything else?
01:04:46 - Anthony Campolo
Nope. I hope people check this out and find something useful to do with it.
01:04:50 - Scott Steinlage
Totally. All right, thanks. Appreciate you all.
01:04:52 - Dylan Pierce
All right.
01:04:52 - Scott Steinlage
We'll see you in the next one.
01:04:54 - Anthony Campolo
Next one. Peace.