
Using Pomerium to Secure LLMs with Nick Taylor
Video Link: Using Pomerium to Secure LLMs with Nick Taylor
Episode Description
A practical conversation about zero trust security, running local language models, and how to safeguard internal endpoints using Pomerium.
Episode Summary
This conversation explores how developers can protect local applications and large language models (LLMs) with Pomerium's zero trust approach, emphasizing the importance of continuously verifying user identity and permissions. After introducing Pomerium and explaining why VPNs can leave internal networks vulnerable, they walk through setting up an identity-aware proxy so that each incoming request is validated. They also discuss running LLMs on a local machine using tools like Ollama, then securing those models so remote users, or even Copilot extensions, can safely interact with them. The conversation touches on open source principles, details the simplicity of standing up Docker-based deployments, and examines how Pomerium handles certificates, policies, and identity providers. By pairing architectural concepts with hands-on demos, they offer a comprehensive snapshot of how to merge practical security tactics with AI experimentation.
Chapters
00:00 - Introduction and Catching Up
In the opening minutes, the host greets Nick Taylor and recalls past collaborations. They share how their work has shifted since their previous chat, including new projects each has tackled. Discussion quickly moves to Nick’s recent career change, setting the stage for a deeper look at his new role.
They outline how Nick became involved with Pomerium after a transition from his prior company. His background in application development frames the conversation about stepping into a security-oriented position. This contrast between past roles and fresh challenges hints at the wide range of topics that will emerge.
05:20 - From Full-Stack Engineering to DevRel
Here, Nick details his path to developer relations, emphasizing how engineering skills inform his approach to writing and speaking about technology. He comments on the interview process, noting how back-end tests and product engineering roles converged into a DevRel opportunity.
They also underscore the value of seeing docs through fresh eyes. Nick explains that diving into new systems, particularly those dealing with security, demands meticulous attention to configuration details. This segment illustrates how bridging engineering and DevRel can be pivotal for shaping better developer experiences.
10:11 - What is Zero Trust?
In this portion, the conversation pivots to a deeper examination of zero trust principles. Nick provides an analogy involving locked doors and personalized keys, contrasting it with older VPN-based security models that grant broad network access.
By illustrating constant user verification and policy checks, they highlight how zero trust prevents unauthorized escalation within a network. They liken perimeter security to a walled-off city, showing how once someone slips inside, oversight becomes patchy. This leads neatly into discussions of how Pomerium’s reverse proxy and policy-driven safeguards align with zero trust requirements.
15:12 - Common VPN Limitations
Next, they break down why VPNs can be cumbersome. Aside from security vulnerabilities, there’s an IT burden associated with supporting large user bases. The conversation cites an example of tens of thousands of employees grappling with VPN client issues.
They note that with a zero trust system, fewer complexities arise, and it’s easier to set refined permissions. Nick clarifies that continuous checks on roles or device identity simplify orchestration, enhance security, and save administrative overhead. These points reinforce why many organizations see zero trust as the future of secure access.
20:18 - Layer 7 vs. Layer 3 Security
Shifting to more technical ground, the pair explores the differences between layer 7 (application-level) security and lower-level, network-layer approaches. Because Pomerium operates at layer 7, it can inspect full HTTP requests and apply precise policy checks.
They compare this approach to older models that only work at the IP or network layer, illustrating why a more granular inspection yields better control. Though networking layers can be a daunting topic, their discussion breaks down how each layer addresses different components of digital communication, helping listeners grasp the benefits of application-level checks.
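To make the layer 7 idea concrete, here is a minimal sketch of a Pomerium route in the config.yaml style Pomerium's docs use; the hostnames, upstream port, and /admin prefix are illustrative, not from the episode. Because the proxy terminates the HTTP request, it can match on hostname and path and evaluate the verified identity rather than just a source IP.

```yaml
# Illustrative route: layer-7 matching plus an identity-based policy.
routes:
  - from: https://internal.example.com   # external hostname Pomerium serves
    to: http://localhost:8000            # the protected upstream app
    prefix: /admin                       # HTTP-level match, invisible at layer 3
    policy:
      - allow:
          and:
            # Only authenticated users from this email domain may pass.
            - domain:
                is: example.com
```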
24:57 - Running Pomerium Locally and the Open Core Model
In this segment, Nick discusses Pomerium's open source and open core structure, explaining how the main codebase is freely available while enterprise features support the company's business model. He also covers how simple it is to launch via Docker and how the product handles identity providers.
They also note that Pomerium can be set up for individuals, small teams, or large organizations. This flexibility means advanced security doesn’t have to be limited to enterprises. By removing friction in configuration and deployment, Pomerium is shown to be accessible for varied projects.
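As a rough sketch of the Docker-based setup described here (the config path follows Pomerium's published container image; file names and ports are assumptions, not taken from the episode), a compose file can mount a config file into the container:

```yaml
# docker-compose.yml: run open-source Pomerium with a mounted config.
services:
  pomerium:
    image: pomerium/pomerium:latest
    volumes:
      # config.yaml holds identity-provider settings, routes, and policies.
      - ./config.yaml:/pomerium/config.yaml:ro
    ports:
      - "443:443"   # Pomerium terminates TLS for incoming requests
```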
30:22 - Introducing Ollama and Securing LLMs
Attention shifts toward local LLM setups. Nick outlines how he discovered Ollama, which lets users run language models locally. He recounts initial experiments combining GitHub Copilot extensions with a locally hosted model, bridging AI features and personal infrastructure.
They connect the concept of local AI with the zero trust model, imagining scenarios where developers could secure a custom LLM for team members worldwide. This portion highlights how advanced topics like AI can blend with robust security measures, addressing real-life constraints like network policies and identity checks.
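For readers who want to follow along, the basic Ollama workflow looks roughly like this (the model name is illustrative; Ollama serves its HTTP API on port 11434 by default):

```bash
# Download a model and chat with it locally.
ollama pull llama3
ollama run llama3 "Explain zero trust in one sentence."

# The same model is reachable over HTTP, which is what an extension
# or a proxy would talk to.
curl http://localhost:11434/api/generate \
  -d '{"model": "llama3", "prompt": "Explain zero trust in one sentence."}'
```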
35:10 - Building a Custom GitHub Copilot Extension
Now, they walk through the steps of building and hosting a GitHub Copilot extension. Nick highlights the need to register a GitHub app, configure permissions, and handle webhook-style requests. He points to sample code blocks illustrating how a local server ties into GitHub's infrastructure.
They emphasize that this approach can be replicated for any technology stack, as long as the extension can route requests properly. The discussion underscores the convenience of local debugging with port forwarding and clarifies that the same technique applies to wide-scale deployments once certain details—like persistent URLs—are finalized.
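A minimal sketch of such an extension backend, assuming the request and response shapes GitHub documents for Copilot agents (OpenAI-style chat messages in, server-sent events out); the endpoint path, port, and model name are illustrative, and this is not the exact code from the demo:

```typescript
// A Copilot-agent-style server that forwards chat messages to a local
// Ollama instance and replies in the SSE chat-completions format.
// (Verification of GitHub's request signature is omitted for brevity.)
import express from "express";

const app = express();
app.use(express.json());

app.post("/agent", async (req, res) => {
  // Copilot sends the conversation as OpenAI-style chat messages.
  const messages = req.body.messages ?? [];

  // Forward the conversation to a locally running Ollama instance.
  const upstream = await fetch("http://127.0.0.1:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", messages, stream: false }),
  });
  const data = await upstream.json();

  // Reply as a single SSE chunk in chat-completions format, then close.
  res.setHeader("Content-Type", "text/event-stream");
  res.write(
    `data: ${JSON.stringify({
      choices: [{ index: 0, delta: { content: data.message?.content ?? "" } }],
    })}\n\n`,
  );
  res.write("data: [DONE]\n\n");
  res.end();
});

app.listen(3000, () => console.log("agent listening on :3000"));
```

During local development, a port-forwarding tunnel can expose this server to GitHub until a persistent URL is in place.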
40:30 - Demonstrating the Extension in Action
Here, the conversation features a live demonstration of the Copilot extension communicating with a local Ollama endpoint. Nick showcases how requests travel from GitHub's UI through his local server, then to Ollama for model responses.
They discuss minor hiccups such as timeouts or unexpected tokens, but present them as natural steps in configuring a custom AI tool. This honest depiction helps developers see both the potential and typical roadblocks, reminding listeners that these processes, while straightforward, require iterative testing and debugging.
45:27 - Handling Policies and Auth with Pomerium
The dialogue refocuses on Pomerium’s policy system, showing how simple configuration changes let you grant or deny user access. By editing a YAML file or using Pomerium Zero’s graphical interface, Nick demonstrates quick updates to user email domains.
They illustrate how a 403 Forbidden response is replaced with a smooth login prompt, verifying users before letting them near local endpoints. This part underscores how Pomerium sits in front of an app, enforcing an identity check and controlling the flow of requests.
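The kind of policy edit demonstrated here looks roughly like this in Pomerium's YAML (hostnames and addresses are illustrative); adding a whole domain or a single collaborator is a one-line change:

```yaml
routes:
  - from: https://ollama.example.com   # public, Pomerium-protected hostname
    to: http://127.0.0.1:11434         # local Ollama API
    policy:
      - allow:
          or:
            # Everyone in the company domain...
            - domain:
                is: example.com
            # ...plus one outside collaborator.
            - email:
                is: collaborator@gmail.com
```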
50:50 - Fine-Tuning Ollama Access
They continue exploring ways to integrate Ollama with secure endpoints, diving into issues like IP binding. Nick notes how specifying "localhost" can cause confusion if requests come from an external network, prompting them to switch to explicit addresses.
They touch on the difference between a purely local environment and a scenario with real-world traffic. Listeners learn how small tweaks, like binding to 0.0.0.0 or a valid IP address, ensure that an LLM remains available yet shielded behind Pomerium's policies. Clear instructions demystify these typical configuration hurdles.
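Concretely, Ollama honors an OLLAMA_HOST environment variable, so the fix discussed here amounts to something like:

```bash
# Ollama's default bind address is 127.0.0.1, which drops proxied traffic
# arriving on another interface; binding to 0.0.0.0 lets Pomerium reach it
# while the policy layer keeps it from being openly exposed.
OLLAMA_HOST=0.0.0.0 ollama serve
```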
55:45 - Granting Access and Testing Permissions
In this section, they make real-time permission changes to show how simple it is to allow a collaborator or block an unauthorized user. By adding a domain or exact email address to a policy, Nick demonstrates near-instant updates.
They also illustrate how the system caches policies locally, thus avoiding unnecessary latency. Each request, however, still goes through the identity-aware proxy, ensuring consistent enforcement. The conversation highlights how these steps apply to a range of real-world uses, from personal projects to enterprise solutions.
1:01:10 - Integrating Copilot Extensions with Secure Endpoints
Turning back to Copilot, they attempt to combine the Pomerium authentication layer with the extension’s request flow. Nick adds error handling and instructs the extension to prompt for login if the model endpoint returns unauthorized.
They note that a more seamless experience would require building a proper OAuth or single sign-on sequence within the extension. Although they encounter challenges, the interplay of AI, DevRel, and security exemplifies the type of hands-on experimentation that fosters better tooling.
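A hedged sketch of the error handling described above; askSecuredModel and replyToChat are hypothetical helpers standing in for however the extension returns text to the Copilot chat, and the URL and model name are illustrative:

```typescript
// Forward the chat to the Pomerium-protected endpoint and fall back to a
// login hint when the proxy rejects the request.
async function askSecuredModel(
  messages: { role: string; content: string }[],
  replyToChat: (text: string) => void,
) {
  const resp = await fetch("https://ollama.example.com/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "llama3", messages, stream: false }),
  });

  if (resp.status === 401 || resp.status === 403) {
    // No session yet: the user must sign in through a browser first, since
    // the extension cannot complete Pomerium's login redirect on its own.
    replyToChat("Please sign in at https://ollama.example.com, then retry.");
    return;
  }

  const data = await resp.json();
  replyToChat(data.message?.content ?? "");
}
```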
1:07:20 - Open WebUI and Future Prospects
Having covered Ollama, they briefly introduce Open WebUI, another interface for local LLMs. Nick envisions hosting it on a Raspberry Pi, then providing secure access for household members or remote teammates.
They affirm that both Pomerium Zero and the open core edition can handle these setups. The conversation underscores how even small-scale experiments benefit from enterprise-grade security, bridging the gap between personal tinkering and robust policy management.
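For reference, Open WebUI's own quickstart runs it as a single container along these lines (the host port is arbitrary); pointing a Pomerium route at that port adds the identity check discussed throughout:

```bash
# Persist chats and settings in a named volume; port 3000 is arbitrary.
docker run -d -p 3000:8080 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```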
1:12:05 - Observations on Local AI and Performance
During this time, they address hardware constraints for running advanced models. Some large LLMs demand tens of gigabytes of VRAM or more, limiting hobbyists to smaller, distilled variants. Still, they emphasize the freedom of offline experimentation.
They highlight how Pomerium’s approach ensures that, despite these limitations, locally hosted AI can still be accessed from anywhere. This synergy of convenience and confidentiality resonates with developers who want to protect both their code and their data.
1:18:00 - Practical Use Cases and Ongoing Refinements
They brainstorm real-life scenarios, such as gating access for on-call team members or verifying roles before letting users modify a production environment. Listeners see how quickly they can add or remove these constraints, reflecting changing business needs.
Nick describes how policies must keep pace with evolving internal structures, pointing out that Pomerium’s management layer helps unify the effort. The conversation presents security as a continuous practice—one that can align neatly with modern application development.
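As a sketch of what such gating might look like, Pomerium's policy language can match on identity-provider claims; the claim name and value below are assumptions that depend entirely on what your IdP issues:

```yaml
routes:
  - from: https://deploy.example.com
    to: http://localhost:9000
    policy:
      - allow:
          and:
            # Only users whose token carries the on-call group claim.
            - claim/groups: on-call
```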
1:25:10 - Reviewing Setup Steps
Here, they recap the simpler path to implementing Pomerium Zero, from signing up on the website to launching via Docker. The discussion underlines that while advanced network knowledge is helpful, Pomerium’s tooling reduces complexity.
They stress how a developer can protect any internal asset, whether it’s an LLM, a small business app, or a hobby project. As long as the resource is fronted by Pomerium, users gain immediate enhancements to login enforcement, usage policies, and data security.
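The Pomerium Zero path they recap boils down to a single container launch once a cluster token is copied from the console (the environment variable name follows Pomerium Zero's documentation; the token value is a placeholder):

```bash
docker run -d --name pomerium \
  -p 443:443 \
  -e POMERIUM_ZERO_TOKEN=<your-cluster-token> \
  pomerium/pomerium:latest
```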
1:32:40 - Encouraging Experimentation and Next Steps
As they head toward closing, Nick expresses enthusiasm about continuing to refine documentation and unify developer feedback. Both speakers see the potential for more robust Copilot extensions, deeper custom authentication flows, and broader open source involvement.
They also reiterate the appeal of bridging the gap between local AI models and enterprise-grade security. The synergy of Pomerium’s zero trust approach with new LLM technology marks a promising direction, especially for creators wanting full control over their data and deployments.
1:37:00 - Wrap-Up and Final Thoughts
The final moments bring the conversation to a friendly conclusion, with each participant thanking the other and hinting at future collaborations. They reaffirm that combining open source security with local AI is both feasible and effective.
A blend of casual banter and genuine technical appreciation ends the session on an upbeat note. Listeners depart with a clear sense of how to tackle security in modern development environments, whether they’re running large language models or simply safeguarding internal services.