Skills and agents are infrastructure: the two-audience problem

Every AI deployment inside a modern company has two audiences. The builders write the prompts, package the skills, ship the agents. The users open Claude, ChatGPT, or Copilot and run them.

Each side has a different need. Builders need their work to be versioned, reviewed, and auditable. Users need the right skill to just work when they open their AI tool. Most companies in 2026 deliver neither or just one.

Here is what that looks like.

Think about the person on your team who has THE agent. The one whose Claude setup writes release notes that sound like your company. The one whose ChatGPT produces customer replies the rest of the team quietly copies. The one who built the skill that updates your CRM during the quarterly review.

Now imagine that person hands in their notice next Tuesday.

What happens to the agent? In most companies in 2026, nothing good. The prompt lives in a personal Notion page. The agent is a custom GPT only one person can edit. The skills they built sit in a folder on their laptop. Two weeks after they leave, someone reverse-engineers a worse copy from a Teams screenshot, and the team quietly resets to the previous baseline.

That is the builder side collapsing. The user side fails more quietly. Marketing reverts to copy-paste. Sales rewrites the same prompt from memory, badly. Ops never knew the skill existed. The AI work the company invested in stops compounding.

This is the gap between AI as personal work and AI as infrastructure. We built airlock for both sides of it. The rest of this piece maps where the leading teams are, where most teams still are, and what to do about it.

Two changes you may have missed

Two things shifted in late 2025 that almost no one talks about together.

First, Anthropic published the `SKILL.md` standard in October 2025. Eight months later, every major AI client reads the same files: GitHub Copilot, Cursor, OpenAI Codex CLI, Gemini CLI, Windsurf, Antigravity, Cline, and the list keeps growing. The instructions you write for one AI tool now travel to the others.

Second, the protocol connecting AI to the tools it acts on, the Model Context Protocol, stopped being an Anthropic project. In December 2025 it was donated to the Linux Foundation's Agentic AI Foundation, co-founded with Block and OpenAI. Google, Microsoft, AWS, Cloudflare, and Bloomberg backed it.

Together, the two amount to the first real open standard for AI tooling. They also make the gap between teams that treat AI assets as infrastructure and teams that treat them as personal work suddenly visible.

The 2025 DORA report on AI in software engineering puts it bluntly: "AI does not automatically improve software delivery performance. Instead, it acts as a multiplier of existing engineering conditions." Teams that manage things well get faster output. Teams that do not get faster ways to ship the same problems.

Three things, often confused

A quick taxonomy. The words prompt, skill, and agent get used interchangeably, but the difference matters for how you manage them.

A prompt is text the AI reads. A briefing on a Post-it. Static.

A skill is a packaged ability the AI invokes by name. Think of it as a documented how-to with examples, references, maybe a template. The AI knows it exists because it is registered somewhere. When the task fits, the AI calls the skill by name and follows what is inside.

An agent is an autonomous worker. It picks from skills, uses tools, observes the result, decides what to do next, and loops until the job is done. An agent has an identity, a permission scope, and a trail of actions.

If you remember one analogy: a prompt is a Post-it on a new hire's desk. A skill is a how-to in the team wiki anyone can pick up. An agent is an employee with a job description and access to systems.
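For readers who think in types, the taxonomy can be sketched as three shapes. This is a minimal illustration, not any SDK's API: `Skill`, `Agent`, and `work` are hypothetical names, and the fixed `plan` stands in for the step where a real agent would let the model decide what to do next.

```python
from dataclasses import dataclass, field
from typing import Callable

# A prompt: static text the AI reads.
PROMPT = "Write release notes in a friendly, concise tone."


@dataclass
class Skill:
    """A packaged ability, registered under a name the AI can call."""
    name: str
    description: str           # tells the AI when the skill fits
    run: Callable[[str], str]  # the how-to itself


@dataclass
class Agent:
    """A worker with an identity, a permission scope, and a trail of actions."""
    identity: str
    allowed_skills: set[str]
    trail: list[str] = field(default_factory=list)

    def work(self, plan: list[tuple[str, str]], registry: dict[str, Skill]) -> list[str]:
        """Run each (skill_name, task) step in order and record every action."""
        results = []
        for skill_name, task in plan:
            if skill_name not in self.allowed_skills:
                self.trail.append(f"denied:{skill_name}")
                continue
            results.append(registry[skill_name].run(task))
            self.trail.append(f"ran:{skill_name}")
        return results
```

The point of the sketch is the difference in surface area: the prompt is a string, the skill has a name and a body, and only the agent carries identity, scope, and history.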

What "good" looks like, in seven properties

Across Anthropic, Shopify, Stripe, Notion, Block, Vercel, Microsoft, and most tooling vendors, the same seven properties keep showing up. None are exotic.

Versioned. Each prompt or skill has a version number: major change, minor change, fix. Once a version ships, it stops moving. One cited example: an untracked prompt tweak caused a 40% cost spike at a B2B analytics company in January 2026.
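A minimal sketch of what "stops moving" can mean in practice, assuming MAJOR.MINOR.PATCH numbering. `VersionRegistry` and `bump` are hypothetical helpers written for this article, not part of any tool named here.

```python
class VersionRegistry:
    """Stores skill text by (name, version); a shipped version never changes."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], str] = {}

    def publish(self, name: str, version: str, text: str) -> None:
        key = (name, version)
        if key in self._store:
            # Refuse silent edits: a change must ship under a new number.
            raise ValueError(f"{name}@{version} already shipped; bump the version instead")
        self._store[key] = text

    def get(self, name: str, version: str) -> str:
        return self._store[(name, version)]


def bump(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a 'major', 'minor', or 'fix' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change}")
```

In a Git-based setup, tags and branch protection would likely play the role of `publish`; the rule being enforced is the same.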

Reviewed. Changes go through a pull request, the way code does. Hamel Husain, who has trained over 2,000 engineers and product managers on AI evaluation (including teams inside OpenAI and Anthropic), writes: "Unsuccessful AI products almost always share a common root cause: a failure to create robust evaluation systems." Evals are not a product. They are a practice.

Owned. A named person, a named team, with a documented review schedule. Not "the AI team."

Scoped. What an agent is allowed to touch is restricted at three levels: the skill itself declares which tools it needs, the team declares which people can run which agents, and the tenant declares which data each agent can reach. Microsoft Entra Agent ID is the most opinionated implementation today. Most others leave this to the operator. The OWASP Agentic Skills Top 10 formalises the same failure modes from a security angle: malicious skills, over-privileged skills, weak isolation, no governance. This is also where airlock sits: one connector your agents run through, with per-agent, per-team, per-tenant scoping enforced server-side.
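The three levels compose as a simple conjunction: a call goes through only if all three say yes. The sketch below is an assumption-laden illustration, not how Entra Agent ID or airlock model policy; `Policy`, `is_allowed`, and every identifier in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    skill_tools: dict[str, frozenset[str]]  # skill level: tools each skill declares
    team_agents: dict[str, frozenset[str]]  # team level: agents each person may run
    tenant_data: dict[str, frozenset[str]]  # tenant level: data each agent may reach


def is_allowed(policy: Policy, person: str, agent: str, skill: str,
               tool: str, data_source: str) -> bool:
    """Allow a call only when all three scope levels permit it (deny by default)."""
    return (
        tool in policy.skill_tools.get(skill, frozenset())
        and agent in policy.team_agents.get(person, frozenset())
        and data_source in policy.tenant_data.get(agent, frozenset())
    )
```

Missing entries resolve to an empty set, so anything not explicitly declared is denied.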

Audited. Every invocation, every tool call, every approval logged in one place with the version that produced each output. OpenAI's Agents SDK turns this on by default. Vercel's AI SDK 6 shipped the same primitive in December 2025 and reports that 22% of all their AI Gateway traffic ended in a tool call in April 2026, up from 11% six months earlier.
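What "logged with the version that produced each output" might look like as data: one JSON line per event, with the skill version on every record. The field names and both helpers are assumptions for illustration, not the log format of the OpenAI or Vercel SDKs.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Optional


def audit_line(output_id: str, agent: str, skill: str, version: str, event: str) -> str:
    """Serialize one audit event as a JSON line that always carries the skill version."""
    return json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "output_id": output_id,
        "agent": agent,
        "skill": skill,
        "version": version,
        "event": event,  # e.g. "invocation", "tool_call", "approval"
    })


def version_of(log_lines: Iterable[str], output_id: str) -> Optional[tuple[str, str]]:
    """Return (skill, version) for the first record matching output_id, else None."""
    for line in log_lines:
        record = json.loads(line)
        if record["output_id"] == output_id:
            return record["skill"], record["version"]
    return None
```

If every record carries the version, tracing an output back to the skill version that produced it becomes a lookup rather than an investigation.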

Portable. The skill travels. If your prompt or agent is locked to one vendor's quirks, you do not own it. When that vendor's pricing shifts, when the model rotates, when they ship a competing first-party feature, you have a migration project, not infrastructure.

Reversible. A bad change rolls back in one line. Retiring an old skill follows a lifecycle, not a delete. As one engineer put it: "Make rollback boring."
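One way to make rollback boring is to treat "live" as a pointer into a history of published versions, so reverting moves the pointer instead of editing content. A minimal sketch under that assumption; the `Release` class and the three lifecycle state names are hypothetical.

```python
# Allowed lifecycle transitions: a skill is retired in steps, never deleted outright.
LIFECYCLE = {"active": "deprecated", "deprecated": "retired"}


class Release:
    """Tracks which published version of a skill is live, plus its lifecycle state."""

    def __init__(self, first_version: str) -> None:
        self.history = [first_version]
        self.state = "active"

    @property
    def live(self) -> str:
        return self.history[-1]

    def promote(self, version: str) -> None:
        self.history.append(version)

    def rollback(self) -> str:
        """Drop the newest version and return the one that is live again."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.live

    def advance_lifecycle(self) -> str:
        """Move to the next lifecycle state; 'retired' is final."""
        if self.state not in LIFECYCLE:
            raise RuntimeError(f"{self.state} is a final state")
        self.state = LIFECYCLE[self.state]
        return self.state
```

With this shape, the one-line rollback is `release.rollback()`, and the old version's text is never touched.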

Where teams store these assets today

Prompts, skills, and agents tend to live in the same place. Most companies are not yet on a dedicated platform. They are somewhere on a ladder. Here is how the rungs look against the seven properties.

Four observations.

Most teams are on rungs 1 to 4 and underestimate the gap. Local folders, vendor containers, shared drives, and Notion are where most teams sit today. Each looks managed because there is some structure, but none is connected to the runtime. The illusion of governance is the most dangerous rung on the ladder.

The native-in-client rung is the fastest growing and the most dangerous in 2026. Claude, ChatGPT, Cursor, Copilot, and the others all make it easy to ship something useful inside their UI in fifteen minutes. The same easiness is the trap: every skill locks to one vendor, with no version number, no audit log, no export. The day you add a second AI client, nothing you built travels.

GitHub is the strongest free option, with a real ceiling. A repo with named owners and quality checks gets you five of the seven properties. The two it misses: per-invocation scoping and a runtime audit across vendors. It also leaves out the user side entirely: users open Claude, not Cursor. GitHub plus a thin governance layer in front is the pragmatic 2026 stack.

Vibecoded portals are the rising trap of 2026. Lovable, v0, and Bolt make it cheap to ship an internal "skills portal" in a weekend. It looks like governance. The governance properties depend on what was built underneath, which usually nobody documented. The portal works until the builder rotates off, then turns invisible.

The migration path most successful teams follow: local → GitHub → GitHub plus a governance layer in front. They skip Notion and shared drives for the skill files themselves, keeping those tools for the narrative around the skills.

The two-audience problem in detail

We named the gap at the top. Here is each side, and how to close it.

The builder side. Your engineering team, AI practice leads, platform engineers. They are comfortable with Git. They want the seven properties above: versioned, reviewed, owned, scoped, audited, portable, reversible. GitHub plus discipline gets them most of the way. The two things GitHub misses: per-invocation scoping and a runtime audit that travels across vendors.

The user side. Most of the company. Marketing pulling release notes. Sales generating follow-up emails. Ops reconciling CRM mismatches. Legal summarising contracts. They open Claude. They open ChatGPT. They do not open Cursor. They do not pull from a Git repo. They do not run `git fetch` to get the latest version of a skill.

When you put your skills in GitHub, you solve the builders' problem. You do not solve the users' problem. A skill library that asks the user to think about Git fails silently. They go back to copy-pasting from someone's Notion, and your governance work delivers nothing to the people actually using the AI.

The fix for the user side is a delivery layer that hides Git from the user. The marketing manager opens Claude, the latest signed version of the skill is already there, scoped to what they are allowed to use, logged on the way out. They never see a repo. They never pull anything. This is the layer airlock sits on.

Six patterns to throw in the bin

The same six patterns keep showing up in engineering retrospectives from the past year. Snyk's February 2026 audit of nearly 4,000 published skills found 36% had security flaws and 13% had critical ones, so the patterns are not theoretical.

Prompts in personal Notion pages or Teams threads. The most common starting point. The best prompt in your company today lives somewhere no one else can find or version. When the author leaves, it leaves.

Prompts pasted inline in source code. Kills change history, couples prompt iteration to deploys, forces every fix through engineering.

Letting AI roam free in a large codebase without a harness. Shopify's Obie Fernandez, in the Roast post: "AI agents need help staying on track, and work much better when you break down complicated prompts into discrete steps. Non-determinism is the enemy of reliability."

Few-shotting tool selection instead of letting the AI self-select. Notion learned this across four full rebuilds of their agent harness in three years. Head of AI Sarah Sachs: "We reorchestrated it to be self-selecting on the tools, rather than few-shotting."

Proprietary JSON schemas over formats the model already understands. Notion replaced their custom API surface with plain SQL syntax. Performance and reliability jumped. The model already knew SQL.

Uniform governance across all agents. Gartner's May 2026 warning predicts 40% of enterprises will decommission autonomous agents by 2027 because of governance gaps. A read-only research agent and a write-enabled CRM-updating agent do not need the same controls. The strictest policy on both produces friction without safety. The loosest produces incidents.
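The alternative to uniform governance is to derive controls from what an agent can actually do. A rough sketch, assuming two risk signals and a handful of control names invented for this example:

```python
def required_controls(can_write: bool, touches_personal_data: bool) -> set[str]:
    """Pick controls from an agent's capabilities instead of applying one policy to all."""
    controls = {"audit_log"}  # baseline for every agent, including read-only ones
    if can_write:
        controls |= {"human_approval", "scoped_credentials"}
    if touches_personal_data:
        controls.add("pii_masking")
    return controls
```

Under these assumptions the read-only research agent gets the audit log and nothing else, while the CRM-updating agent also picks up approval and scoped credentials.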

What real teams have actually shipped

Three examples beat thirty.

Anthropic. Hundreds of skills in production across nine internal categories. Within months of launching their internal Sales plugin, 80% of Anthropic's sales organisation was using it. Shared skills came pre-wired to Salesforce, Gong, BigQuery, and more.

Stripe. Ships Minions, Slack-emoji-triggered coding agents that merge 1,300+ pull requests per week. Every PR is human-reviewed. Stripe also packages its payment domain knowledge as portable skills that anyone using Claude or Codex can pull in.

Shopify. Ships an AI Toolkit that includes plugins, skills, and a local MCP server. Their internal hosting platform pre-installs skills for over 50,000 sites. Every AI request from every internal tool flows through an internal proxy called Genie, which masks personally identifiable information and blocks prompt-injection attempts at the platform layer.

The pattern across all three: AI assets are managed infrastructure, not personal config. A central foundation owns the cross-cutting skills. A pipeline versions them. A gateway scopes them. An audit log tracks them. The marketing manager never sees any of it. None of them got there in one quarter. All of them started with an inventory.

What to do in the next three months

If your team is not yet treating AI prompts, skills, and agents as infrastructure, here is a realistic three-month start. None of it requires buying anything.

Month 1. Inventory and one use case. List every skill, prompt file, custom GPT, MCP server, and agent your team has in flight. Most teams cannot answer this in under an hour. The inventory is the foundation everything else sits on. Then pick one team and one use case. Move their best three prompts into a shared repository with version tags and a one-page README per prompt. Ship one real piece of work with them. Refine based on what breaks. Do not roll out org-wide yet.
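For the file-based part of the inventory, a short script can do the first pass. The sketch below walks a directory tree for `SKILL.md` files and reads the `name` and `description` fields from the frontmatter block at the top. It assumes simple single-line `key: value` pairs, and the function names are hypothetical. It will not find custom GPTs, MCP servers, or anything living in Notion; those still need a manual list.

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines between the leading '---' markers."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def inventory(root: str) -> list[dict[str, str]]:
    """List every SKILL.md under root with its declared name and description."""
    rows = []
    for path in sorted(Path(root).rglob("SKILL.md")):
        meta = read_frontmatter(path.read_text(encoding="utf-8"))
        rows.append({
            "path": str(path),
            "name": meta.get("name", "(unnamed)"),
            "description": meta.get("description", ""),
        })
    return rows
```

Running `inventory(".")` at the root of a shared drive checkout or a monorepo gives a starting list to review by hand.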

Month 2. Quality gates and federation. Add automated quality checks on those three prompts. Run them in CI on every change. Promptfoo is the open-source default, used by both OpenAI and Anthropic. Then federate. Pick the two or three prompts that are clearly cross-team (release notes, code review, support response). Move them to a shared foundation. Leave the rest team-local. Document the ownership rule in one paragraph.
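To make "quality checks in CI" concrete, here is the idea reduced to plain Python. This is not Promptfoo's configuration format or API; `run_gate`, the case fields, and the `generate` callable (a stand-in for whatever calls your model) are all assumptions for illustration.

```python
from typing import Callable


def run_gate(generate: Callable[[str], str], cases: list[dict]) -> list[str]:
    """Run each case through the model; return failure messages (empty list = pass)."""
    failures = []
    for case in cases:
        output = generate(case["input"])
        # Required phrases, compared case-insensitively.
        for phrase in case.get("must_include", []):
            if phrase.lower() not in output.lower():
                failures.append(f"{case['name']}: missing '{phrase}'")
        # Optional length ceiling, counted in whitespace-separated words.
        limit = case.get("max_words")
        if limit is not None and len(output.split()) > limit:
            failures.append(f"{case['name']}: over {limit} words")
    return failures
```

A CI job would fail the build whenever the returned list is non-empty, so a prompt change that drops a required section gets caught in the pull request rather than in production.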

Month 3. Audit and scale. Log every invocation of every shared skill, in one place, with the version that produced each output. The point is not a dashboard. The point is that "what version did this come from?" is answerable in under a minute. By the end of month three, you will have either confirmed that GitHub-and-discipline is enough for your team, or you will have hit the ceiling.

Past the ceiling, on a multi-team, multi-vendor, multi-skill stack, the two missing properties (per-invocation scoping and cross-vendor audit) become things you cannot fake with file permissions. That is the line where a dedicated governance layer starts to pay back. It is the line we built airlock for.

Closing

By mid-2026 the platforms have shipped the open standards faster than most teams have adopted them. `SKILL.md`, MCP, version numbers on prompts, automated quality gates in CI, per-agent identity. The teams winning at this run the same AI as everyone else, on top of infrastructure most companies still treat as personal work.

Airlock solves both sides of the two-audience problem. The builders get versioned, audited, portable assets. The users get the right skill in the right tool, without ever seeing a repo. One proxy your agents connect through, one audit log behind them, governed once instead of per-vendor.

You can start without us. You should. You can also start with us, free, on our solo tier.
