
Security · 8 min · April 19, 2026

Your AI Stack Is Logging Everything — and Your CFO Is About to Notice

A quiet cost on every AI integration: prompts, responses, and tool calls get retained on vendor servers for 30-365 days. That's fine until it isn't. Here's why zero-retention proxies are becoming a line item — and what VaultAgent does with that bill.

There is a line item in your next AI invoice that hasn't been priced yet: retention. It doesn't show up in your OpenAI dashboard. It doesn't show up in your Anthropic bill. But it's there — every prompt your product sends to a foundation model is kept on the provider's servers for somewhere between 30 and 365 days, depending on vendor, tier, and which box you ticked in the admin console. That retention window is a liability surface. And for regulated industries it's becoming a board-level question.

This post is an operator-level look at why that matters, why it's harder to solve than it sounds, and what we shipped in VaultAgent to get it off your plate.

What retention actually means

The marketing pitch is that foundation models don't train on your data. Most of the time, that's true. But "doesn't train on" is not the same as "doesn't store." Every major provider logs requests and responses for some combination of: abuse review, safety tuning, developer debugging, incident forensics, and compliance. The retention windows in the default Terms of Service range from 30 days (Anthropic default) to 12 months (some OpenAI enterprise tiers until you opt out in writing).

So when your fintech onboarding flow asks GPT-4o to classify a know-your-customer document, that document text lives for up to a year on a server you don't control. When your healthtech intake bot summarizes a patient's symptoms, the summary is retained too. When your internal copilot reads a Slack thread to answer a question, the Slack thread becomes a line in a log file in a vendor data center.

For consumer apps, that's fine. For regulated industries, it's an audit finding waiting to happen. HIPAA, PCI-DSS, GDPR Article 28, and the new U.S. state-level AI transparency laws all treat that retention as a processing activity — which means you need a Data Processing Agreement for each vendor, each region, and each data category. Most teams discover this when a compliance lead or a customer's procurement team asks the question, and the answer is "we'll get back to you."

Why the obvious fixes don't quite fix it

Three patterns we see teams try first, and where each one breaks:

Option 1 — self-host an open model. Great for privacy. Bad for capability. Llama 3 is not Claude Opus. Mistral Large is not GPT-4o. You end up running two stacks: the open one for sensitive data, the proprietary one for everything else. And the "sensitive" boundary is exactly where your policy breaks down, because product teams need one stack, not two.

Option 2 — use the provider's zero-retention tier directly. Anthropic has Enterprise. OpenAI has Zero Data Retention. Both require separate contracting, separate API keys, separate billing, and both still log metadata (timestamps, token counts, model names) even when the content is zeroed. Compliance still has to document the metadata flow. And you still can't mix and match — if your cost-optimized router wants to send the easy prompts to GPT-4o-mini and the hard ones to Claude Sonnet, you're back to juggling two zero-retention contracts, one per provider.

Option 3 — roll your own proxy. Honorable. Expensive. Nobody we know who has built one has been happy with the maintenance cost. You end up owning a queue, a secret store, a streaming parser, a rate limiter, a usage meter, and a bunch of fiddly audit hooks. It's a six-month engineering project that competes with every other six-month engineering project on your roadmap.

What VaultAgent does, concretely

VaultAgent is the proxy you'd build if you had the time. It sits between your application and the foundation model. It strips logs, zeros the body, enforces policy, and emits an audit trail your compliance team can actually use. The design constraints were set by the teams we've watched fail at option 3:

  • Bring your own key. Your existing Anthropic, OpenAI, and Gemini contracts stay intact. VaultAgent does not re-sell tokens. You pay the model provider directly; we're not a margin in your token bill.
  • Zero retention by default. Request and response bodies are held in memory for the duration of the call and never written to persistent storage. The only things we keep are a hash of the request, the provider, the model, the token counts, the user ID, the policy decisions, and the timestamp. That's the whole audit trail. It fits on a postcard.
  • Policy as code. You describe which user roles can hit which models with which data classifications in a YAML file. VaultAgent enforces it at the proxy layer. PCI data can't accidentally route to a non-approved model; HIPAA data can't leak through a cost-router shortcut. If the policy changes, you redeploy the YAML, not the application.
  • Audit that survives a subpoena. The audit log is append-only, hash-chained, and signed with a key held in your own cloud KMS. A regulator asks what happened on Tuesday morning — you have a timestamped, cryptographically verifiable answer.
  • Drop-in replacement. VaultAgent speaks the Anthropic and OpenAI APIs natively. Change the base URL in your SDK, keep every other line of application code identical. The switch is a config change, not a migration.

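The drop-in claim amounts to a one-line configuration change. A sketch using the official OpenAI Python SDK, where the proxy hostname is a made-up placeholder and the key is elided:

```python
# Hypothetical config change: route existing SDK traffic through a
# zero-retention proxy by overriding base_url. The hostname below is
# a placeholder, not a real VaultAgent endpoint.
from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # your existing provider key; it never changes hands
    base_url="https://proxy.vault.example.com/v1",  # was the provider default
)

# Every other line of application code stays identical, e.g.:
# client.chat.completions.create(model="gpt-4o-mini", messages=[...])
```

The Anthropic SDK accepts an equivalent `base_url` override, which is what makes a proxy-layer product possible without an application migration.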
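To make the "audit that survives a subpoena" claim concrete, here is a minimal sketch of an append-only, hash-chained log in plain Python. The field names and chaining scheme are illustrative assumptions, not VaultAgent's actual on-disk format (which also involves KMS signing, omitted here):

```python
# Illustrative hash-chained audit log: each entry commits to the
# previous entry's hash, so editing any historical record breaks
# every later link. Field names are assumptions for illustration.
import hashlib
import json

def append_entry(chain: list, record: dict) -> dict:
    """Append an audit record linked to the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    entry = {
        "prev": prev_hash,
        "record": record,
        "hash": hashlib.sha256(payload.encode()).hexdigest(),
    }
    chain.append(entry)
    return entry

def verify_chain(chain: list) -> bool:
    """Recompute every link; any tampered record invalidates the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        payload = json.dumps(
            {"prev": prev_hash, "record": entry["record"]}, sort_keys=True
        )
        if entry["prev"] != prev_hash:
            return False
        if hashlib.sha256(payload.encode()).hexdigest() != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, {"model": "gpt-4o", "tokens": 412, "decision": "allow"})
append_entry(log, {"model": "claude-sonnet", "tokens": 97, "decision": "deny"})
assert verify_chain(log)

log[0]["record"]["decision"] = "allow"  # tamper with history
log[0]["record"]["tokens"] = 1  # any change to a past record...
assert not verify_chain(log)  # ...is detectable from the chain alone
```

In production you would additionally sign each hash with a KMS-held key so the chain can be verified by a third party, but the tamper-evidence property comes from the chaining itself.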
Who this is actually for

We have two paying tiers: Starter at $999/month and Enterprise at $2,500/month. The Starter tier is for teams who know they need this and want it yesterday — usually a small engineering team with a compliance lead breathing down their neck and one or two AI features in production. Enterprise is for teams with multi-region, multi-tenant requirements who need the audit chain signed with their own KMS and their own retention policies per tenant.

We don't sell this as an add-on to a larger platform — VaultAgent is the platform. One proxy, one audit trail, one bill.

What we're not

We are not a model provider. We do not train models. We do not have a foundation-model roadmap, and we won't. We are the narrowest possible wedge between your code and the foundation-model API: the wedge that makes it safe to ship the feature.

That's intentional. The AI infrastructure market has a lot of teams trying to be a platform; we're trying to be a commodity. Commodities get bought. Platforms get evaluated.

How to start

If you're in a regulated industry and any part of this post described a real problem you're having, the next step is to look at VaultAgent at vault.mindsparkstack.com. We provision in 15 minutes. Your existing API keys stay with you. The first week is on us — if it doesn't fit the way your team moves, we're easier to uninstall than we were to install.

If you're on the fence, the cheapest thing you can do today is open the Terms of Service for whichever foundation-model vendor you use most and search for "retention." Read the paragraph. Decide whether your legal team would sign off on what it says, given the data your product actually sends. If the answer is "probably not," the line item exists whether VaultAgent invoices for it or not.
