Stop Sensitive Data Before It Reaches Your LLM

Most applications weren't designed with AI in mind. They accept form input, call some downstream service, log the result. What changed is that the downstream service is now an LLM, and LLMs process everything you send them, including the parts you forgot to sanitize.

A user types their email into a chat interface. A form submission includes a phone number buried in a freeform field. A customer support ticket gets forwarded to a model with the original message still attached. None of these are edge cases. They're the normal flow of how people interact with software.

When that data reaches your model, it also reaches your provider's infrastructure, your logging pipeline, and your fine-tuning data if you're collecting any. Most of that wasn't in anyone's data handling plan.

The problem with fixing it downstream

The standard response is to handle PII at the application layer: validate inputs, strip sensitive fields before logging, mask before displaying. That works fine for structured data where you know exactly what you're looking for.

It works less well when users are typing into a chat window and the whole point is to accept natural language. You can't predict every way someone will include their email address, or whether they'll type their account number while explaining a billing problem. And even if you could, that logic lives scattered across your application code, which means it's something every developer has to remember to add to every new input path they build.

You need something that scans the payload before it reaches your systems. Not a check inside your systems, but upstream of them.

The PII Protection Layer

Strake now includes a PII Protection Layer that sits directly in your request pipeline, between your application and the upstream AI provider.

When a request comes in, Strake scans the payload for personally identifiable information. Depending on how you've configured it, one of three things happens.

Detect scans and flags PII without modifying anything. Useful before you commit to a blocking or redaction policy. You get visibility into what's actually flowing through before you change any behavior.

Redact replaces identified fields with safe placeholders. john.doe@email.com becomes [REDACTED_EMAIL]. (555) 123-4567 becomes [REDACTED_PHONE]. The request still goes through. Your application still works. The model sees a sanitized version.

Block stops the request entirely if PII is present. For cases where sensitive data has no business reaching the model at all, this is the right setting.

Here's what redaction looks like on a real request:

Before / raw request

POST /api/chat
Content-Type: application/json

{
  "message": "Can you help? My email is john.doe@email.com and my phone is (555) 123-4567"
}

After / redacted

POST /api/chat
Content-Type: application/json

{
  "message": "Can you help? My email is [REDACTED_EMAIL] and my phone is [REDACTED_PHONE]"
}

The calling code doesn't change. The model just never sees the original values.
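
To make the three modes concrete, here is a minimal Python sketch of the same decision logic. It is not Strake's implementation: the apply_policy function, the PIIBlocked exception, and the two regex patterns are hypothetical stand-ins, and real detection needs far broader coverage than two regular expressions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, deliberately narrow patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
}


class PIIBlocked(Exception):
    """Raised in block mode when the payload contains PII."""


@dataclass
class ScanResult:
    text: str                                     # text that would be forwarded
    findings: list = field(default_factory=list)  # PII types found, e.g. ["EMAIL"]


def apply_policy(text: str, mode: str) -> ScanResult:
    """Apply one of the three modes (detect, redact, block) to a message."""
    findings = [kind for kind, pattern in PATTERNS.items() if pattern.search(text)]

    if mode == "detect":
        # Flag only: the payload is forwarded unchanged.
        return ScanResult(text, findings)

    if mode == "redact":
        # Replace every match with a typed placeholder, then forward.
        for kind, pattern in PATTERNS.items():
            text = pattern.sub(f"[REDACTED_{kind}]", text)
        return ScanResult(text, findings)

    if mode == "block":
        # Refuse to forward anything that contains PII.
        if findings:
            raise PIIBlocked(f"request blocked, found: {', '.join(findings)}")
        return ScanResult(text, findings)

    raise ValueError(f"unknown mode: {mode!r}")
```

Called in redact mode on the message from the request above, this sketch returns the same sanitized text as the After example. In detect mode it returns the original text along with the list of findings, and in block mode it raises instead of forwarding.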

Why this matters for compliance

GDPR requires a lawful basis for processing personal data and a contract with any processor that handles it on your behalf. HIPAA requires a business associate agreement before protected health information goes to a vendor. "We sent it to the model provider but didn't mean to" isn't a defense under either.

The PII Protection Layer doesn't replace a proper data handling policy. But it gives you a configurable, auditable control point at the request level rather than relying on developers catching every case in application code. One enforcement point with one configuration is far easier to demonstrate to an auditor than checks scattered across a codebase.
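
One way to picture "auditable" is a log entry that records what was found without repeating the sensitive values. What Strake actually records isn't described here, so treat the following Python sketch as hypothetical: the audit_record function, its field names, and the two patterns are assumptions made for illustration.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical, deliberately narrow patterns for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"),
}


def audit_record(endpoint: str, mode: str, text: str) -> dict:
    """Summarize a scan as counts per PII type, never the matched values."""
    counts = {kind: len(pattern.findall(text)) for kind, pattern in PATTERNS.items()}
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "mode": mode,
        "findings": counts,
    }


def to_log_line(record: dict) -> str:
    """Serialize the record as one JSON line for a log pipeline."""
    return json.dumps(record, sort_keys=True)
```

Because the record holds only types and counts, it can be written to an ordinary log pipeline without reintroducing the data the layer was meant to keep out.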

Where this applies

Chat interfaces are the obvious case. But the same risk exists anywhere natural language input gets forwarded to a model: customer support pipelines, forms that route submissions through AI for classification, agent workflows where the original user message passes through multiple processing steps before reaching the provider.

If users can type freeform text anywhere in your system, and that text eventually reaches a model, you want something watching that boundary.

The PII Protection Layer is available on all Strake endpoints. You configure detection patterns, redaction behavior, and blocking rules per-endpoint from your dashboard. If you're already using Strake, it's one setting away.
