Strake
all systems normal sign in create your endpoint
all systems normal
v2026.04 · zero SDK changes · cli available

Stop pasting API keys into agents.

Strake sits between your app and every AI provider — vaulting real keys so they never touch your tools, blocking jailbreak attempts, redacting PII before it reaches the model, and caching repeated calls to cut costs. Paste a key once, get a disposable endpoint. Nothing to self-host.

strake.sh · hand any tool a disposable endpoint
v2026.04
// Hand a teammate a Strake endpoint instead of a real key.
// They paste it into ChatGPT → Custom GPT → Actions, or any tool
// that accepts an OpenAI-compatible base URL + bearer.
 
Server URL https://abc123.strake.sh/v1
Auth header Authorization: Bearer ct_live_7a3f…
 
// What they can do:
  → POST /chat/completions allowed
  → POST /embeddings path not allowed
  → POST /fine_tuning/jobs path not allowed
  → rate capped at 60 rpm
 
✓ Share link sent · revoke any time from the endpoint page
Works with every major
model provider.
guardrails
Stop attacks before they reach the model.
Jailbreak protection and prompt injection detection on every proxied request, before anything touches your AI provider.
data protection
Never send sensitive data to your AI provider.
Context-aware PII detection runs inline. Real values are redacted or the request is blocked — the provider never sees them.
performance
Cut LLM costs and latency instantly.
Semantic cache serves repeated prompts in ~12ms. Zero tokens billed per hit. The more your endpoint is used, the higher the hit rate climbs.
/01 the problem

Every agent that ships today leaks by default.

Secrets get pasted into .env files, baked into container images, or passed as request headers to untrusted tool calls. One exfiltration primitive in the wrong chain-of-thought and you're explaining a $10k bill to Stripe.

before the usual way
.env · committed by mistake
23 days ago
# .env
OPENAI_API_KEY=sk-proj-aB7cD9...
ANTHROPIC_API_KEY=sk-ant-api03-...
GROQ_API_KEY=gsk_bL...
 
✕ scanned by 4 bots within 90s of push
✕ unbounded spend until someone notices
✕ no way to scope access per endpoint
after vaulted by strake
.env · after strake
live
# .env · your real key stays in the vault
OPENAI_BASE_URL=https://abc123.strake.sh/v1
OPENAI_API_KEY=ct_live_7a3f... // disposable bearer
 
✓ point any OpenAI-compatible client at this URL
✓ bearer is scoped to one endpoint only
✓ rotate the upstream provider key with no redeploy
↻ revoke the bearer any time — upstream key unaffected
/02 how it works

A vault with a front door.

Paste a provider key into Strake. We store it encrypted and hand back a disposable subdomain URL + bearer. Point any OpenAI-compatible client at that URL; requests forward to the provider using the key we're holding. Revoke the endpoint any time — the real key is never touched.

01
your agent
Calls api.strake.sh with a public-safe token. No provider key on the agent runtime.
02
allow-list check
Request path checked against your endpoint's allow-list; rate limits and monthly caps enforced.
03
strake vault
vault + forward
Looks up your real provider key, attaches it to the outbound call, forwards to the provider. The key never leaves this hop.
04
provider
OpenAI, Anthropic, Gemini, xAI, Mistral and more. They see a normal, authenticated request.
05
stream back
Response streams back byte-for-byte. TTFT is identical to calling the provider directly.
/03 built for how you work

Four places your keys already leak.
One way to stop it.

01 cloud
   ┌─ cloud runtime ─┐
   │   agent.py     │
   └────────┬───────┘
            │ ct_live_7a3f…
            ▼
        strake  ──▶  sk-…

Cloud-hosted agents

Your agent runs on a machine you don't control. Your real key shouldn't. Hand the runtime a disposable bearer — revoke it the minute the container dies.

learn more
02 openclaw
   claw-instance-1  ▶  ct_live_a3f8…
   claw-instance-2  ▶  ct_live_b2e7…
   claw-instance-3  ▶  ct_live_c4d6…
                        │
                        ▼
                     sk- (vault)

OpenClaw deployments

Every OpenClaw instance gets its own Strake endpoint. Scope it, rate-limit it, revoke one without breaking the rest. The underlying key stays put.

learn more
03 mcp
   // mcp.json
   "GITHUB_TOKEN":
     "ghp_aB7cD9…"   ⨯ leaked
   "GITHUB_TOKEN":
     "ct_live_7a3f…" ✓ scoped

MCP servers

Stop pasting real tokens into plaintext MCP config. Each server gets a scoped URL that signs with your key at the edge — so one rogue tool can't drain the whole account.

learn more
04 teams
   alice  ▶  ct_live_a3…  ┐
   bob    ▶  ct_live_b2…  ├▶  sk- (shared)
   cara   ▶  ct_live_c4…  ┘
              revoke bob → alice & cara unaffected

Teams on a shared key

Every developer gets their own disposable endpoint on the same underlying key. Granular revocation. Per-seat telemetry. No more pasting sk-proj-… into Slack.

learn more
/04 what you get

Everything you need to put an agent in production.

Six primitives. No dashboards for features you don't need.

/01

Disposable endpoints, revocable any time.

Every endpoint you create is a throwaway subdomain on strake.sh. Hand it to ChatGPT, Cursor, a teammate, a vendor, a demo. Click revoke and every bearer under it stops working — the upstream key is untouched.

per-use URLone-click revoke
/02

Your real key never leaves the vault.

Paste your provider key once. We store it encrypted and return a disposable endpoint URL + bearer. The real key is never typed into a .env, baked into a container image, or pasted into an agent prompt again.

AES-256-GCM at restdecrypted only in memory
/03

Path allow-list per endpoint.

Each endpoint is scoped to exactly the provider paths it's allowed to hit. chat.completions only. messages only. Anything off-list is refused before it leaves Strake.

per-endpointwildcard + glob
/04

Per-minute rate limits.

Cap requests per minute on every endpoint. A leaked bearer can't empty your wallet overnight — and a runaway tool trips the limit, not your provider bill.

per-endpointinstant 429
/05

30-day usage charts.

Every endpoint has a live chart of request volume over the last 30 days. See at a glance which handoff is still getting traffic and which can be safely retired.

per-endpointrolling 30-day
/06

Works with every OpenAI-compatible tool.

Swap OPENAI_BASE_URL and OPENAI_API_KEY. That's the integration. ChatGPT, Claude, Cursor, scripts, Jupyter, curl — anything that speaks the OpenAI shape just works.

no SDK requireddrop-in base URL
/05 middleware

PII enforcement, jailbreak protection, and semantic cache,
per endpoint.

Add middleware to any endpoint from the dashboard. PII enforcement scrubs sensitive data before it reaches your AI provider. Jailbreak protection blocks adversarial prompt attacks before they reach your model. Semantic cache serves similar prompts from a response store — cutting latency and token spend without any code changes.

block Reject the request before it reaches the AI provider.
response · PII detected
HTTP 400
// AI never sees the message
{
  "error": {
    "code":     "middleware_blocked",
    "message":  "Request blocked: PII detected",
    "detected": [
      "first_name",
      "last_name",
      "street_address"
    ]
  }
}
// actual PII values are never returned
redact Scrub PII inline, then forward. The AI never sees the real values.
response header · fields scrubbed
HTTP 200
// "Jordan Wells, 42 Oak St" becomes:
// "[REDACTED_FIRST_NAME] [REDACTED_LAST_NAME], [REDACTED_STREET_ADDRESS]"

x-strake-redacted: first_name,last_name,street_address

// AI sees scrubbed text — real values never leave Strake
PII categories — enable or disable per group, per endpoint
Personal Identity
first_namelast_nameemailoccupation
Location
street_addresscitystatepostcode
Financial
account_numberswift_bic
Time
time

Adversarial prompts stopped before they reach your model.

Jailbreak protection runs on every proxied request. Powered by NVIDIA NeMo Guard, it detects prompt injection, DAN attempts, role-playing exploits, and other adversarial patterns — and blocks them before they ever reach your AI provider.

blocked Reject the request — the model never sees the adversarial prompt.
response · jailbreak detected
HTTP 400
// "Ignore your previous instructions and..."
{
  "error": {
    "code":     "middleware_blocked",
    "message":  "Request blocked: jailbreak attempt detected",
    "pattern":  "prompt_injection"
  }
}
// model never processes the adversarial content
monitor Log the detection and forward — useful for auditing without blocking.
response header · pattern logged
HTTP 200
// request forwarded, detection recorded

x-strake-jailbreak-detected: true
x-strake-jailbreak-pattern:  dan_attempt

// visible in your usage logs per request
Attack patterns detected — powered by NVIDIA NeMo Guard
Prompt Injection
ignore_instructionssystem_override
Role-Play Exploits
dan_attemptpersona_bypass
Instruction Bypass
constraint_removalpolicy_evasion
Adversarial Inputs
encoded_attacksindirect_injection

Every cache hit is a provider call that never happens.

Agents and assistants repeat themselves constantly — the same queries, slightly reworded. Semantic cache catches them. Enable it per endpoint, tune the similarity threshold, and watch tokens saved accumulate in your dashboard.

tokens consumed · per hit
0
A cache hit never reaches your provider. Zero tokens billed, zero cost incurred.
response time · from cache
~12ms
vs 600ms–2s for a live LLM call. Faster agents, lower latency budgets.
savings · compound over time
The more your endpoint is used, the richer the cache gets — and the higher the hit rate climbs.
miss No match found — request forwarded upstream, response stored.
first request · cache miss
→ upstream
// "What is the capital of France?"
// no match in vector store

POST /v1/chat/completions → provider
// response forwarded and stored in cache
// similarity threshold: 0.92
hit Similar prompt detected — cached response returned in milliseconds.
later request · cache hit
HTTP 200
// "Can you tell me France's capital?"
// similarity: 0.97 · stored: 4 min ago

// returned from cache in ~12ms
// provider never called
// ~400 tokens saved
Similarity threshold
default 0.92range 0.50–1.00
Cache TTL
default 24 hmax 168 h
Savings dashboard
hit rate %tokens saved
/06 security model

Your key never leaves the vault unencrypted.

Here's exactly what happens to a provider key the moment you paste it into Strake.

Envelope encryption (AES-256-GCM) at rest in Cloudflare D1
Master key held as a Cloudflare secret, isolated from app code
Plaintext decrypted only in memory on the proxy hot path
Request + response bodies never inspected or persisted
Tokens shown once; rotate to recover a lost one
PII middleware blocks or redacts sensitive data before it reaches the AI provider
Jailbreak protection blocks adversarial prompts before they reach your model
Audit trail
coming soon
/07 pricing

$3 per endpoint,
per month.

Each endpoint includes 10,000 requests. Five-endpoint floor ($15/mo). Pick 5, 10, 20, 50, 75, or 100 — upgrade or downgrade any time.

strake standard
$3/endpoint · month
10,000 requests per endpoint included. Five-endpoint minimum.
  • Unlimited labeled bearers per endpoint — rotate & revoke independently
  • Per-endpoint path allow-list
  • Per-minute rate limits
  • 30-day usage charts on every endpoint
  • 14-day grace period if a subscription lapses
endpoint plans
5$15 10$30 20$60 50$150 75$225 100$300
need more?
get in touch
More than 100 endpoints, higher per-endpoint request ceilings, or custom deployment.
  • Volume pricing past 100 endpoints
  • Higher included request limits
  • Custom path allow-list policies
/08 frequently asked

Questions we get all the time.

Q.01

What is Strake?

Strake is a vault and access layer for API keys used by AI assistants, agents, and scripts. You store your real provider keys in Strake and hand out disposable, revocable endpoints to the tools that need access.
Q.02

Who is it for?

+
Q.03

Does the assistant get my real API key?

+
Q.04

Can I revoke access?

+
Q.05

Do I need to change my workflow?

+
Q.06

Which tools does it work with?

+
Q.07

How much does it cost?

+
Q.08

What happens if I cancel?

+
Q.09

How does PII middleware work?

+
Q.10

How does jailbreak protection work?

+
Q.11

How does semantic cache work?

+
/09 writing

From the blog.

Security · Apr 27
Stop Adversarial Prompts Before They Reach Your Model
How jailbreak protection works at the proxy layer, and why that's the right place for it.
Security · Apr 25
Stop Sensitive Data Before It Reaches Your LLM
PII flows into models constantly. Here's how to catch it at the request level before it reaches your provider.
Security · Apr 19
Your API Keys Don't Belong in Environment Variables
Every few months a platform gets breached and the recommendation is "rotate your env vars." That's the wrong fix.
All posts
/10 ship it

Keys stay in the vault.
Agents stay in production.

Five minutes to wire in. The OpenAI SDK you already use works unchanged.