OpenAI cost control: stop the bill before it lands

If your OpenAI cost story begins with an alert, the failure is not monitoring. The failure happened earlier, when the request that caused the spike was allowed to run without a pre-execution check.

OpenAI provides usage dashboards, API key-level spending limits, and monthly budget alerts. These are useful visibility tools. They are not enforcement tools. They describe what happened after tokens were consumed. A pre-execution control boundary can stop the request before a single token is consumed.

Where OpenAI's built-in controls stop

OpenAI's native cost controls are scoped to the API key and the account level. That creates real gaps for teams building production systems:

Key-level caps cannot express workload or tenant rules. A single key often serves multiple products, teams, or customers. A cap on the key does not distinguish between a high-value tenant that should always be allowed and a low-priority batch job that should be throttled.
Billing data arrives after spend. OpenAI usage data is not real-time. A retry loop, a burst, or an unexpected model swap can run for hours before the dashboard reflects the cost.
Alerts create cleanup workflows, not enforcement. A spend alert fires after tokens are consumed. Acting on it requires human intervention, code changes, or a kill switch that someone has to manually trigger.
Hard limits shut down everything. OpenAI's hard monthly limits apply to the entire key. There is no way to block expensive requests while allowing cheap ones, or to route a workload to a smaller model instead of blocking it outright.

What pre-execution control adds

Keel evaluates every request against policy and budget state before the provider call is made. The decision covers:

Is this workload within its current budget envelope?
Is the requested model on the approved list for this caller?
Does the estimated cost fit within the remaining headroom?
Should this request be routed to a cheaper model instead of blocked?

If a request fails any of these checks, it is denied or rerouted before it reaches OpenAI. No tokens are consumed. No spend lands. The decision is recorded with the reason, the budget state that applied, and the policy that triggered the outcome.
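As a rough illustration of that decision flow, the sketch below checks the allowlist, then the estimated cost against the remaining headroom, then a cheaper fallback. It is not Keel's API: the function names, the `Decision` shape, the model names, and the prices are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-1K-token prices in USD. Illustrative numbers, not a real price list.
PRICE_PER_1K_TOKENS = {"large-model": 0.03, "small-model": 0.002}


@dataclass
class Decision:
    outcome: str            # "allow", "reroute", or "deny"
    model: Optional[str]    # model to call, or None when the request is denied
    reason: str             # stored with the decision for later audit


def estimate_cost(model: str, tokens: int) -> float:
    """Estimated USD cost for a request of `tokens` tokens on `model`."""
    return PRICE_PER_1K_TOKENS[model] * tokens / 1000


def decide(model, tokens, remaining_budget, allowed_models, fallback=None) -> Decision:
    """Run every check before any provider call is made."""
    if model not in allowed_models:
        return Decision("deny", None, "model_not_allowed")
    if estimate_cost(model, tokens) <= remaining_budget:
        return Decision("allow", model, "within_budget")
    # Over budget on the requested model: try the cheaper fallback before denying.
    if fallback in allowed_models and estimate_cost(fallback, tokens) <= remaining_budget:
        return Decision("reroute", fallback, "rerouted_to_cheaper_model")
    return Decision("deny", None, "budget_exceeded")
```

A caller would invoke `decide(...)` first and contact the provider only when the outcome is "allow" or "reroute", using the model named in the decision.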

Specific scenarios this solves

Retry loops

A misconfigured retry policy can multiply spend by an order of magnitude in minutes. With pre-execution budget checks, each retry attempt is evaluated against the remaining envelope. Once the budget is exhausted, further retries are denied before they run, instead of being discovered by an alert that fires too late.
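A minimal sketch of that behavior, under simplifying assumptions: every attempt reserves its estimated cost from the envelope before the call, whether or not the attempt later succeeds. `Envelope`, `BudgetExhausted`, and `call_with_retries` are hypothetical names for this example, and amounts are integer cents to avoid floating-point drift.

```python
class BudgetExhausted(Exception):
    """Raised before the provider call when the envelope cannot cover an attempt."""


class Envelope:
    def __init__(self, limit_cents: int) -> None:
        self.remaining_cents = limit_cents

    def reserve(self, cost_cents: int) -> None:
        # Deny the attempt up front if it does not fit in the remaining budget.
        if cost_cents > self.remaining_cents:
            raise BudgetExhausted(
                f"need {cost_cents}c, have {self.remaining_cents}c"
            )
        self.remaining_cents -= cost_cents


def call_with_retries(send, envelope, cost_cents, max_attempts):
    """Retry `send` on ConnectionError, checking the budget before each attempt."""
    last_error = None
    for _ in range(max_attempts):
        envelope.reserve(cost_cents)  # raises BudgetExhausted before any call
        try:
            return send()
        except ConnectionError as err:
            last_error = err
    raise RuntimeError("all attempts failed") from last_error
```

Even with `max_attempts` set far too high, the loop ends as soon as the envelope cannot cover another attempt, so a bad retry setting is bounded by the budget and not by the retry count.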

Model substitution without review

A developer replaces a cheaper model with a more expensive one. Without a permit-based allowlist, the request goes through. With one, the control plane can block the substitution or require it to pass through a named exception path before execution is allowed.
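One way to picture an allowlist with a named exception path is sketched below. The caller name, model names, exception identifier, and the `check_model` function are all hypothetical and exist only for this example. They do not describe how any particular control plane stores its policy.

```python
# Hypothetical policy data: which models each caller may use by default.
ALLOWED_MODELS = {"support-bot": {"small-model"}}

# Hypothetical named exceptions: (caller, model) pairs approved under an identifier.
EXCEPTIONS = {("support-bot", "large-model"): "EXC-eval-run"}


def check_model(caller, model, exception_id=None):
    """Return (allowed, reason) for a caller requesting a model."""
    if model in ALLOWED_MODELS.get(caller, set()):
        return True, "allowlisted"
    # Not on the allowlist: only a matching named exception lets the request through.
    if exception_id is not None and EXCEPTIONS.get((caller, model)) == exception_id:
        return True, f"exception:{exception_id}"
    return False, "model_not_allowed"
```

In this sketch a silent model swap is denied by default, and an approved swap carries the exception name into the recorded reason.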

Tenant or workload bursts

A single tenant or batch job consumes far more than expected. Key-level limits cannot isolate one workload without blocking all others sharing the key. Workload-scoped budget envelopes can deny the burst while leaving other callers unaffected.
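The isolation described here can be sketched as one budget counter per workload, all sitting in front of the same provider key. The `WorkloadBudgets` class and the workload names are assumptions for illustration. A production version would also need persistence and concurrency control, which this sketch leaves out.

```python
class WorkloadBudgets:
    """Tracks spend per workload so one caller's burst cannot drain another's budget."""

    def __init__(self, limits_cents):
        self.limits = dict(limits_cents)
        self.spent = {name: 0 for name in self.limits}

    def try_spend(self, workload, cost_cents):
        """Return True and record the spend if it fits, otherwise deny with False."""
        if workload not in self.limits:
            return False  # unknown workloads are denied by default
        if self.spent[workload] + cost_cents > self.limits[workload]:
            return False
        self.spent[workload] += cost_cents
        return True


budgets = WorkloadBudgets({"tenant-a": 100, "batch-job": 30})
# The batch job bursts: its first three requests fit, the rest are denied.
burst = [budgets.try_spend("batch-job", 10) for _ in range(5)]
# tenant-a shares the same provider key but is unaffected by the burst.
tenant_ok = budgets.try_spend("tenant-a", 10)
```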

Multi-model cost drift

Teams using OpenAI alongside other providers often find that fallback chains, model changes, and routing decisions accumulate into unexpected total spend. A control plane that covers all providers under the same permit model keeps cost policy consistent regardless of which model or provider handles the request.

What this is not

Keel is not a billing analytics platform and does not replace OpenAI's usage dashboards. The two serve different purposes. Dashboards answer what happened over time. Pre-execution enforcement answers whether the next request should run at all. The full picture requires both: enforcement before execution and visibility after.