AI Cost Control

Stop bad AI spend before the provider call with permit-based budget checks, hard limits, and block-before-bill enforcement.

Most teams only discover an AI cost problem after the invoice lands. The missing control is not another dashboard. It is a decision boundary that can deny, constrain, or reroute a request before tokens are consumed.

Keel turns spend control into an authorization decision. The permit carries the budget and routing context up front, then the closeout record reconciles the estimate with what actually ran.


What Brings Teams Here

An invoice spike lands after the damage is done

A retry loop, model swap, or tenant burst runs all weekend because nothing in the request path can stop it before tokens are consumed.

Provider budgets are scoped to the wrong unit

Key-level limits cannot express tenant, workload, or product rules cleanly when multiple teams and customers share the same provider account.

Finance wants hard limits, engineering has dashboards

Alerts, exports, and post-hoc reports explain where money went. They do not decide whether the next expensive request should run.

Why Current Cost Controls Fail

Dashboards are downstream

Provider billing data updates on the provider's schedule, not yours. By the time a graph moves, the expensive behavior has already executed.

Alerts create cleanup workflows

The path from bill export to pager to code-side kill switch is measured in minutes or hours, not in the milliseconds where spend could have been prevented.

Gateway budgets miss workload ownership

API-key caps can throttle traffic, but once a single key serves many code paths, healthy and unhealthy alike, they rarely express per-tenant, per-workload approval cleanly.

Cost and routing drift separately

Fallback chains, model substitutions, and silent retries change the cost envelope unless they are part of the same decision.
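
To make the drift concrete, here is a rough sketch of how a retry policy and a fallback chain widen the worst-case cost of a single logical request. The function name, the micro-dollar unit, and the example prices are illustrative assumptions, not Keel's API or real provider pricing.

```python
def worst_case_micros(primary_micros: int, max_retries: int,
                      fallback_micros: list[int]) -> int:
    """Upper bound on spend for one logical request, in micro-dollars.

    Assumes every attempt is billed and every fallback in the chain fires once.
    All inputs are hypothetical estimates supplied by the caller.
    """
    primary_attempts = 1 + max_retries
    return primary_attempts * primary_micros + sum(fallback_micros)


# One call estimated at $0.02, two silent retries, one pricier fallback at $0.06.
single_call = 20_000
envelope = worst_case_micros(single_call, max_retries=2, fallback_micros=[60_000])
print(envelope / single_call)
```

If the budget check only sees the single-call estimate, the remaining attempts run outside the decision.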

Alerts answer "what happened?" A permit answers "should this run?" That is the difference between reporting on spend and actually controlling it.

What Changes With Keel

Block before bill

Keel evaluates budget state, estimated cost, workload, tenant, and model before the provider call. The request is allowed, denied, or constrained before spend lands.
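
As a minimal sketch of that allow / deny / constrain decision, assuming a cheaper approved model is known in advance: the types, field names, and `evaluate` function below are hypothetical illustrations, not Keel's actual interface. Amounts are integer micro-dollars to avoid floating-point drift.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    CONSTRAIN = "constrain"
    DENY = "deny"


@dataclass(frozen=True)
class Request:
    # Hypothetical request shape: who is asking, for what, and the estimate.
    tenant: str
    workload: str
    model: str
    estimated_micros: int


@dataclass(frozen=True)
class Permit:
    decision: Decision
    model: Optional[str]  # model the caller may use; None when denied
    reason: str


def evaluate(request: Request, remaining_micros: int,
             cheaper_model: str, cheaper_micros: int) -> Permit:
    """Decide before the provider call, using only pre-execution facts."""
    if request.estimated_micros <= remaining_micros:
        return Permit(Decision.ALLOW, request.model, "within budget")
    if cheaper_micros <= remaining_micros:
        # Constrain: the request may run, but only on the cheaper route.
        return Permit(Decision.CONSTRAIN, cheaper_model, "rerouted to cheaper model")
    return Permit(Decision.DENY, None, "budget exhausted")
```

The caller only reaches the provider when the returned permit is not a denial, and it uses the model named on the permit rather than the one it originally asked for.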

Hard caps with explicit exception paths

Monthly, daily, or hourly envelopes can block outright or route through a named escalation path instead of relying on tribal knowledge during an incident.
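
One way to picture stacked envelopes with an explicit exception path is the sketch below. `Envelope`, `check_envelopes`, and the path name "finance-oncall" are assumptions made for illustration. The design choice shown is that a breached envelope with no escalation path always wins, so an override on one window cannot bypass a hard cap on another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    # Hypothetical envelope record; amounts are integer micro-dollars.
    window: str                            # "hourly", "daily", or "monthly"
    cap_micros: int                        # cap for the window
    spent_micros: int                      # spend already recorded in the window
    escalation_path: Optional[str] = None  # named override route, if any


def check_envelopes(envelopes: list[Envelope],
                    estimate_micros: int) -> tuple[str, Optional[str]]:
    """Return ("allow", None), ("escalate", path), or ("block", window)."""
    breached = [e for e in envelopes
                if e.spent_micros + estimate_micros > e.cap_micros]
    # A breached envelope without an exception path is a hard block.
    for env in breached:
        if env.escalation_path is None:
            return ("block", env.window)
    if breached:
        return ("escalate", breached[0].escalation_path)
    return ("allow", None)


envelopes = [
    Envelope("hourly", 1_000_000, 900_000, escalation_path="finance-oncall"),
    Envelope("monthly", 50_000_000, 10_000_000),
]
print(check_envelopes(envelopes, 200_000))
```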

One decision layer across services

The same permit model applies whether the request starts in one service or ten, so cost control stops being a pile of bespoke middleware checks.

Reconciled actuals after execution

Estimated cost drives the pre-execution decision; actual usage closes the loop later so the budget record matches what the provider really billed.
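
The estimate-then-reconcile loop can be sketched as a small ledger: the estimate is reserved when the permit is issued, and the closeout swaps it for the actual figure. `BudgetLedger` and its method names are hypothetical, and the sketch assumes a single process with no concurrency.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    # Hypothetical in-memory ledger; amounts are integer micro-dollars.
    cap_micros: int
    settled_micros: int = 0                                 # reconciled actual spend
    reserved: dict[str, int] = field(default_factory=dict)  # permit_id -> estimate

    def headroom(self) -> int:
        """Budget left after settled spend and outstanding reservations."""
        return self.cap_micros - self.settled_micros - sum(self.reserved.values())

    def reserve(self, permit_id: str, estimate_micros: int) -> bool:
        """Hold the estimate before execution; refuse if it does not fit."""
        if estimate_micros > self.headroom():
            return False
        self.reserved[permit_id] = estimate_micros
        return True

    def close_out(self, permit_id: str, actual_micros: int) -> int:
        """Replace the estimate with the actual; return actual minus estimate."""
        estimate = self.reserved.pop(permit_id)
        self.settled_micros += actual_micros
        return actual_micros - estimate
```

A negative return value from `close_out` means the request came in under its estimate and the difference goes back into headroom. A positive value means the estimate was too low, which is a signal that the estimator needs attention.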

What The Permit Carries

Workload and owner context so cost rules follow the request that is actually creating spend
Model and provider constraints so cheaper or approved routes can be enforced instead of merely suggested
Budget scope and remaining headroom so the decision reflects current policy state, not a stale export
Exception path so hard caps and approved overrides are both explicit instead of improvised
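
As an illustration only, the four items above could map onto a record like the following. Every field name and value here is an assumption made for the sketch and does not describe Keel's actual permit schema.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class Permit:
    # Hypothetical permit shape, grouped by the four items in the list above.
    permit_id: str
    tenant: str                      # workload and owner context
    workload: str
    owner: str
    allowed_models: tuple[str, ...]  # model and provider constraints
    budget_scope: str                # budget scope and remaining headroom
    remaining_micros: int
    exception_path: Optional[str]    # named override route, if any

    def permits_model(self, model: str) -> bool:
        """Enforce the route: only models listed on the permit may be called."""
        return model in self.allowed_models


permit = Permit(
    permit_id="pmt-example-1",
    tenant="acme",
    workload="summarize",
    owner="search-team",
    allowed_models=("small-model",),
    budget_scope="tenant:acme/workload:summarize",
    remaining_micros=650_000,
    exception_path="finance-oncall",
)
print(json.dumps(asdict(permit), indent=2))
```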

Proof Teams Can Use Later

Structured block reasons

Blocked requests can point to the policy version, matched rule, budget scope, and exception path that shaped the decision.
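
A structured block reason might look something like the sketch below. `BlockReason`, `render`, and the sample values are hypothetical and only mirror the four fields named above.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class BlockReason:
    # Hypothetical record; the fields mirror the four items named above.
    policy_version: str
    matched_rule: str
    budget_scope: str
    exception_path: Optional[str]


def render(reason: BlockReason) -> str:
    """One-line key=value form, suitable for logs and incident notes."""
    return " ".join(f"{key}={value}" for key, value in asdict(reason).items())


reason = BlockReason(
    policy_version="v7",
    matched_rule="daily-cap",
    budget_scope="tenant:acme",
    exception_path="finance-oncall",
)
print(render(reason))
```

Because the reason is a record rather than a free-text message, the same fields can be filtered, counted, and joined against budget data later.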

Budget evidence tied to the request

Teams can show what budget state applied at the moment of approval instead of reconstructing the answer from logs, invoices, and screenshots.

Shared story for finance and engineering

The same permit and closeout record explains why a request ran, what it was expected to cost, and what it actually cost after execution.

Cost is the opening wedge because it is easy to quantify. The same record becomes more valuable later when security, procurement, or finance asks what decision was made and why.