Cached input
A prompt token the provider has already processed in a recent request, billed at a discounted rate.
When you send the same prompt prefix repeatedly (system prompt, retrieved-doc context, conversation history), the provider can reuse the KV-cache it already computed for that prefix and bill those tokens at a fraction of the regular input rate, typically 10-25%. In production systems, aggressive prompt caching is often the highest-leverage cost optimization available: 60-85% input-cost reduction with no change in model behavior. The catch is that caches match on an exact prefix, so prompts must be structured with the stable parts first and the variable parts (user query, latest turn) last.
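As a concrete illustration, here is a minimal sketch of that prompt structure using Anthropic's explicit `cache_control` markers; other providers (e.g., OpenAI) cache long shared prefixes automatically with no markup. The model name, prompt contents, and `answer` helper are illustrative placeholders, not part of this entry:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STABLE_SYSTEM_PROMPT = "You are a support agent for Acme..."  # long, rarely changes
RETRIEVED_DOCS = "..."  # large shared context, identical across many requests

def answer(user_question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # illustrative model name
        max_tokens=512,
        system=[
            {
                "type": "text",
                # Stable parts first: everything up to and including this
                # block is eligible for cache reuse on subsequent requests.
                "text": STABLE_SYSTEM_PROMPT + "\n\n" + RETRIEVED_DOCS,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Variable parts last, so they don't break the cached prefix.
        messages=[{"role": "user", "content": user_question}],
    )
    return response.content[0].text
```

Exact discounts vary by provider: Anthropic, for instance, bills cache reads at roughly 10% of the base input rate (with a small premium on cache writes), which is where the low end of the 10-25% range above comes from.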