Cost

Prompt caching

Provider-side caching of prompt prefixes for cheaper subsequent requests.

When you send the same prompt prefix repeatedly (system prompts, retrieved context, conversation history), providers can cache the processed state for that prefix and bill subsequent reads of it at a fraction of the regular input rate, commonly 10-25% depending on the provider. Aggressive prompt caching is often the highest-leverage cost optimization in production AI: input cost reductions in the 60-85% range are achievable with no change in model output. Because caches match on exact prefixes, it requires structuring prompts so stable content comes first and variable content (such as the user's latest message) comes last.
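A back-of-the-envelope sketch of the savings. The base rate and the 10% cache-read multiplier below are illustrative assumptions, not any specific provider's pricing:

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               base_rate: float, cache_read_multiplier: float = 0.1) -> float:
    """Estimated input cost in dollars for one request.

    base_rate: dollars per 1M input tokens (illustrative).
    cache_read_multiplier: fraction of the base rate charged for
    cached-prefix tokens (0.1 = 10%, an assumed example value).
    """
    uncached = prompt_tokens - cached_tokens
    cost = uncached * base_rate + cached_tokens * base_rate * cache_read_multiplier
    return cost / 1_000_000

# 20k-token prompt, $3 per 1M input tokens (hypothetical rate):
full = input_cost(20_000, 0, 3.0)          # no caching: $0.06 per request
cached = input_cost(20_000, 18_000, 3.0)   # 18k-token stable prefix cached: $0.0114
savings = 1 - cached / full                # ~81% input cost reduction
```

With 90% of the prompt covered by a cached prefix at a 10% read rate, input cost drops by roughly 80%, which is where figures in the 60-85% range come from: savings scale with how much of each request is a stable, cacheable prefix.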
