Models

Inference

Running the model to generate output. Distinct from training.

Inference is what you pay for in production: each call to the model API, each token in and out. At scale, inference typically dominates the total cost of an AI application, which is why prompt caching, model routing, output-length tightening, and batch APIs are all critical optimizations.
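Since inference is billed per token, the cost of a call can be estimated directly from token counts and per-million-token prices. The sketch below shows the arithmetic, including the effect of a cached-prompt discount; the prices and the 90% cache discount are illustrative assumptions, not any provider's actual rates.

```python
# Sketch: estimating per-call inference cost from token counts.
# All prices and the cache discount below are hypothetical placeholders.

def inference_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_mtok: float,   # $ per 1M input tokens (assumed rate)
    price_out_per_mtok: float,  # $ per 1M output tokens (assumed rate)
    cached_tokens: int = 0,
    cached_discount: float = 0.9,  # assumption: cached input billed at 10% of full price
) -> float:
    """Return the estimated dollar cost of one model call."""
    uncached = input_tokens - cached_tokens
    cost_in = (
        uncached * price_in_per_mtok
        + cached_tokens * price_in_per_mtok * (1 - cached_discount)
    ) / 1e6
    cost_out = output_tokens * price_out_per_mtok / 1e6
    return cost_in + cost_out

# 2,000 input tokens and 500 output tokens at $3/M in, $15/M out:
full = inference_cost(2000, 500, 3.0, 15.0)
# Same call when a 1,500-token prompt prefix is served from cache:
cached = inference_cost(2000, 500, 3.0, 15.0, cached_tokens=1500)
print(f"${full:.4f} per call, ${cached:.4f} with prompt caching")
```

This kind of back-of-the-envelope model is useful for comparing optimizations: shortening outputs attacks the (usually pricier) output side, while prompt caching attacks the repeated-input side.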
