Model Serving

Reviewed 9 April 2026 · Canonical definition

Model serving is the infrastructure that hosts trained AI models and answers inference requests at scale. A serving system must balance latency, throughput, cost, and availability while meeting governance requirements such as request logging and access control.
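
To make the definition concrete, here is a minimal sketch of a serving layer in Python, using only the standard library. It is illustrative, not a production design: `toy_model`, `ALLOWED_KEYS`, `InferenceResult`, and `serve` are hypothetical names invented for this example, and a real system would sit behind a network endpoint with batching, autoscaling, and a proper identity provider.

```python
import logging
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_serving_sketch")

# Hypothetical access-control list; a real deployment would query an
# identity provider or secrets store instead of a hard-coded set.
ALLOWED_KEYS = {"demo-key"}


@dataclass
class InferenceResult:
    status: int               # HTTP-style status code (200 or 403 here)
    output: Optional[float]   # model output, or None if the request was refused
    latency_ms: float         # wall-clock time spent handling the request


def toy_model(features: Sequence[float]) -> float:
    """Stand-in for a trained model: returns the mean of the inputs."""
    return sum(features) / len(features)


def serve(
    api_key: str,
    features: Sequence[float],
    model: Callable[[Sequence[float]], float] = toy_model,
) -> InferenceResult:
    """Handle one inference request with access control and logging."""
    start = time.perf_counter()

    if api_key not in ALLOWED_KEYS:
        # Governance: refuse callers that are not on the allow list.
        status, output = 403, None
    else:
        status, output = 200, model(features)

    latency_ms = (time.perf_counter() - start) * 1000.0

    # Governance and operations: record the outcome and latency of every request.
    logger.info("status=%d latency_ms=%.3f", status, latency_ms)
    return InferenceResult(status=status, output=output, latency_ms=latency_ms)
```

Calling `serve("demo-key", [1.0, 2.0, 3.0])` returns a result with status 200 and output 2.0, while an unknown key yields status 403 and no output. The recorded latency is the kind of signal a serving system would aggregate when trading off latency, throughput, and cost.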
