
Model Serving

Reviewed 20 March 2026 · Canonical definition

Model serving is the infrastructure that hosts trained AI models and answers inference requests at scale. A serving system must balance latency, throughput, cost, and availability while also meeting governance requirements such as request logging and access control.
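To make the latency/throughput trade-off and the governance hooks concrete, here is a minimal single-process sketch in Python. Everything in it is hypothetical: `ModelServer`, `infer`, the API-key check, and the `max_batch_size` / `max_wait_s` parameters are illustrative names rather than any serving framework's API, and the "model" is a stand-in function that doubles its input. The technique shown is dynamic batching. A background worker waits up to `max_wait_s` to group queued requests into one model call, while every request passes an access check and is logged.

```python
import logging
import queue
import threading
import time
from concurrent.futures import Future
from typing import Callable, Iterable, List, Sequence

logger = logging.getLogger("model_server")  # hypothetical logger name


class ModelServer:
    """Hypothetical in-process model server: access control, request logging, dynamic batching."""

    def __init__(
        self,
        predict_batch: Callable[[Sequence[float]], List[float]],
        api_keys: Iterable[str],
        max_batch_size: int = 8,
        max_wait_s: float = 0.01,
    ) -> None:
        self._predict_batch = predict_batch  # assumed to return one output per input, in order
        self._api_keys = set(api_keys)
        self._max_batch_size = max_batch_size
        self._max_wait_s = max_wait_s
        self._queue: queue.Queue = queue.Queue()
        self._stop = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def infer(self, api_key: str, x: float) -> Future:
        """Authorize, log, and enqueue one request; the returned Future resolves to the prediction."""
        if api_key not in self._api_keys:
            # Access control: unknown callers are rejected before any compute is spent.
            logger.warning("rejected request: unknown API key")
            raise PermissionError("invalid API key")
        fut: Future = Future()
        self._queue.put((x, fut))
        logger.info("accepted request")  # governance: every accepted request is logged
        return fut

    def _run(self) -> None:
        while not self._stop.is_set():
            try:
                batch = [self._queue.get(timeout=0.05)]
            except queue.Empty:
                continue
            # Dynamic batching: wait at most max_wait_s for more requests.
            # Bigger batches raise throughput; the wait adds latency.
            deadline = time.monotonic() + self._max_wait_s
            while len(batch) < self._max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self._queue.get(timeout=remaining))
                except queue.Empty:
                    break
            try:
                outputs = self._predict_batch([x for x, _ in batch])
            except Exception as exc:  # propagate a model failure to every caller in the batch
                for _, fut in batch:
                    fut.set_exception(exc)
                continue
            logger.info("ran batch of size %d", len(batch))
            for (_, fut), y in zip(batch, outputs):
                fut.set_result(y)

    def close(self) -> None:
        """Stop the worker thread (requests still queued are not drained in this sketch)."""
        self._stop.set()
        self._worker.join()


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in "model": doubles each input. A real deployment would call a trained model here.
    server = ModelServer(lambda xs: [2.0 * x for x in xs], api_keys={"demo-key"})
    futures = [server.infer("demo-key", float(i)) for i in range(5)]
    print([f.result(timeout=1.0) for f in futures])  # [0.0, 2.0, 4.0, 6.0, 8.0]
    server.close()
```

The two knobs express the trade-off directly. Raising `max_wait_s` or `max_batch_size` lets each model call process more requests, at the cost of extra queueing delay per request. Lowering them cuts latency but spends more calls on small batches.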