Inference

AI inference is the process by which a trained AI model applies its learned knowledge to new, unseen data to generate predictions, classifications, or decisions. For example, an image recognition system identifying objects in a new image, or a self-driving car recognizing a stop sign on a road it has never encountered, is performing inference.

Unlike model training, which focuses on learning patterns from data, inference uses the model to produce actionable results in real-world scenarios.
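The training/inference split can be sketched with a toy linear model. The weights below are hypothetical stand-ins for values a training step would have learned; inference simply applies them, fixed, to new inputs.

```python
# Toy "trained" model: inference applies already-learned weights to
# new inputs; no learning happens at this stage.
# WEIGHTS and BIAS are hypothetical stand-ins for trained values.
WEIGHTS = [0.8, -0.5]
BIAS = 0.1

def predict(features):
    """Run inference: weighted sum of inputs, thresholded to a label."""
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "positive" if score > 0 else "negative"

# A new, unseen data point
print(predict([1.0, 0.2]))  # score = 0.8 - 0.1 + 0.1 = 0.8 -> "positive"
```

The key point is that `predict` never updates `WEIGHTS`; learning happened earlier, during training.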

There are several types of AI inference:

  • Batch inference processes large datasets at once, typically offline.
  • Real-time (online) inference produces predictions immediately for each incoming data point.
  • Streaming inference continuously processes live data streams from sensors or events.
  • Edge inference runs directly on local hardware, such as smartphones, IoT devices, or industrial sensors.
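The batch vs. real-time distinction above can be illustrated with a minimal sketch. `predict` here is a hypothetical placeholder for any trained model's forward pass; the difference lies in how it is invoked.

```python
# Illustrative sketch: the same model served in batch vs. real-time mode.
# predict() is a hypothetical placeholder for a trained model.
def predict(x):
    return x * 2  # placeholder for the model's forward pass

# Batch inference: score an entire dataset in one offline pass.
def batch_infer(dataset):
    return [predict(x) for x in dataset]

# Real-time (online) inference: score each request as it arrives.
def handle_request(x):
    return predict(x)

print(batch_infer([1, 2, 3]))  # [2, 4, 6]
print(handle_request(5))       # 10
```

Streaming inference would apply `handle_request` continuously to events from a live source, and edge inference would run the same code on the device itself rather than on a server.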

AI inference is central to applications such as large language models (LLMs), predictive analytics, autonomous systems, and recommendation engines.
