Inference
How trained models make predictions: model serving, quantization, latency optimization, and batching strategies.
01
How Inference Works
Inference is the process of running a trained model on new data to produce predictions or decisions. It involves using algorithms to calculate ...
02
Model Serving Architectures
This lesson covers how to deploy and serve machine learning models, including the benefits and challenges of different m...
04
Batching, Caching & Latency
This lesson covers batching, caching, and latency in AI systems, discussing how they can improve performance and effic...