New Jobs Simplified, AI University

Inference

Model Serving Architectures

This lesson covers how to deploy and serve machine learning models, including the benefits and challenges of different model serving architectures. We will explore popular tools and techniques for deploying models, such as TensorFlow Serving and Vertex AI.

Why It Matters

Model serving architectures determine how trained models reach real users: they govern latency, throughput, security, and reliability once a model leaves the training environment. As demand for AI-powered applications grows, a sound serving architecture is what lets a model handle production traffic at scale.

Key Points

TensorFlow Serving is a popular tool for deploying machine learning models in production. It runs as a server and exposes models to client applications over REST and gRPC APIs, rather than running on the client device itself.
TensorFlow Serving can automatically batch requests together, which increases throughput by making better use of the hardware, at the cost of a small amount of added latency per request.
Users can trade off latency for throughput by adjusting the batching delay, and can deploy multiple servers to handle high query volumes.
Model serving architectures are designed to handle the complexity of modern machine learning models, including the need to manage multiple requests and handle errors.
TensorFlow Serving serves models exported in TensorFlow's SavedModel format, which includes models built with Keras.
Model serving architectures can be deployed on-premises or in the cloud, depending on the needs of the application.
Users can also deploy models using other tools, such as Vertex AI for managed cloud serving and TensorFlow Lite for mobile apps and embedded devices.
Model serving architectures are critical for ensuring the security and reliability of machine learning models in production environments.
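
To make the points about multiple servers and error handling concrete, here is a minimal sketch of a client that rotates requests across several model replicas and fails over when one is unavailable. It is an illustration only: RoundRobinClient, healthy_replica, and broken_replica are hypothetical names, and the replicas are plain Python functions standing in for real model servers. In a real deployment this job is usually done by a load balancer in front of the servers.

```python
from itertools import cycle
from typing import Callable, Iterable, List

# A "replica" here is any callable that takes a list of inputs
# and returns a list of predictions (a stand-in for a model server).
Replica = Callable[[list], list]


class RoundRobinClient:
    """Spreads requests over replicas and fails over when one raises."""

    def __init__(self, replicas: Iterable[Replica]):
        self._replicas: List[Replica] = list(replicas)
        if not self._replicas:
            raise ValueError("at least one replica is required")
        self._next = cycle(range(len(self._replicas)))

    def predict(self, instances: list) -> list:
        last_error = None
        # Try each replica at most once, starting from the next one in rotation.
        for _ in range(len(self._replicas)):
            replica = self._replicas[next(self._next)]
            try:
                return replica(instances)
            except ConnectionError as err:
                last_error = err  # remember the failure and try the next replica
        raise RuntimeError("all replicas failed") from last_error


def healthy_replica(instances: list) -> list:
    # Hypothetical model: doubles every input value.
    return [2 * x for x in instances]


def broken_replica(instances: list) -> list:
    # Hypothetical server that is down.
    raise ConnectionError("replica unavailable")


client = RoundRobinClient([broken_replica, healthy_replica])
print(client.predict([1, 2, 3]))  # the broken replica is skipped
```

The request still succeeds because the client moves on to the next replica after the first one fails; only when every replica fails does the caller see an error.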

Key Concepts

TensorFlow Serving

A server for hosting trained TensorFlow models and answering prediction requests from client applications over REST or gRPC.

Vertex AI

Google Cloud's managed platform for training, deploying, and managing machine learning models.

Batching delay

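A parameter that controls how long the server waits to collect incoming requests into a batch before sending the batch to the model. A longer delay yields larger batches and higher throughput, but each request waits longer.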
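
The effect of the batching delay can be sketched with a toy simulation. This is a simplified model written for illustration, not TensorFlow Serving's actual scheduler: form_batches and mean_wait_ms are hypothetical helpers, the arrival times are made up, and a batch is assumed to be dispatched either when it is full or when the delay has elapsed since its first request.

```python
from typing import List, Tuple

Batch = Tuple[float, List[float]]  # (dispatch time, arrival times in the batch)


def form_batches(arrivals_ms: List[float], delay_ms: float,
                 max_batch_size: int) -> List[Batch]:
    """Group sorted arrival times into batches.

    A batch opens when its first request arrives and is dispatched when it
    is full or when delay_ms has elapsed, whichever comes first.
    """
    batches: List[Batch] = []
    current: List[float] = []
    for t in arrivals_ms:
        # The open batch timed out before this request arrived: dispatch it.
        if current and t > current[0] + delay_ms:
            batches.append((current[0] + delay_ms, current))
            current = []
        current.append(t)
        # The batch is full: dispatch it immediately at this arrival time.
        if len(current) == max_batch_size:
            batches.append((t, current))
            current = []
    if current:
        # Leftover requests are dispatched when their delay expires.
        batches.append((current[0] + delay_ms, current))
    return batches


def mean_wait_ms(batches: List[Batch]) -> float:
    """Average time a request spends waiting for its batch to be dispatched."""
    waits = [dispatch - t for dispatch, batch in batches for t in batch]
    return sum(waits) / len(waits)


arrivals = [0.0, 1.0, 2.0, 10.0, 11.0, 30.0]  # made-up arrival times in ms
for delay in (0.0, 5.0, 20.0):
    batches = form_batches(arrivals, delay_ms=delay, max_batch_size=4)
    print(f"delay={delay} ms: {len(batches)} batches, "
          f"mean wait {mean_wait_ms(batches):.2f} ms")
```

With these inputs, a longer delay produces fewer and larger batches (fewer model invocations for the same six requests) while the average wait per request grows, which is the latency-for-throughput trade described above.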

Model serving architecture

A system for deploying and managing machine learning models in production environments.

Code Examples

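A simple example of querying a model that TensorFlow Serving is already serving, using its REST API. The host, port, model name, and input values are placeholders.

import requests
# Assumes a local TensorFlow Serving instance with its REST API on port 8501.
url = "http://localhost:8501/v1/models/example_model:predict"
response = requests.post(url, json={"instances": [[1.0, 2.0, 3.0]]})
response.raise_for_status()  # surface HTTP errors instead of parsing a failed response
print(response.json()["predictions"])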

An example of deploying a model using Vertex AI.

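from google.cloud import aiplatform
aiplatform.init(project="example-project", location="us-central1")  # placeholder project and region
model = aiplatform.Model(model_name="example_model_id")  # placeholder ID of a model already uploaded to Vertex AI
endpoint = model.deploy(machine_type="n1-standard-4")  # creates an endpoint and deploys the model to it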