Cloud inference

Cloud inference is a deployment strategy for machine learning models in which predictions run on cloud servers rather than on the device that produces the data. It relies on scalable infrastructure and automated pipelines on platforms such as AWS, GCP, or Azure, and often uses serverless services such as AWS Lambda or Azure Functions, which scale on demand and can reduce cost for intermittent workloads. Because each request makes a network round trip to the cloud, this approach fits best when very low latency is not critical, for example in batch processing or in applications with variable workloads. It contrasts with edge inference, which runs models locally on devices. Cloud inference is a subset of cloud computing. This graph lists two nodes under it: batch inference, which processes large datasets offline, and edge inference, which moves computation closer to data sources for real-time needs and is better read as an alternative to cloud inference than as a form of it.
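
The serverless pattern described above can be sketched as a small request handler. This is a minimal illustration, not a production setup: the `(event, context)` signature follows the style of AWS Lambda Python functions, while the linear "model" (`WEIGHTS`, `BIAS`), the `predict` helper, and the JSON request shape with a `features` field are assumptions made up for the example.

```python
import json

# Hypothetical stand-in for a trained model: fixed linear weights.
WEIGHTS = [0.4, -0.2, 0.1]
BIAS = 0.5


def predict(features):
    """Return a linear score for one feature vector."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handler(event, context):
    """Entry point in the (event, context) style of AWS Lambda Python functions.

    Assumes the request body is a JSON string such as {"features": [1.0, 2.0, 3.0]}.
    """
    try:
        payload = json.loads(event["body"])
        score = predict(payload["features"])
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed or incomplete input: report a client error instead of crashing.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

Keeping the model at module level means it is set up once when the module is imported and reused by later calls in the same process, rather than being rebuilt inside `handler` on every request.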

Inside Cloud inference (2)

Batch inference — A deployment strategy for making predictions on large volumes of data at scheduled intervals, suitable for non-real-time use cases like financial reporting.
Edge inference — A deployment strategy for running models on devices like phones or IoT, requiring lightweight and resource-optimized models.
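
As a rough sketch of the batch case, the job below scores a collection of records in fixed-size chunks, the way a scheduled run might. The record fields (`id`, `amount`, `balance`), the weighted-sum `score` function, and the default batch size are hypothetical placeholders; scheduling and storage are left out.

```python
from itertools import islice


def score(record):
    """Hypothetical model: a fixed weighted sum of two fields."""
    return 0.7 * record["amount"] + 0.3 * record["balance"]


def batches(records, size):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_batch_job(records, batch_size=1000):
    """Score every record chunk by chunk and return (id, score) pairs."""
    results = []
    for chunk in batches(records, batch_size):
        results.extend((r["id"], score(r)) for r in chunk)
    return results
```

Processing in chunks means only one chunk of input needs to be materialized at a time, so `records` can be a generator that streams rows from a file or table.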

Connections

Alternative to Edge inference
Related to AWS Lambda
Related to Azure
Related to GCP
Related to Amazon Web Services
