Cloud inference
concept · part of Cloud Computing
Cloud inference is a deployment strategy in which a machine learning model runs on scalable cloud infrastructure, on platforms such as AWS, GCP, or Azure, and is typically deployed through automated pipelines. Predictions are computed on cloud servers, often through serverless services such as AWS Lambda or Azure Functions, so capacity scales on demand and cost tracks actual usage. Because every request makes a network round trip to the cloud, the approach fits best when strict low latency is not critical, for example in batch processing or applications with variable workloads. It contrasts with edge inference, which runs models locally on devices. As part of cloud computing, cloud inference relies on cloud resources to handle computation. This graph groups two related strategies under it: batch inference, which processes large datasets offline, and edge inference, which moves computation closer to the data source for real-time needs and is therefore also listed as an alternative.
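To make the serverless pattern concrete, here is a minimal sketch in Python. It uses the `(event, context)` handler shape of AWS Lambda's Python runtime. Everything else is assumed for illustration: the toy linear model, the `WEIGHTS` and `BIAS` values, and the `features` field in the request body are hypothetical and do not come from any real service.

```python
import json

# Hypothetical model parameters. A real deployment would load trained weights
# from storage once per container, outside the handler, so warm invocations
# can reuse them.
WEIGHTS = [0.5, -0.25, 2.0]
BIAS = 0.1


def predict(features):
    """Score one feature vector with a toy linear model."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    return BIAS + sum(w * x for w, x in zip(WEIGHTS, features))


def handler(event, context):
    """Entry point in the (event, context) style used by serverless runtimes."""
    try:
        # The request payload is assumed to arrive as a JSON string in "body".
        body = json.loads(event.get("body") or "{}")
        score = predict(body["features"])
    except (KeyError, ValueError, TypeError) as exc:
        # Malformed input is reported to the caller instead of crashing.
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    return {"statusCode": 200, "body": json.dumps({"prediction": score})}
```

The handler keeps no state between requests, which is what lets the platform run as many copies as the workload needs and none when traffic stops.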
Inside Cloud inference (2)
- Batch inference: A deployment strategy for making predictions on large volumes of data at scheduled intervals, suited to non-real-time use cases such as financial reporting.
- Edge inference: A deployment strategy for running models on devices such as phones or IoT hardware, which requires lightweight, resource-optimized models.
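Batch inference as described above can be sketched as a job that scores stored records in fixed-size chunks. This is an illustrative outline under stated assumptions, not a reference implementation: `score`, `batched`, `run_batch_job`, and the `id` and `amount` fields are hypothetical names, and a scheduler (not shown) would be responsible for triggering the job at each interval.

```python
from itertools import islice


def score(record):
    """Hypothetical stand-in for a trained model's predict call."""
    return 2.0 * record["amount"] + 1.0


def batched(records, size):
    """Yield successive lists of at most `size` records from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    # islice pulls the next `size` items; an empty list ends the loop.
    while chunk := list(islice(it, size)):
        yield chunk


def run_batch_job(records, batch_size=1000):
    """Score every record chunk by chunk, as a scheduled job might."""
    results = []
    for chunk in batched(records, batch_size):
        results.extend({"id": r["id"], "prediction": score(r)} for r in chunk)
    return results
```

Chunking bounds how many records are processed at once, so the same job can run over datasets too large to score in a single call.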
Connections
- Alternative to Edge inference
- Related to AWS Lambda
- Related to Azure
- Related to GCP
- Related to Amazon Web Services