Machine Learning & Data

concept · part of Artificial Intelligence

Learning patterns from data — the fuel that shapes every AI response.

Inside Machine Learning & Data (41)

Data preprocessing — The step of transforming raw data into a format suitable for machine learning, including scaling and encoding.
Unsupervised Learning — A machine learning paradigm that finds patterns in unlabeled data, used for clustering and association.
Reinforcement Learning — A machine learning paradigm where an agent learns a policy through trial and error to maximize cumulative reward from an environment.
Supervised Learning — A machine learning paradigm that trains on labeled input-output pairs for tasks like classification and regression.
Classical ML — Traditional machine learning methods like decision trees and linear models, effective for tabular data.
Classification — A machine learning task that predicts a categorical target variable (in the example these notes draw on, the NEIGHBORHOOD column).
Feature selection — Technique to improve model performance by retaining relevant features and discarding irrelevant ones.
Model Optimization — Techniques, such as pruning and quantization, that make a trained model smaller and faster at inference, ideally with little loss in predictive performance.
Azure Machine Learning Service — Integrated environment for training, experimentation, and workflow management with no-code UI and SDKs, capable of deploying models as REST APIs.
Descriptive statistics — Summary measures such as mean, median, and standard deviation; here they are used to flag anomalies as values that deviate strongly from what is expected.
Model Management — The processes and tools for versioning, monitoring, and maintaining machine learning models throughout their lifecycle.
Model Testing — The process of evaluating a machine learning model's performance, robustness, and reliability before deployment.
Q-learning — A value-based reinforcement learning algorithm that learns Q-values for state-action pairs using the Bellman equation, effective for discrete action spaces.
Regression metrics — Metrics used to evaluate the performance of regression models, such as MSE and R-squared.
Data pipeline — Automated processes that ingest raw data, clean/transform it, and load it to a destination such as a database or model.
Monitoring and maintenance — Continuous performance monitoring and retraining of ML models to address data drift over time.
XGBoost — A gradient-boosted tree modeling framework used for classification and regression tasks.
Accuracy — A metric for classification models calculated as correct predictions divided by total predictions.
Apache Spark MLlib — Spark's distributed ML library for large-scale data processing, supporting classification, regression, clustering, and streaming.
Clickstream data — Records of user clicks, drop-off points, and time spent, used to inform data-driven product decisions.
Data analysis — Summarizing and extracting insights from data; in these notes, from textual data using language models such as T5.
Data freshness — How up to date a data source is; kept current by automated pipelines that use API syncs, real-time feeds, or refresh triggers.
Data integration — The challenge of combining data scattered across systems, formats, and regions, which requires robust pipelines and clean data.
Fixed random state — Setting a deterministic seed so that results are reproducible. This is standard practice, though it is sometimes flagged as a minor security concern because seeded output is predictable; that matters only when the randomness serves a security purpose.
Gradient Boosting — An ensemble technique that builds models (typically shallow decision trees) sequentially, each new model fitting the negative gradient of the loss (for squared error, the residuals) of the ensemble so far; implementations scale well, particularly on distributed systems or cloud platforms.
Hyperparameters — Configuration settings set before training that control the learning process and are tuned to improve performance.
MLflow — An open-source platform for managing the ML lifecycle, including experiment tracking, model versioning, and deployment.
Model development cycle — An iterative process of selecting an algorithm, training on data, evaluating accuracy, and repeating until performance criteria are met.
ONNX — A framework-agnostic interchange format for exporting models trained in one framework to be deployed in another.
Operational efficiency — Automation of routine tasks, optimization of resource allocation, predictive maintenance, and demand forecasting via ML.
Overfitting — A modeling issue where the model fits noise in the training data and therefore generalizes poorly to unseen data; often worsened by poor data quality.
Parameters — Values learned during training that define the model's mapping from input to output.
Policy gradients — A family of policy-based reinforcement learning algorithms that directly optimize a parameterized policy mapping states to actions, suited for continuous or high-dimensional action spaces.
Predictive analytics — Using historical data with statistical or machine learning models to predict future outcomes.
Real-time inference — A deployment strategy that serves low-latency predictions as data arrives, used in recommendations, fraud detection, and autonomous driving.
Reproducibility — The ability to exactly reproduce a result by jointly versioning data, code, and models.
SGD — Stochastic gradient descent optimizer used to update model weights during training.
Synthetic data generation — Creating artificial data with controlled properties using NumPy functions like np.random.normal and np.random.randint.
Traditional ML pipeline — A machine learning workflow that uses static, pre-collected datasets for training and evaluation.
Train/test split — A technique to separate data into training and test sets to evaluate model performance on unseen data.
Underfitting — A modeling issue where the model fails to capture the underlying trend, often worsened by poor data quality.
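
The Data preprocessing entry mentions scaling and encoding. A minimal sketch of both in plain Python, with no libraries assumed: `standardize` rescales a numeric column to zero mean and unit standard deviation, and `one_hot` turns a categorical column into 0/1 indicator columns. Both function names and the sample columns are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(categories):
    """Encode a categorical column as one 0/1 indicator row per input value."""
    vocabulary = sorted(set(categories))  # sorted so the column order is reproducible
    position = {c: i for i, c in enumerate(vocabulary)}
    rows = []
    for c in categories:
        row = [0] * len(vocabulary)
        row[position[c]] = 1
        rows.append(row)
    return vocabulary, rows


ages = [20, 30, 40]
colors = ["red", "green", "red"]
print(standardize(ages))
print(one_hot(colors))  # (['green', 'red'], [[0, 1], [1, 0], [0, 1]])
```

In a real pipeline the mean and standard deviation would be computed on the training split only and then reused to transform the test split, so that no test information leaks into training.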
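
The Descriptive statistics entry describes flagging anomalies by their deviation from expected values. One common way to do that, assumed here rather than taken from the notes, is a z-score rule: a value is flagged when it lies more than `threshold` standard deviations from the mean. The function name, the default threshold of 2.0, and the sample data are illustrative.

```python
from statistics import mean, stdev


def flag_anomalies(values, threshold=2.0):
    """Return the values whose z-score (distance from the mean, in standard deviations) exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation; needs at least two values
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


daily_orders = [102, 98, 101, 99, 100, 97, 103, 250]
print(flag_anomalies(daily_orders))  # [250]
```

With few observations, a single outlier also inflates the standard deviation, which caps the z-score any point can reach (about 2.47 for eight points), so a stricter threshold such as 3 would miss it here; median-based measures are a more robust alternative.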
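
The Q-learning entry can be made concrete with a tabular sketch. The environment is a hypothetical four-state chain invented for illustration: moving right from state 2 reaches the goal state 3 and earns a reward of 1, and every other move earns 0. `q_update` applies the Bellman-style update Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)), and actions are chosen epsilon-greedily. The step size alpha = 0.5, discount gamma = 0.9, exploration rate, and episode count are assumptions, not values from the notes.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 3  # hypothetical chain of states 0..3; reaching state 3 ends the episode


def step(state, action):
    """Hypothetical deterministic environment: move one state left or right."""
    next_state = max(0, state - 1) if action == "left" else state + 1
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if next_state == GOAL else max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def train(episodes=500, epsilon=0.2, max_steps=100, seed=0):
    """Run epsilon-greedy episodes from state 0 and return the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # Exploit, breaking ties at random so early episodes still wander.
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward = step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
            if state == GOAL:
                break
    return q


q = train()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

Because the discount makes a reward reached sooner worth more, the learned greedy policy should move right from every non-goal state.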
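
The Regression metrics entry names MSE and R-squared. A minimal sketch of both from their standard definitions: MSE averages the squared residuals, and R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sample values are made up for illustration.

```python
def _check(y_true, y_pred):
    """Reject empty or mismatched inputs, which zip() would otherwise silently truncate."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")


def mse(y_true, y_pred):
    """Mean squared error: the average squared difference between targets and predictions."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all targets are equal")
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred))        # 0.125
print(r_squared(y_true, y_pred))  # about 0.975
```

An R-squared of 1 means a perfect fit, 0 means no better than always predicting the mean, and negative values mean worse than that.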
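
To make the Gradient Boosting entry concrete, here is a minimal sketch for one-dimensional regression under squared-error loss, where the negative gradient is simply the residual. Each round fits a depth-one tree (a "stump") to the current residuals and adds a shrunken copy of it to the ensemble. The helper names, the 50 rounds, and the learning rate of 0.1 are illustrative assumptions; libraries such as XGBoost add regularization, deeper trees, and distributed training on top of this idea.

```python
def fit_stump(xs, residuals):
    """Fit a depth-one regression tree: one threshold, each leaf predicts its mean residual."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # the largest x cannot split anything off
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boost stumps on squared-error loss; returns a prediction function."""
    base = sum(ys) / len(ys)  # start from the constant that minimizes squared error
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For loss 0.5 * (y - F(x))**2 the negative gradient is the residual y - F(x).
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print([round(model(x), 2) for x in xs])
```

The predictions start at the mean (3.0) and move toward 1 and 5 a little more each round; the small learning rate trades more rounds for a smoother, less overfit ensemble.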
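
The SGD and Synthetic data generation entries fit together in one small experiment: generate data from a known line plus Gaussian noise, then recover the slope and intercept with stochastic gradient descent. The notes mention NumPy's `np.random.normal`; to keep the sketch dependency-free it uses the standard library's `random.Random.gauss` instead, which plays the same role. The true parameters (3.0 and 2.0), noise level, learning rate, and epoch count are assumptions for illustration.

```python
import random


def make_synthetic(n=200, true_w=3.0, true_b=2.0, noise=0.1, seed=0):
    """Generate y = true_w * x + true_b + Gaussian noise, with x drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [true_w * x + true_b + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def sgd_linear(xs, ys, lr=0.1, epochs=200, seed=0):
    """Fit y = w * x + b by stochastic gradient descent on the squared error, one example at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a fresh random order each epoch
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * error**2: d/dw = error * x, d/db = error.
            w -= lr * error * xs[i]
            b -= lr * error
    return w, b


xs, ys = make_synthetic()
w, b = sgd_linear(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
```

The fitted values should land close to 3.0 and 2.0. A constant learning rate leaves some jitter from the noisy per-example updates, which is why SGD is often paired with a decaying learning-rate schedule.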
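
The Train/test split, Fixed random state, and Accuracy entries combine naturally. A minimal sketch in plain Python: `split_train_test` shuffles a copy of the rows with a seeded generator, so the same seed yields the same split on every run, and `accuracy` divides correct predictions by total predictions. Both helpers are hypothetical (scikit-learn's `train_test_split` and `accuracy_score` serve these roles in practice), and the 25% test fraction and seed of 42 are arbitrary choices.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of rows with a fixed seed and hold out the first test_fraction as the test set."""
    rng = random.Random(seed)  # private generator: fixed seed -> identical split on every run
    shuffled = list(rows)      # copy so the caller's data keeps its order
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


rows = list(range(20))
train, test = split_train_test(rows)
print(len(train), len(test))  # 15 5
print(accuracy(["cat", "dog", "cat", "cat"], ["cat", "dog", "dog", "cat"]))  # 0.75
```

Using a private `random.Random` instance rather than seeding the global generator keeps the split reproducible even if other code draws random numbers in between.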

Connections

Related to Generative AI
