Machine Learning & Data
concept · part of Artificial Intelligence
Learning patterns from data — the fuel that shapes every AI response.
Inside Machine Learning & Data (41)
- Data preprocessing — The step of transforming raw data into a format suitable for machine learning, including scaling and encoding.
- Unsupervised Learning — A machine learning paradigm that finds patterns in unlabeled data, used for clustering and association.
- Reinforcement Learning — A machine learning paradigm where an agent learns a policy through trial and error to maximize cumulative reward from an environment.
- Supervised Learning — A machine learning paradigm that trains on labeled input-output pairs for tasks like classification and regression.
- Classical ML — Traditional machine learning methods like decision trees and linear models, effective for tabular data.
- Classification — A machine learning task that predicts a categorical target variable (NEIGHBORHOOD in the source example).
- Feature selection — Technique to improve model performance by retaining relevant features and discarding irrelevant ones.
- Model Optimization — Techniques to improve machine learning model efficiency and performance, such as pruning and quantization.
- Azure Machine Learning Service — Integrated environment for training, experimentation, and workflow management with no-code UI and SDKs, capable of deploying models as REST APIs.
- Descriptive statistics — Statistical measures like mean, median, and standard deviation used to flag anomalies via deviation from expected values.
- Model Management — The processes and tools for versioning, monitoring, and maintaining machine learning models throughout their lifecycle.
- Model Testing — The process of evaluating a machine learning model's performance, robustness, and reliability before deployment.
- Q-learning — A value-based reinforcement learning algorithm that learns Q-values for state-action pairs using the Bellman equation, effective for discrete action spaces.
- Regression metrics — Metrics used to evaluate the performance of regression models, such as MSE and R-squared.
- Data pipeline — Automated processes that ingest raw data, clean/transform it, and load it to a destination such as a database or model.
- Monitoring and maintenance — Continuous performance monitoring and retraining of ML models to address data drift over time.
- XGBoost — A gradient-boosted tree modeling framework used for classification and regression tasks.
- Accuracy — A metric for classification models calculated as correct predictions divided by total predictions.
- Apache Spark MLlib — Spark's distributed ML library for large-scale data processing, supporting classification, regression, clustering, and streaming.
- Clickstream data — Data tracking user clicks, drop-offs, and time spent for data-driven product decisions.
- Data analysis — Summarizing and extracting insights from textual data using models like T5.
- Data freshness — How current a data source is, maintained by automated pipelines using API syncs, real-time feeds, or refresh triggers.
- Data integration — The challenge of combining data from scattered systems, formats, and regions requiring robust pipelines and clean data.
- Fixed random state — Setting a deterministic seed for reproducibility, which is standard practice but sometimes considered a minor security concern due to predictability.
- Gradient Boosting — An ensemble technique that builds models sequentially, each one correcting the errors of those before it; it scales well, particularly on distributed systems or cloud platforms.
- Hyperparameters — Configuration settings set before training that control the learning process and are tuned to improve performance.
- MLflow — An open-source platform for managing the ML lifecycle, including experiment tracking, model versioning, and deployment.
- Model development cycle — An iterative process of selecting an algorithm, training on data, evaluating accuracy, and repeating until performance criteria are met.
- ONNX — A framework-agnostic interchange format for exporting models trained in one framework to be deployed in another.
- Operational efficiency — Automation of routine tasks, optimization of resource allocation, predictive maintenance, and demand forecasting via ML.
- Overfitting — A modeling issue where the model fits noise in the training data, often worsened by poor data quality.
- Parameters — Values learned during training that define the model's mapping from input to output.
- Policy gradients — A family of policy-based reinforcement learning algorithms that directly optimize the policy mapping states to actions, suited for continuous or high-dimensional action spaces.
- Predictive analytics — Analyzing data to predict future outcomes.
- Real-time inference — A deployment strategy for instantaneous predictions as data arrives, used in recommendations, fraud detection, and autonomous driving.
- Reproducibility — The ability to exactly reproduce a result by jointly versioning data, code, and models.
- SGD — Stochastic gradient descent optimizer used to update model weights during training.
- Synthetic data generation — Creating artificial data with controlled properties using NumPy functions like np.random.normal and np.random.randint.
- Traditional ML pipeline — A machine learning workflow that uses static, pre-collected datasets for training and evaluation.
- Train/test split — A technique to separate data into training and test sets to evaluate model performance on unseen data.
- Underfitting — A modeling issue where the model fails to capture the underlying trend, often worsened by poor data quality.
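Several of the entries above (synthetic data generation, fixed random state, train/test split, accuracy) fit together in one small workflow. The following is a minimal sketch of that workflow, not a reference implementation. It uses Python's standard `random` module in place of the NumPy functions named above so that it runs without dependencies. The labeling rule, the nearest-centroid classifier, and all function names are hypothetical choices made for illustration.

```python
import random


def make_synthetic_data(n_rows, seed):
    """Generate ((x1, x2), label) rows; a fixed seed makes the data reproducible."""
    rng = random.Random(seed)  # fixed random state
    rows = []
    for _ in range(n_rows):
        x1 = rng.gauss(0.0, 1.0)  # stdlib stand-in for np.random.normal
        x2 = rng.gauss(0.0, 1.0)
        label = 1 if x1 + x2 > 0 else 0  # hypothetical labeling rule
        rows.append(((x1, x2), label))
    return rows


def train_test_split(rows, test_fraction, seed):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_centroids(train_rows):
    """'Training' step: compute the mean feature vector (centroid) per class."""
    centroids = {}
    for label in (0, 1):
        points = [x for x, y in train_rows if y == label]
        centroids[label] = tuple(sum(col) / len(points) for col in zip(*points))
    return centroids


def predict(centroids, x):
    """Classify x as the label of the closest centroid (squared distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


def accuracy(y_true, y_pred):
    """Correct predictions divided by total predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


rows = make_synthetic_data(200, seed=42)
train_rows, test_rows = train_test_split(rows, test_fraction=0.25, seed=42)
centroids = fit_centroids(train_rows)

# Evaluate only on rows the model never saw during training.
y_true = [y for _, y in test_rows]
y_pred = [predict(centroids, x) for x, _ in test_rows]
test_accuracy = accuracy(y_true, y_pred)
print(f"test accuracy: {test_accuracy:.2f}")
```

Calling `make_synthetic_data` again with the same seed returns identical rows, which is the point of a fixed random state. Changing the seed changes both the data and the split, so the reported accuracy will vary from run to run.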
Connections
- Related to Generative AI