Data preprocessing
concept · part of Machine Learning & Data
The step of transforming raw data into a format suitable for machine learning, including scaling and encoding.
In a typical workflow, feature values are scaled to the range [0, 1] with MinMaxScaler, which helps neural networks train more stably than unscaled inputs, and the dataset is split into 80% training and 20% testing sets.
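The scaling and 80/20 split described above can be sketched in plain Python to make the arithmetic visible. This is a minimal sketch, not scikit-learn's implementation: `simple_train_test_split` and `SimpleMinMaxScaler` are hypothetical stand-ins for scikit-learn's `train_test_split(..., test_size=0.2)` and `MinMaxScaler`, and the toy `data` and `seed` values are assumptions.

```python
import random


def simple_train_test_split(rows, test_size=0.2, seed=42):
    """Shuffle rows and split them into (train, test); test_size=0.2 gives an 80/20 split."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]


class SimpleMinMaxScaler:
    """Per-feature min-max scaling to [0, 1] (hypothetical stand-in for MinMaxScaler)."""

    def fit(self, X):
        # Learn each column's minimum and maximum from the training rows only.
        columns = list(zip(*X))
        self.min_ = [min(col) for col in columns]
        self.max_ = [max(col) for col in columns]
        return self

    def transform(self, X):
        # x' = (x - min) / (max - min); this sketch maps a constant column to 0.0.
        return [
            [(x - lo) / (hi - lo) if hi != lo else 0.0
             for x, lo, hi in zip(row, self.min_, self.max_)]
            for row in X
        ]


data = [[float(i), float(10 * i)] for i in range(10)]
train, test = simple_train_test_split(data)   # 8 training rows, 2 test rows
scaler = SimpleMinMaxScaler().fit(train)      # fit on training data only
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)          # may fall slightly outside [0, 1]
```

Fitting the scaler on the training split only, then reusing its learned minimum and maximum on the test split, keeps test-set statistics from leaking into preprocessing; that is also why scaled test values can land slightly outside [0, 1].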
Inside Data preprocessing (10)
- Data cleaning — The process of handling missing values, outliers, and inconsistencies in a dataset.
- Outliers — Data points that deviate significantly from other observations, commonly identified with the Z-score or interquartile range (IQR) method.
- CSV — Comma-separated values, a common plain-text format for storing tabular data, often used as input for data preprocessing and analysis.
- Data normalization — In TensorFlow image examples, pixel values are divided by 255.0 to map 8-bit intensities into [0, 1].
- fit — A method that learns preprocessing parameters from data; e.g., scikit-learn's LabelEncoder learns the mapping from original labels to integer codes.
- MinMaxScaler — A scikit-learn scaler that linearly rescales each feature to a fixed range, [0, 1] by default.
- Missing values — Data points that are absent in a dataset, handled by removal or imputation.
- Normalization — Rescaling data to a common scale, e.g., Min-Max scaling to a fixed range or Z-score standardization to zero mean and unit variance.
- One-hot encoding — A technique to convert categorical variables into binary columns for machine learning models.
- transform — A method that applies the parameters learned by fit to data and returns the transformed result, such as a prepared DataFrame.
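The two outlier rules named in the Outliers entry can be sketched with Python's `statistics` module. This is a minimal sketch: the 1.5 * IQR fences (Tukey's rule) and the z-score cutoffs are common conventions rather than fixed values, the helper names are hypothetical, and the data is a toy sample.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values whose |z| = |x - mean| / std exceeds threshold (population std)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [x for x in values if abs(x - mean) / std > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lo or x > hi]


data = [10, 12, 11, 13, 12, 11, 10, 12, 95]
print(iqr_outliers(data))                      # [95]
print(zscore_outliers(data))                   # []
print(zscore_outliers(data, threshold=2.0))    # [95]
```

With only nine points, no population z-score can exceed √8 ≈ 2.83, so the conventional 3.0 cutoff flags nothing here; the quartile-based IQR rule is less affected by the outlier inflating the standard deviation.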
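The CSV, Missing values, fit, and transform entries fit together in a small pipeline. This is a minimal sketch, assuming a toy numeric CSV in which an empty cell means a missing value; `read_numeric_csv` and `MeanImputer` are hypothetical helpers (scikit-learn's `SimpleImputer(strategy="mean")` plays a similar role), and removal and mean imputation are shown side by side.

```python
import csv
import io
import statistics

CSV_TEXT = """age,income
34,52000
,61000
29,
41,75000
"""


def read_numeric_csv(text):
    """Parse CSV text into dicts of floats, mapping empty cells to None (missing)."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: float(v) if v != "" else None for k, v in row.items()} for row in reader]


class MeanImputer:
    """Hypothetical fit/transform imputer: fit learns column means, transform fills gaps."""

    def fit(self, rows):
        # Learn one mean per column from the observed (non-missing) values.
        self.means_ = {}
        for col in rows[0]:
            observed = [r[col] for r in rows if r[col] is not None]
            self.means_[col] = statistics.mean(observed)
        return self

    def transform(self, rows):
        # Replace each missing cell with the mean learned during fit.
        return [{k: self.means_[k] if v is None else v for k, v in r.items()} for r in rows]


rows = read_numeric_csv(CSV_TEXT)
complete = [r for r in rows if None not in r.values()]   # removal: 2 rows remain
imputed = MeanImputer().fit(rows).transform(rows)        # imputation: 4 rows remain
```

In a real split, `fit` would run on the training rows only and `transform` would then be applied to both splits, for the same leakage reason as with scaling.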
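The Data normalization and Normalization entries describe two common rescalings. This is a minimal plain-Python sketch with toy pixel and feature values as assumptions: dividing 8-bit pixels by 255.0 maps them into [0, 1], while Z-score standardization (`standardize` is a hypothetical helper) centers a feature at 0 with unit standard deviation.

```python
import statistics

# Pixel scaling: 8-bit intensities in 0..255 become floats in [0, 1].
pixels = [0, 64, 128, 255]
scaled = [p / 255.0 for p in pixels]


def standardize(values):
    """Z-score standardization: (x - mean) / std, using the population std."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Min-Max scaling bounds the output range, whereas standardization does not; standardized values can be any real number.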
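The One-hot encoding entry, combined with the fit/transform split, can be sketched as a small encoder. This is a minimal sketch: `SimpleOneHotEncoder` is a hypothetical class, not scikit-learn's `OneHotEncoder`; like `LabelEncoder`, it sorts the categories it learns in `fit`, and the color values are toy data.

```python
class SimpleOneHotEncoder:
    """Hypothetical plain-Python one-hot encoder for a single categorical column."""

    def fit(self, values):
        # Learn the category -> column index mapping, with categories sorted.
        self.categories_ = sorted(set(values))
        self.index_ = {c: i for i, c in enumerate(self.categories_)}
        return self

    def transform(self, values):
        # One binary column per learned category; an unseen category raises KeyError.
        encoded = []
        for v in values:
            row = [0] * len(self.categories_)
            row[self.index_[v]] = 1
            encoded.append(row)
        return encoded


enc = SimpleOneHotEncoder().fit(["red", "green", "blue", "green"])
print(enc.categories_)                   # ['blue', 'green', 'red']
print(enc.transform(["green", "red"]))   # [[0, 1, 0], [0, 0, 1]]
```

Because `transform` only knows the categories seen in `fit`, an unseen value raises `KeyError` in this sketch; scikit-learn's `OneHotEncoder` instead exposes a `handle_unknown` parameter for that case.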
Connections
- Uses One-hot encoding
- Uses Scikit-learn
- Prerequisite of Amazon Neptune ML
- Related to Missing values
- Related to Outliers
- Related to Normalization
- Related to Data transformation