Data preprocessing

concept · part of Machine Learning & Data

The step of transforming raw data into a format suitable for machine learning, including scaling and encoding.

Feature values are scaled to the range [0, 1] using MinMaxScaler, which keeps inputs on a comparable scale and generally helps neural network training. The dataset is split into 80% training and 20% testing sets.
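A minimal sketch of the scaling and splitting described above, using only the Python standard library. The `MinMax` class and `split_80_20` function are hypothetical stand-ins written for illustration; they are not scikit-learn's API, although `MinMax` follows the same fit/transform pattern as MinMaxScaler. The sketch assumes the scaler is fitted on the training rows only, which the notes do not state explicitly.

```python
import random


class MinMax:
    """Hypothetical stand-in for a min-max scaler (not scikit-learn's class)."""

    def fit(self, rows):
        # Learn the per-column minimum and maximum from the given rows.
        columns = list(zip(*rows))
        self.mins = [min(col) for col in columns]
        self.maxs = [max(col) for col in columns]
        return self

    def transform(self, rows):
        # Apply x' = (x - min) / (max - min) per column, using the learned values.
        scaled_rows = []
        for row in rows:
            scaled = []
            for x, lo, hi in zip(row, self.mins, self.maxs):
                span = hi - lo
                # A constant column has zero span; map it to 0.0 to avoid dividing by zero.
                scaled.append((x - lo) / span if span else 0.0)
            scaled_rows.append(scaled)
        return scaled_rows


def split_80_20(rows, test_fraction=0.2, seed=0):
    # Shuffle a copy so the caller's list is left untouched, then cut off the test part.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Made-up data: 10 rows, 2 features on very different scales.
data = [[float(i), float(i * 10 + 5)] for i in range(10)]

train, test = split_80_20(data)
scaler = MinMax().fit(train)            # parameters come from the training set only
train_scaled = scaler.transform(train)  # every value lands in [0, 1]
test_scaled = scaler.transform(test)    # reuses the training min and max
```

Because the test rows are scaled with the training minimum and maximum, a test value can fall slightly outside [0, 1] when it lies beyond the range seen during fitting.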

Inside Data preprocessing (10)

Data cleaning — The process of handling missing values, outliers, and inconsistencies in a dataset.
Outliers — Data points that deviate significantly from other observations, identified via Z-score or IQR.
CSV — Comma-separated values, a common plain-text file format for storing tabular data, often used in data preprocessing and analysis.
Data normalization — In TensorFlow image workflows, pixel values (0 to 255) are typically divided by 255.0 to get values in [0, 1].
fit — A method that learns preprocessing parameters from data, e.g., LabelEncoder learns the mapping from original to encoded labels.
MinMaxScaler — A scaling method that transforms features to a fixed range, typically [0, 1].
Missing values — Data points that are absent in a dataset, handled by removal or imputation.
Normalization — Rescaling data to a common scale, for example Min-Max scaling to a fixed range or Z-score standardization to zero mean and unit variance.
One-hot encoding — A technique to convert categorical variables into binary columns for machine learning models.
transform — A method that applies a transformation learned by fit to data and returns the prepared data (a DataFrame in some libraries; a NumPy array by default in scikit-learn).
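Three of the entries above (Outliers, Missing values, One-hot encoding) can be sketched with the Python standard library. The function names `iqr_outliers`, `impute_mean`, and `one_hot` are hypothetical helpers, the sample values are made up, and the 1.5 × IQR threshold is a common convention assumed here rather than something the notes specify.

```python
import statistics


def iqr_outliers(values, k=1.5):
    # Flag values lying more than k * IQR below Q1 or above Q3.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def impute_mean(values):
    # Replace missing entries (None) with the mean of the observed values.
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def one_hot(labels):
    # Build one binary column per category, ordered alphabetically.
    categories = sorted(set(labels))
    rows = [[1 if label == cat else 0 for cat in categories] for label in labels]
    return categories, rows


ages = [22, 25, 24, 23, 26, 24, 95]          # made-up sample with one extreme value
outliers = iqr_outliers(ages)

filled = impute_mean([1.0, None, 3.0])       # the gap is filled with the mean

categories, encoded = one_hot(["red", "green", "red"])
```

Removal is the other option the Missing values entry mentions; mean imputation is shown only because it keeps the row count unchanged.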

Connections

Uses One-hot encoding
Uses Scikit-learn
Prerequisite of Amazon Neptune ML
Related to Missing values
Related to Outliers
Related to Normalization
Related to Data transformation
