Data cleaning
concept · part of Data preprocessing
The process of handling missing values, outliers, and inconsistencies in a dataset.
Inside Data cleaning (5)
- pandas — A Python library for data manipulation and storage, used to save scraped data to CSV.
- df.isnull().sum() — A pandas method to count missing values per column, used to verify handling of missing data.
- Missingno — A Python library for visualizing missing data patterns using matrix and heatmap plots.
- remove_outliers — A function that removes outliers using Z-score, now restricted to numeric columns via df.select_dtypes(include=[np.number]).
- Z-score — A statistical measure indicating how many standard deviations a data point is from the mean, used for outlier detection.
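The items above can be tied together in a short sketch. The notes name remove_outliers but do not show its body, so the implementation below is an assumption: it computes z = (x - mean) / std per numeric column and drops rows where |z| exceeds a threshold. The threshold of 3.0, the median/"unknown" fill strategy, and the sample data are all illustrative choices, not taken from the course.

```python
import numpy as np
import pandas as pd


def remove_outliers(df: pd.DataFrame, threshold: float = 3.0) -> pd.DataFrame:
    """Drop rows where any numeric column has |z-score| above threshold.

    Hypothetical body: the notes only name this function.
    """
    # Restrict the Z-score computation to numeric columns only.
    numeric = df.select_dtypes(include=[np.number])
    # Population standard deviation (ddof=0); a constant column yields NaN z-scores.
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    # Comparisons with NaN are False, so rows with undefined z-scores are kept.
    is_outlier = (z.abs() > threshold).any(axis=1)
    return df.loc[~is_outlier]


# Small made-up dataset: one missing value per column and one extreme price.
df = pd.DataFrame({
    "price": [10.0, 11.0, 9.0, 10.0, 12.0, 11.0, 10.0, 9.0, 11.0, 10.0, 500.0, np.nan],
    "label": ["a", "b", "a", "b", "a", "b", "a", "b", "a", "b", "a", None],
})

# Count missing values per column before any handling.
missing_before = df.isnull().sum()

# One possible handling strategy (an assumption): median for numbers,
# a placeholder category for text.
df_filled = df.copy()
df_filled["price"] = df_filled["price"].fillna(df_filled["price"].median())
df_filled["label"] = df_filled["label"].fillna("unknown")

# Verify that no missing values remain, then drop Z-score outliers.
missing_after = df_filled.isnull().sum()
cleaned = remove_outliers(df_filled, threshold=3.0)
```

Missing values are filled before the outlier step so that the mean and standard deviation are computed on complete columns. Selecting numeric columns first keeps the arithmetic from failing on text columns such as label.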
Connections
- Uses Missingno
- Uses Z-score
- Uses pandas
- Related to Overfitting
- Related to Underfitting
- Related to Missingno
- Related to Z-score
- Related to pandas