Data management
concept · part of Computer Science / Information Technology
Data management is the process of collecting, storing, organizing, and maintaining data across its lifecycle, from creation to disposal, to ensure quality, accessibility, and security. It is a core component of Information Technology. It involves practices such as data auditing, validation, and archiving, as well as managing data formats (structured, semi-structured) and storage methods (e.g., replication, snapshots). Effective data management helps prevent problems such as data silos and poor data quality, which in turn enables reliable analysis and decision-making. Virtually every organization that handles data, from business operations to scientific research, relies on it to maintain data integrity and compliance.
Inside Data management (16)
- Data Formats — The forms in which data is represented, together with the relationships between formats (such as compatibility, conversion, or hierarchy) that allow structured querying and reasoning about format properties.
- Direct data collection — Gathering firsthand data from your own systems or devices, such as IoT sensors, website logs, or surveys.
- Data quality — The degree to which data is accurate, complete, consistent, timely, and valid, along with the definitions and metrics for these attributes and their relationships to data management processes.
- Data auditing — Manual checks of data entry, sensor calibration, and recording to identify errors.
- Data governance — Policies for access, quality, and lifecycle management of data to support regulatory compliance.
- Data validation — The practice of checking input data for correctness, type, range, and nulls before processing, so that malicious or corrupted data does not enter downstream systems.
- Data versioning — Tracking dataset versions to tie each experiment to a specific, unchanging dataset.
- Semi-structured data — Data with recognizable internal organization but not stored as a fixed flat table, often containing nested structures.
- Structured data — Data stored in tables in relational (SQL) databases.
- Third-party data sources — Commercial, curated data bought from providers or data brokers when self-collection is infeasible.
- Collection — The process of gathering data from sources such as databases, sensors, APIs, and web scraping, with a focus on high quality, relevance, representativeness, and ethical sourcing.
- Data Archiving — Long-term storage of data that is no longer actively used, often for compliance.
- Data Replication — Continuous copying of data to another location to ensure availability and redundancy.
- Data silos — Isolation of data across systems, teams, regions, tools, or cloud providers, leading to repeated movement, copying, and reconciliation.
- Snapshot — A point-in-time copy of data used for quick restoration.
- Storage — Secure, organized, and scalable storage of data using databases or data lakes, with encryption for sensitive data.
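
The data validation entry above (checking type, range, and nulls before processing) can be illustrated with a minimal Python sketch. The schema, the field names `sensor_id` and `temperature_c`, the range limits, and the function `validate_record` are all hypothetical and chosen only for illustration; they do not come from the source.

```python
from typing import Any

# Hypothetical schema (an assumption for this sketch):
# field name -> (expected type, minimum, maximum). None means "no bound".
SCHEMA = {
    "sensor_id": (str, None, None),
    "temperature_c": (float, -50.0, 60.0),
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for field, (expected_type, low, high) in SCHEMA.items():
        value = record.get(field)
        # Null check: a missing key and an explicit None are treated alike.
        if value is None:
            problems.append(f"{field}: missing or null")
            continue
        # Type check: stop here so the range check never compares mixed types.
        if not isinstance(value, expected_type):
            problems.append(f"{field}: expected {expected_type.__name__}")
            continue
        # Range checks, applied only where a bound is defined.
        if low is not None and value < low:
            problems.append(f"{field}: below minimum {low}")
        if high is not None and value > high:
            problems.append(f"{field}: above maximum {high}")
    return problems


if __name__ == "__main__":
    print(validate_record({"sensor_id": "a1", "temperature_c": 21.5}))
    print(validate_record({"sensor_id": None, "temperature_c": 999.0}))
```

Returning a list of problems rather than raising on the first failure is one possible design choice: it lets a pipeline log every issue in a record before deciding whether to reject or quarantine it.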