
Data Cleaning Checklist Basics

Prep datasets before analysis

This guide provides a stepwise data cleaning checklist: schema validation, missing data handling, outlier review, deduplication, and documenting transformations.

Validate schema

Confirm column types, ranges, and required fields; fail fast on violations.
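A fail-fast schema check can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the `ColumnRule` class, the `SCHEMA` mapping, its column names (`id`, `age`, `email`), and the `SchemaError` exception are all hypothetical and would be replaced by whatever your dataset actually requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnRule:
    """Expected type, requiredness, and optional numeric range for one column."""
    dtype: type
    required: bool = True
    min_value: Optional[float] = None
    max_value: Optional[float] = None


# Hypothetical schema, for illustration only.
SCHEMA = {
    "id": ColumnRule(int),
    "age": ColumnRule(int, min_value=0, max_value=120),
    "email": ColumnRule(str, required=False),
}


class SchemaError(ValueError):
    """Raised on the first schema violation found (fail fast)."""


def validate_row(row: dict, schema: dict = SCHEMA) -> None:
    """Check one record against the schema; raise on the first problem."""
    for name, rule in schema.items():
        value = row.get(name)
        if value is None:
            if rule.required:
                raise SchemaError(f"missing required field: {name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        # unless the rule really asks for a bool.
        is_stray_bool = isinstance(value, bool) and rule.dtype is not bool
        if is_stray_bool or not isinstance(value, rule.dtype):
            raise SchemaError(f"{name}: expected {rule.dtype.__name__}, got {value!r}")
        if rule.min_value is not None and value < rule.min_value:
            raise SchemaError(f"{name}: {value!r} is below {rule.min_value}")
        if rule.max_value is not None and value > rule.max_value:
            raise SchemaError(f"{name}: {value!r} is above {rule.max_value}")
```

Raising on the first violation stops a bad load before any analysis runs; collecting all violations into a report instead is a reasonable alternative when you want a full picture of a messy file.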

Handle missing data

Quantify missingness per column, decide whether to drop, impute, or flag the affected values, and record the rationale for each choice.
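One way to quantify and then handle missing values is sketched below, assuming records are plain dictionaries and `None` marks a missing value. The function names and the `_was_missing` flag suffix are illustrative choices, and median imputation is only one of several defensible strategies.

```python
from statistics import median


def missing_share(rows: list, columns: list) -> dict:
    """Return the fraction of rows where each column is missing (None)."""
    total = len(rows)
    if total == 0:
        return {c: 0.0 for c in columns}
    return {c: sum(1 for r in rows if r.get(c) is None) / total for c in columns}


def impute_median_with_flag(rows: list, column: str) -> list:
    """Fill missing values with the column median and flag imputed rows.

    Returns new dictionaries; the input rows are left unchanged.
    """
    observed = [r[column] for r in rows if r.get(column) is not None]
    if not observed:
        raise ValueError(f"cannot impute {column!r}: no observed values")
    fill = median(observed)
    cleaned = []
    for r in rows:
        new_row = dict(r)
        was_missing = r.get(column) is None
        # Keep a flag so later analysis can tell real values from imputed ones.
        new_row[f"{column}_was_missing"] = was_missing
        if was_missing:
            new_row[column] = fill
        cleaned.append(new_row)
    return cleaned
```

Keeping the flag column preserves the information that a value was imputed, which makes the decision easier to audit and revisit later.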

Review outliers and duplicates

Use simple profiling (summary statistics, quantiles, value counts) to spot outliers; deduplicate on key columns, adding fuzzy matching where exact keys miss near-duplicates.
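The sketch below illustrates one possible approach with the standard library: the common 1.5 × IQR rule for flagging outliers, exact deduplication on key fields, and a fuzzy comparison using `difflib.SequenceMatcher`. The function names, the `k=1.5` multiplier, and the `0.9` similarity threshold are assumptions chosen for illustration, not recommended settings.

```python
from difflib import SequenceMatcher
from statistics import quantiles


def iqr_outliers(values: list, k: float = 1.5) -> list:
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def dedupe_by_key(rows: list, key_fields: list) -> list:
    """Keep the first row seen for each combination of key fields."""
    seen = set()
    unique = []
    for r in rows:
        key = tuple(r.get(f) for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


def near_duplicate_pairs(names: list, threshold: float = 0.9) -> list:
    """Return pairs of strings whose normalized similarity meets the threshold.

    Compares every pair, so this only suits small lists.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a = names[i].strip().lower()
            b = names[j].strip().lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((names[i], names[j]))
    return pairs
```

Flagged outliers and fuzzy matches are candidates for review rather than automatic deletions; a value can be extreme and still be correct.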

Document steps

Log every transformation, assumption, and QA check so that analyses stay reproducible.
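A lightweight transformation log might look like the sketch below. The `CleaningLog` class and its field names (`step`, `rationale`, `rows_before`, `rows_after`) are hypothetical; the point is only that each step records what was done, why, and how the row count changed.

```python
import json
from datetime import datetime, timezone


class CleaningLog:
    """Collects one entry per cleaning step for later review or export."""

    def __init__(self) -> None:
        self.entries = []

    def record(self, step: str, rationale: str, rows_before: int, rows_after: int) -> None:
        """Append a timestamped entry describing a single transformation."""
        self.entries.append({
            "step": step,
            "rationale": rationale,
            "rows_before": rows_before,
            "rows_after": rows_after,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def to_json(self) -> str:
        """Serialize the log so it can be stored alongside the cleaned data."""
        return json.dumps(self.entries, indent=2)


# Example usage with made-up numbers.
log = CleaningLog()
log.record("drop exact duplicates on id", "id is the primary key", 1000, 987)
```

Saving the serialized log next to the cleaned dataset lets a reviewer replay or question each decision without rereading the code.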
