Data cleaning

Data cleaning is the process of removing duplicate records, and records that are missing vital information, from a dataset. Incorrect information (such as a person recorded as 25 feet tall) is also removed. The goal of data cleaning is to make the dataset as complete and accurate as possible before data analysis. In the case of registries, depending on the original agreement with participants, registry owners may be permitted to contact participants to ask them to fill in missing information or to confirm or correct information that appears to be incorrect.
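
As an illustration only, the three checks described above (duplicates, missing vital information, implausible values) can be sketched in a few lines of Python. The field names (participant_id, birth_year, height_ft), the choice of which fields count as vital, and the 9-foot plausibility limit are all assumptions made for this example; a real registry would define its own rules. Records that fail a check are set aside rather than discarded, so that they could be followed up with participants where the registry agreement allows it.

```python
from typing import Any

# Hypothetical field names and threshold, chosen only for this illustration.
REQUIRED_FIELDS = ("participant_id", "birth_year")
MAX_PLAUSIBLE_HEIGHT_FT = 9.0

Record = dict[str, Any]


def clean_records(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Return (kept, flagged).

    kept    -- records that passed every check, with duplicates dropped
    flagged -- records missing vital fields or holding implausible values,
               set aside for possible follow-up
    """
    kept: list[Record] = []
    flagged: list[Record] = []
    seen_ids: set[Any] = set()

    for record in records:
        # Check 1: vital information must be present and non-empty.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            flagged.append(record)
            continue

        # Check 2: a recorded height must be physically plausible.
        # Assumes height_ft, when present, is numeric.
        height = record.get("height_ft")
        if height is not None and not (0 < height <= MAX_PLAUSIBLE_HEIGHT_FT):
            flagged.append(record)
            continue

        # Check 3: keep only the first valid record per participant ID.
        if record["participant_id"] in seen_ids:
            continue
        seen_ids.add(record["participant_id"])
        kept.append(record)

    return kept, flagged
```

Calling clean_records on a list of dictionaries returns the cleaned records and the flagged ones separately. Treating records with the same participant_id as duplicates is a simplification; real datasets often need fuzzier matching.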

Sourced From
Obtaining Data and Quality Assurance from “Registries for Evaluating Patient Outcomes: A User’s Guide” [4th Edition, 2020]
Learn More
Rare Diseases Registry Program (RaDaR): Review & Clean Your Data
National quality registries: how to improve the quality of data? [2018 published article]
