A Systematic Approach to Cleaning Routine Health Surveillance Datasets: An Illustration Using National Vector Borne Disease Control Programme Data of Punjab, India

2021 
Advances in information and communication technologies for development (ICT4D) and data science facilitate systematic, reproducible, and scalable data cleaning to strengthen routine health information systems. A logic model for data cleaning was used; it included an algorithm for screening, diagnosing, and editing datasets in a rule-based, interactive, and semi-automated manner. Computational workflows and operational definitions were prepared a priori. Model performance was illustrated using the dengue line-list of the National Vector Borne Disease Control Programme, Punjab, India, from 01 January 2015 to 31 December 2019. Cleaning and imputation of an estimated date were successful for 96.1% and 98.9% of records in 2015 and 2016, respectively, and for all cases in 2017, 2018, and 2019. Age and sex information was cleaned and extracted for more than 98.4% and 99.4% of records, respectively. Applying the logic model produced an analysis-ready dataset that can be used to understand the spatiotemporal epidemiology of dengue and to support data-driven public health decision-making.
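The screen, diagnose, and edit cycle described above can be illustrated with a small rule-based sketch. It is not the authors' implementation: the field names (`date_of_onset`, `date_of_sample`, `age_sex`), the accepted date formats, the age plausibility range, and the rule that falls back to the sample date when the onset date is invalid are all hypothetical assumptions chosen for illustration. Each record is screened for unparseable or out-of-range values, the problem is recorded as a flag (diagnosis), and a fixed rule is applied to edit or impute the value.

```python
import re
from datetime import date, datetime

# Hypothetical sketch: field names, date formats, and editing rules are assumptions,
# not the NVBDCP line-list schema or the authors' published rules.
DATE_FORMATS = ("%d-%m-%Y", "%d/%m/%Y", "%Y-%m-%d")
STUDY_START, STUDY_END = date(2015, 1, 1), date(2019, 12, 31)
SEX_PATTERN = re.compile(r"(?<![A-Z])(FEMALE|MALE|F|M)(?![A-Z])")


def parse_date(raw):
    """Screen a date string against known formats; return None if none match."""
    if not raw:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def in_study_period(d):
    return d is not None and STUDY_START <= d <= STUDY_END


def parse_age_sex(raw):
    """Extract age (years) and sex ('M'/'F') from a free-text field such as '25/M'."""
    if not raw:
        return None, None
    text = raw.upper()
    age_match = re.search(r"\d{1,3}", text)
    age = int(age_match.group()) if age_match else None
    if age is not None and not 0 <= age <= 110:
        age = None  # diagnosis: implausible age is set to missing
    sex_match = SEX_PATTERN.search(text)
    sex = sex_match.group(1)[0] if sex_match else None
    return age, sex


def clean_record(rec):
    """Screen, diagnose, and edit one record; return (cleaned fields, flags)."""
    flags = []
    estimated = parse_date(rec.get("date_of_onset"))
    if not in_study_period(estimated):
        flags.append("onset_date_invalid")
        # Assumed editing rule: impute the sample collection date as the estimated date.
        fallback = parse_date(rec.get("date_of_sample"))
        estimated = fallback if in_study_period(fallback) else None
        if estimated is not None:
            flags.append("onset_date_imputed")
    age, sex = parse_age_sex(rec.get("age_sex"))
    if age is None:
        flags.append("age_missing")
    if sex is None:
        flags.append("sex_missing")
    return {"estimated_date": estimated, "age": age, "sex": sex}, flags


def completeness(cleaned, field):
    """Percentage of cleaned records with a non-missing value for `field`."""
    filled = sum(1 for rec in cleaned if rec[field] is not None)
    return round(100 * filled / len(cleaned), 1)


# Invented example records, for illustration only.
SAMPLE_RECORDS = [
    {"date_of_onset": "12-09-2017", "date_of_sample": "14-09-2017", "age_sex": "25/M"},
    {"date_of_onset": "", "date_of_sample": "2016-10-03", "age_sex": "30 yrs Female"},
    {"date_of_onset": "99-99-2015", "date_of_sample": "", "age_sex": "F"},
]

if __name__ == "__main__":
    results = [clean_record(r) for r in SAMPLE_RECORDS]
    for cleaned, flags in results:
        print(cleaned, flags)
    print(completeness([c for c, _ in results], "estimated_date"))
```

Keeping the flags alongside each edited record is one way to make a semi-automated workflow auditable: records whose values were imputed or set to missing can be routed for interactive review rather than silently changed.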