Management and Curation of Multi-Dimensional Data in Biobank Studies

2020 
The development of secure and reliable systems to collect, store, utilise, and share data on study participants plays a critical role in large population health studies. Contemporary prospective biobank studies typically involve hundreds of thousands of participants and collect a wide range of data over an extended period through questionnaires, physical measurements, sample assays, and linkages with external data sources. Careful planning and management of a central data repository are required to ensure the privacy, security, accessibility, flexibility, consistency, and accuracy of the data collected and generated in the study. This chapter outlines some of the key concepts and principles underlying the design and development of data storage infrastructures, database architecture, and management systems in large biobank studies. It also describes practical considerations for each step, from initial data collection from study participants to delivery of research-ready datasets: data import, cleaning, and integration; quality checks, standardisation, and validation; and, finally, preparation of datasets for bona fide researchers. The general principles and approaches described should be applicable to a wide variety of population health studies in different settings.
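The stages named above (cleaning, quality checks, and preparation of datasets for release) can be illustrated with a minimal Python sketch. It is not the chapter's method. It assumes each participant's raw data arrive as a dictionary of text fields. The function names (`clean`, `validate`, `pseudonymise`, `prepare_release`), the field names, and the plausible-range limits in `RANGE_RULES` are all hypothetical and chosen only for illustration. Pseudonymising IDs with a keyed HMAC-SHA256 hash is one common technique, used here as an assumption.

```python
import hashlib
import hmac

# Hypothetical plausible-range rules (illustrative limits, not study-specific values).
RANGE_RULES = {
    "height_cm": (100.0, 220.0),
    "sbp_mmhg": (60.0, 260.0),
}


def clean(raw):
    """Cleaning: trim text and coerce measurement fields to float (None if blank or non-numeric)."""
    record = {"participant_id": str(raw["participant_id"]).strip(), "flags": []}
    for name in RANGE_RULES:
        text = str(raw.get(name, "")).strip()
        try:
            record[name] = float(text) if text else None
        except ValueError:
            record[name] = None
            record["flags"].append(f"{name}:non_numeric")
    return record


def validate(record):
    """Quality checks: flag missing or implausible values instead of silently dropping them."""
    for name, (low, high) in RANGE_RULES.items():
        value = record[name]
        if value is None:
            if f"{name}:non_numeric" not in record["flags"]:
                record["flags"].append(f"{name}:missing")
        elif not low <= value <= high:
            record["flags"].append(f"{name}:out_of_range")
    return record


def pseudonymise(participant_id, key):
    """Replace the internal ID with a keyed hash so released rows carry no direct identifier."""
    return hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def prepare_release(raw_rows, key):
    """Run clean -> validate, release clean rows under a pseudonym, and hold flagged rows for review."""
    released, review = [], []
    for raw in raw_rows:
        record = validate(clean(raw))
        if record["flags"]:
            review.append(record)
        else:
            released.append({
                "pid": pseudonymise(record["participant_id"], key),
                **{name: record[name] for name in RANGE_RULES},
            })
    return released, review


# Usage with two made-up rows: the second fails the range and completeness checks.
rows = [
    {"participant_id": " P001 ", "height_cm": "172.5", "sbp_mmhg": "128"},
    {"participant_id": "P002", "height_cm": "17.2", "sbp_mmhg": ""},
]
released, review = prepare_release(rows, key=b"project-specific-secret")
```

Flagged records are routed to a review list rather than corrected automatically, so that decisions about implausible values remain with data managers. A real release process would also restrict fields according to each approved application.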