A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog

2017 
Background: The accurate description of ancestry is essential to interpret and integrate human genomics data, and to ensure that advances in the field of genomics benefit individuals from all ancestral backgrounds. However, there are no established guidelines for the consistent, unambiguous and standardized description of ancestry. To fill this gap, we provide a framework designed for the representation of ancestry in GWAS data, but with wider application to studies and resources involving human subjects.

Results: Here we describe our framework and its application to the representation of ancestry data in a widely used, publicly available genomics resource, the NHGRI-EBI GWAS Catalog. We present the first analyses of GWAS data using our ancestry categories, demonstrating the validity of the framework to facilitate the tracking of ancestry in big data sets. We demonstrate the broader relevance and integration potential of our method by using it to describe the well-established HapMap and 1000 Genomes reference populations. Finally, to encourage adoption, we outline recommendations for authors to implement when describing samples.

Conclusions: While the known bias towards inclusion of European ancestry individuals in GWA studies persists, African and Hispanic or Latin American ancestry populations contribute a disproportionately high number of associations, suggesting that analyses including these groups may be more effective at identifying new associations. We believe the widespread adoption of our framework will increase standardization of ancestry data, thus enabling improved analysis, interpretation and integration of human genomics data and furthering our understanding of disease.