An open resource of structural variation for medical and population genetics

2019 
Structural variants (SVs) rearrange the linear and three-dimensional organization of the genome, which can have profound consequences in evolution, diversity, and disease. As national biobanks, human disease association studies, and clinical genetic testing are increasingly reliant on whole-genome sequencing, population references for small variants (i.e., SNVs and indels) in protein-coding genes, such as the Genome Aggregation Database (gnomAD), have become integral for the evaluation and interpretation of genomic variation. However, no comparable large-scale reference maps for SVs exist to date. Here, we constructed a reference atlas of SVs from deep whole-genome sequencing (WGS) of 14,891 individuals across diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 498,257 unique SVs, including 5,729 multi-breakpoint complex SVs across 13 mutational subclasses, and examples of localized chromosome shattering, like chromothripsis, in the general population. The mutation rates and densities of SVs were non-uniform across chromosomes and SV classes. We discovered strong correlations between constraint against predicted loss-of-function (pLoF) SNVs and rare SVs that both disrupt and duplicate protein-coding genes, suggesting that existing per-gene metrics of pLoF SNV constraint do not simply reflect haploinsufficiency, but appear to capture a gene's general sensitivity to dosage alterations. The average genome in gnomAD-SV harbored 8,202 SVs, and approximately eight genes altered by rare SVs. When incorporating these data with pLoF SNVs, we estimate that SVs comprise at least 25% of all rare pLoF events per genome. We observed large (≥1 Mb), rare SVs in 3.1% of genomes (~1:32 individuals), and a clinically reportable pathogenic incidental finding from SVs in 0.24% of genomes (~1:417 individuals).
We also estimated the prevalence of previously reported pathogenic recurrent CNVs associated with genomic disorders, which highlighted differences in frequencies across populations and confirmed that WGS-based analyses can readily recapitulate these clinically important variants. In total, gnomAD-SV includes at least one CNV covering 57% of the genome, while the remaining 43% is significantly enriched for CNVs found in tumors and individuals with developmental disorders. However, current sample sizes remain markedly underpowered to establish estimates of SV constraint at the level of individual genes or noncoding loci. The gnomAD-SV resources have been integrated into the gnomAD browser (https://gnomad.broadinstitute.org), where users can freely explore this dataset without restrictions on reuse. We expect this resource to have broad utility in population genetics, disease association, and diagnostic screening.
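
As a quick check of the arithmetic behind the "1 in N" figures quoted in the abstract (3.1% of genomes as ~1:32, 0.24% as ~1:417), the following Python sketch converts a percentage of genomes into an approximate 1-in-N ratio. The function name `one_in_n` is hypothetical and chosen for illustration only; it is not part of gnomAD or any gnomAD tooling.

```python
def one_in_n(percent_of_genomes: float) -> int:
    """Convert a carrier percentage into an approximate '1 in N' ratio.

    Illustrative helper only (not a gnomAD function): N is the reciprocal
    of the fraction of genomes, rounded to the nearest integer.
    """
    if percent_of_genomes <= 0:
        raise ValueError("percentage must be positive")
    return round(100.0 / percent_of_genomes)


# Percentages as reported in the abstract.
large_rare_sv = one_in_n(3.1)        # large (>=1 Mb), rare SVs
incidental_finding = one_in_n(0.24)  # clinically reportable incidental findings

print(f"Large rare SVs: ~1:{large_rare_sv}")
print(f"Pathogenic incidental findings: ~1:{incidental_finding}")
```

Rounding the reciprocal reproduces the ratios stated in the text; the underlying percentages are themselves rounded, so the ratios are approximate.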