Improved D-Statistic For Low-Coverage Data

2017 
The detection of ancient gene flow between human populations is an important issue in population genetics. A commonly used tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on a hypothesized genetic relationship among four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, it is not always possible to call genotypes accurately. In that case, the D-statistic as currently used samples a single base from the reads of one chosen individual per population. This approach has the drawback of ignoring much of the information in the data. These issues are especially striking for ancient genomes, which are often characterized by low sequencing depth and high error rates for the sequenced bases. Here we provide a significant improvement that overcomes these problems of the present-day D-statistic by considering all reads from multiple individuals in each population. Moreover, we apply type-specific error correction to combat sequencing errors, and we show how to correct for introgression from an external population that is not part of the hypothesized genetic relationship and how this correction leads to an estimate of the admixture rate. We prove that the improved D-statistic, as well as the traditional one, is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low to medium sequencing depth (1-10X), and at a sequencing depth of 2X the performance is as good as with perfectly called genotypes. We also show the reliability of error correction on scenarios with simulated errors and on ancient data, and we correct for introgression in known scenarios to verify the correctness of the estimated admixture rates.
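To make the contrast between the two estimators concrete, here is a minimal sketch. It compares a traditional D-statistic that draws one read per population at each site with a D computed from the pooled reads of all individuals. It assumes biallelic sites, reads already reduced to per-allele counts, at least one read per population at every site, and an allele frequency equal to the fraction of pooled reads carrying allele "a". It ignores error correction and per-individual weighting. The frequency-based formula is an illustration of the ABBA-BABA idea, not necessarily the authors' exact estimator. The names `allele_freqs`, `d_pooled` and `d_single_read` are hypothetical.

```python
import random

# Hypothetical input format (an assumption, not the paper's): each site is a
# list of four (count_a, count_b) read counts for populations H1, H2, H3, H4,
# pooled over all individuals; every population needs at least one read.

def allele_freqs(site):
    """Fraction of pooled reads carrying allele 'a' in each population."""
    return [a / (a + b) for a, b in site]

def d_pooled(sites):
    """D from expected ABBA/BABA probabilities under pooled read frequencies.

    Illustrative frequency-based estimator; the paper's may differ.
    """
    abba = baba = 0.0
    for site in sites:
        x1, x2, x3, x4 = allele_freqs(site)
        # Either allele may play the role of 'A', so both labelings are summed.
        abba += x1 * (1 - x2) * (1 - x3) * x4 + (1 - x1) * x2 * x3 * (1 - x4)
        baba += x1 * (1 - x2) * x3 * (1 - x4) + (1 - x1) * x2 * (1 - x3) * x4
    return (abba - baba) / (abba + baba) if abba + baba > 0 else float("nan")

def d_single_read(sites, rng):
    """Traditional D: draw one read per population per site, count patterns."""
    abba = baba = 0
    for site in sites:
        # True means the sampled read carries allele 'a'.
        b1, b2, b3, b4 = (rng.random() < a / (a + b) for a, b in site)
        if b1 != b2 and b1 == b4 and b2 == b3:
            abba += 1
        elif b1 != b2 and b1 == b3 and b2 == b4:
            baba += 1
    return (abba - baba) / (abba + baba) if abba + baba > 0 else float("nan")

site = [(5, 5), (2, 8), (7, 3), (1, 9)]
print(round(d_pooled([site]), 4))  # -0.5455
print(round(d_single_read([site] * 20000, random.Random(1)), 2))
```

Summed over sites, the pooled numerator reduces to (x1 - x2)(x4 - x3) and the denominator to (x1 + x2 - 2x1x2)(x3 + x4 - 2x3x4). Every read therefore contributes to the estimate. The single-read version agrees with it only on average over many sites, which is where its loss of information at low depth comes from.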
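The normal approximation is typically used by dividing D by an estimate of its standard error and comparing the resulting Z-score with N(0, 1). The abstract does not say how that standard error is obtained. The sketch below assumes a delete-one jackknife over genomic blocks, a common choice for D-statistics. It uses the unweighted form for brevity; unequal block sizes usually call for a weighted jackknife. The function name `jackknife_z` and its per-block inputs are hypothetical.

```python
import math

def jackknife_z(num_blocks, den_blocks):
    """Z-score and two-sided p-value for D = sum(num) / sum(den).

    num_blocks[i] and den_blocks[i] hold the ABBA - BABA and ABBA + BABA
    totals (observed or expected) for genomic block i; needs >= 2 blocks.
    Assumption: SE from an unweighted delete-one block jackknife.
    """
    g = len(num_blocks)
    total_num, total_den = sum(num_blocks), sum(den_blocks)
    d = total_num / total_den
    # Recompute D with each block left out in turn.
    d_minus = [(total_num - n) / (total_den - m)
               for n, m in zip(num_blocks, den_blocks)]
    mean_minus = sum(d_minus) / g
    se = math.sqrt((g - 1) / g * sum((x - mean_minus) ** 2 for x in d_minus))
    if se == 0:
        raise ValueError("zero jackknife variance; Z is undefined")
    z = d / se
    # Under no admixture, Z is approximately N(0, 1): p = 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return d, se, z, p

d, se, z, p = jackknife_z([1, 2, 3, 4], [10, 10, 10, 10])
print(round(d, 3), round(se, 4), round(z, 2))  # 0.25 0.0645 3.87
```

A |Z| around 3 or more is the kind of deviation conventionally taken as evidence against the hypothesized tree, though the threshold is a choice left to the analyst.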