Estimating and accounting for genotyping errors in RAD-seq experiments

2019 
In non-model organisms, evolutionary questions are frequently addressed using reduced representation sequencing techniques because of their relatively low cost and ease of use, and because they do not require genomic resources such as a reference genome. However, evidence is accumulating that many such techniques may be affected by specific biases, calling into question the accuracy of the obtained genotypes and, as a consequence, their usefulness in evolutionary studies. Here we introduce three strategies to assess genotyping error rates in such data: comparison with high-quality genotypes obtained with a different technique, comparison of independent replicates of some samples, or analysis of a population sample under the assumption of Hardy-Weinberg equilibrium. Applying these strategies to data obtained with Restriction site Associated DNA sequencing (RAD-seq), arguably the most popular reduced representation sequencing technique, revealed per-allele genotyping error rates that were much higher than sequencing error rates, particularly at heterozygous sites that were wrongly inferred as homozygous. As we exemplify through the inference of genome-wide and local ancestry of well-characterized hybrids of two widespread and intensively studied Eurasian poplar (Populus) species, such high error rates may easily lead to wrong biological conclusions. By properly accounting for these error rates in downstream analyses, either by incorporating genotyping errors directly or by recalibrating genotype likelihoods, we were nevertheless able to use the RAD-seq data to support biologically meaningful and robust inferences of ancestry among Populus hybrids.
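To make the replicate-based strategy concrete, the sketch below compares two replicate genotype vectors from the same sample and reports how often they disagree. It is an illustration only and not the estimator used in the paper: the function name `replicate_discordance`, the 0/1/2 genotype coding, the `-1` missing-data code, and the first-order approximation that the per-genotype error rate is about half the discordance (which assumes rare, independent errors in the two replicates) are all assumptions introduced here.

```python
MISSING = -1  # hypothetical code for a missing genotype call


def replicate_discordance(rep_a, rep_b):
    """Compare two replicate genotype vectors of the same sample.

    Genotypes are assumed to be coded as alternate-allele counts
    (0 = hom. ref, 1 = het, 2 = hom. alt); MISSING calls are skipped.
    """
    if len(rep_a) != len(rep_b):
        raise ValueError("replicates must cover the same sites")

    compared = 0    # sites called in both replicates
    mismatches = 0  # sites where the two calls differ
    het_hom = 0     # mismatches pairing a het call with a hom call

    for a, b in zip(rep_a, rep_b):
        if a == MISSING or b == MISSING:
            continue
        compared += 1
        if a != b:
            mismatches += 1
            if (a == 1) != (b == 1):
                het_hom += 1

    if compared == 0:
        raise ValueError("no site was called in both replicates")

    discordance = mismatches / compared
    return {
        "compared": compared,
        "discordance": discordance,
        # Share of mismatches that involve a heterozygous call.
        "het_hom_fraction": het_hom / mismatches if mismatches else 0.0,
        # Assumption: rare, independent errors, so a mismatch usually
        # reflects one wrong call out of two; error rate ~ discordance / 2.
        "approx_error_rate": discordance / 2,
    }


if __name__ == "__main__":
    rep_a = [0, 1, 2, 1, 0, 1, 2, 0, -1]
    rep_b = [0, 0, 2, 1, 0, 1, 0, 0, 1]
    print(replicate_discordance(rep_a, rep_b))
```

A high share of heterozygote-versus-homozygote mismatches would be consistent with the pattern described above, in which heterozygous sites are wrongly called homozygous. A model-based estimate would treat the two error directions separately instead of using a single rate.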