Accurate analysis of short read sequencing in complex genomes: A case study using QTL-seq to target blanchability in peanut (Arachis hypogaea)

2021 
Next-generation sequencing was a step change for molecular genetics and genomics. Illumina sequencing in particular still provides substantial value to animal and plant genomics. A simple yet powerful technique, referred to as QTL sequencing (QTL-seq), is susceptible to high levels of noise because short reads align ambiguously in complex regions of the genome. This noise is particularly high in polyploid and/or outcrossing crop species, where it impairs the efficacy of QTL-seq in identifying functional variation. By filtering loci based on the optimal alignment of short reads, we have developed a pipeline, named Khufu, that substantially improves the accuracy of QTL-seq analysis in complex genomes, allowing de novo variant discovery directly from bulk sequence. We first demonstrate the pipeline by identifying and validating loci contributing to blanching percentage in peanut using lines from multiple related populations. Using other published datasets in peanut, Brassica rapa, Hordeum vulgare, Lactuca sativa, and Felis catus, we demonstrate that Khufu produces more accurate results straight from bulk sequence. Khufu works across species, genome ploidy levels, and data types. In cases where identified QTL were fine mapped, the fine-mapped region corresponds to the top of the peak identified by Khufu. The accuracy of Khufu allows the analysis of population sequencing at very low coverage (<3x), greatly decreasing the amount of sequence needed to genotype even the most complex genomes.
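As a rough illustration of the idea of filtering loci on alignment quality before a QTL-seq comparison, the sketch below computes a delta SNP-index between two bulks after discarding loci whose reads align ambiguously. It is not Khufu's implementation, which the abstract does not describe. The `Locus` record, the `min_mapq` field, and the thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Per-locus read counts for two bulks (hypothetical record layout)."""
    chrom: str
    pos: int
    high_alt: int    # alternate-allele reads in the high-trait bulk
    high_depth: int  # total reads in the high-trait bulk
    low_alt: int     # alternate-allele reads in the low-trait bulk
    low_depth: int   # total reads in the low-trait bulk
    min_mapq: int    # lowest mapping quality among covering reads (assumed field)


def snp_index(alt_reads, depth):
    """Fraction of reads at a locus that carry the alternate allele."""
    return alt_reads / depth


def delta_snp_index(loci, mapq_threshold=30, min_depth=1):
    """Return (chrom, pos, delta) for loci that pass the filters.

    Loci with ambiguously aligned reads (min_mapq below the threshold)
    or with too little depth in either bulk are skipped. Both thresholds
    are illustrative assumptions, not values from the paper.
    """
    results = []
    for locus in loci:
        if locus.min_mapq < mapq_threshold:
            continue  # ambiguous alignment: likely noise in complex regions
        if locus.high_depth < min_depth or locus.low_depth < min_depth:
            continue  # not enough reads to estimate an allele frequency
        delta = (snp_index(locus.high_alt, locus.high_depth)
                 - snp_index(locus.low_alt, locus.low_depth))
        results.append((locus.chrom, locus.pos, delta))
    return results


example = [
    Locus("A01", 100, 9, 10, 1, 10, 60),  # confidently aligned, strong signal
    Locus("A01", 200, 5, 10, 5, 10, 5),   # ambiguously aligned, filtered out
]
filtered = delta_snp_index(example)
```

In this toy input only the first locus survives the mapping-quality filter. A real analysis would typically smooth delta values in sliding windows along each chromosome and look for peaks.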