A phased Canis lupus familiaris Labrador Retriever reference genome utilizing high molecular weight DNA extraction methods and high resolution sequencing technologies

2020 
Reference genome fidelity is critically important for genome-wide association studies (GWAS), yet many reference genomes are incomplete or too dissimilar from the study population. A typical whole-genome sequencing approach relies on short-read technologies, resulting in fragmented assemblies with regions of ambiguity and low complexity. Further information is lost by economic necessity when genotyping populations, as lower-resolution technologies such as genotyping arrays are commonly used. Here we present a phased reference genome for Canis lupus familiaris utilizing high molecular weight DNA extraction and high-resolution sequencing technologies. We tested wet-lab and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The resulting de novo assembly required eight Oxford Nanopore R9.4 flowcells (~23X depth) and a 10X Genomics library run on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (~88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K. Mapping of publicly available short-read data from ten Labrador Retrievers against this breed-specific reference resulted in an average of approximately 1% more aligned reads compared to mapping against the current gold standard reference (CanFam3.1, p