Whole Genome Sequences of 23 Species from the Drosophila montium Species Group (Diptera: Drosophilidae): A Resource for Testing Evolutionary Hypotheses.

2020 
Large groups of species with well-defined phylogenies are excellent systems for testing evolutionary hypotheses. In this paper, we describe the creation of a comparative genomic resource consisting of 23 genomes from the species-rich Drosophila montium species group, 22 of which are presented here for the first time. The montium group is uniquely positioned for comparative studies. Within the montium clade, evolutionary distances are such that large numbers of sequences can be accurately aligned while also recovering strong signals of divergence; and the distance between the montium group and D. melanogaster is short enough that orthologous sequences can be readily identified. All genomes were assembled from a single, small-insert library using MaSuRCA, before going through an extensive post-assembly pipeline. Estimated genome sizes within the montium group range from 155 Mb to 223 Mb (mean=196 Mb). The absence of long-distance information during the assembly process resulted in fragmented assemblies, with the scaffold NG50s varying widely based on repeat content and sample heterozygosity (min=18 kb, max=390 kb, mean=74 kb). The total scaffold length for most assemblies is also shorter than the estimated genome size, typically by 5-15%. However, subsequent analysis showed that our assemblies are highly complete. Despite large differences in contiguity, all assemblies contain at least 96% of known single-copy Dipteran genes (BUSCOs, n=2,799). Similarly, by aligning our assemblies to the D. melanogaster genome and remapping coordinates for a large set of transcriptional enhancers (n=3,457), we showed that each montium assembly contains orthologs for at least 91% of D. melanogaster enhancers. Importantly, the genic and enhancer contents of our assemblies are comparable to those of far more contiguous Drosophila assemblies. The alignment of our own D. serrata assembly to a previously published PacBio D. serrata assembly also showed that our longest scaffolds (up to 1 Mb) are free of large-scale misassemblies. Our genome assemblies are a valuable resource that can be used to further resolve the montium group phylogeny; study the evolution of protein-coding genes and cis-regulatory sequences; and determine the genetic basis of ecological and behavioral adaptations.
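
For readers unfamiliar with the scaffold NG50 statistic reported above, the following is a minimal sketch of how it is conventionally computed: scaffolds are sorted from longest to shortest, and NG50 is the length of the scaffold at which the running total first reaches half of the estimated genome size (N50 uses half of the total assembly length instead). The function name and the toy lengths are illustrative assumptions, not the authors' pipeline or data.

```python
def ng50(scaffold_lengths, genome_size):
    """Return the NG50 of an assembly, or None if the scaffolds
    never cover half of the estimated genome size.

    scaffold_lengths: iterable of scaffold lengths (bp)
    genome_size: estimated genome size (bp), supplied by the caller
    """
    target = genome_size / 2
    running_total = 0
    # Walk scaffolds from longest to shortest, accumulating length.
    for length in sorted(scaffold_lengths, reverse=True):
        running_total += length
        if running_total >= target:
            return length
    return None


# Toy example (hypothetical lengths, not from the paper).
lengths = [50, 40, 30, 20, 10]
print(ng50(lengths, genome_size=200))           # NG50 against a 200 bp estimate
print(ng50(lengths, genome_size=sum(lengths)))  # N50: target is the assembly total
```

Because the target depends on the estimated genome size rather than on the assembly length, an assembly that falls short of the estimated size yields an NG50 no larger than its N50.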