Dihaploid Coffea arabica genome sequencing and assembly [W180]

2015 
Coffea arabica, which accounts for 70% of world coffee production, is an allotetraploid with a genome size of approximately 1.3 Gb, derived from the hybridization of C. canephora (710 Mb) and C. eugenioides (670 Mb). To elucidate the evolutionary history of C. arabica and to generate critical information for breeding programs, a sequencing project is underway to finalize a reference genome using a dihaploid line and a set of 30 C. arabica accessions. For the reference genome, we have generated two assemblies, one from Illumina data (>150x coverage) and a second from PacBio sequences (>50x coverage). The present assemblies cover 1,031 and 1,042 Mb, respectively. After further refinement using Illumina mate pairs and optical mapping, the genome assemblies will be annotated using RNA-Seq. Resequencing of C. eugenioides and C. canephora has been completed and is being used to better assess homeologs within the sub-genomes. Furthermore, 30 C. arabica accessions, representing wild and cultivated genotypes, are being resequenced (20x coverage) using Illumina. A C. arabica genetic map, currently including over 600 SSR markers that differentiate between the two sub-genomes, is being used to anchor the assemblies. Newly identified SNP markers are being added to the map. The final goals of the project are to produce a high-quality reference genome, assess possible neo-diversification in the cultivated varieties, better understand the formation and evolution of the species, and develop tools that will make the finished genome accessible and useful to breeders and researchers.
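As a rough illustration of the figures quoted above, the following Python sketch computes what fraction of the estimated genome each draft assembly spans, along with the combined size of the two progenitor genomes. It assumes the approximate 1.3 Gb estimate is taken as exactly 1,300 Mb; the variable and function names are hypothetical and are not part of the project's tooling.

```python
# Sizes in Mb, taken from the abstract. The 1.3 Gb genome size is an
# approximation, so the resulting fractions are indicative only.
GENOME_SIZE_MB = 1300  # assumption: "approximately 1.3 Gb" treated as 1,300 Mb
ASSEMBLIES_MB = {"Illumina": 1031, "PacBio": 1042}
PROGENITORS_MB = {"C. canephora": 710, "C. eugenioides": 670}


def fraction_covered(assembly_mb, genome_mb=GENOME_SIZE_MB):
    """Return the fraction of the estimated genome spanned by an assembly."""
    return assembly_mb / genome_mb


# Fraction of the estimated genome covered by each draft assembly.
coverage = {name: fraction_covered(size) for name, size in ASSEMBLIES_MB.items()}

# Combined size of the two progenitor genomes, for comparison with 1.3 Gb.
progenitor_sum_mb = sum(PROGENITORS_MB.values())

for name, frac in coverage.items():
    print(f"{name} assembly: {frac:.1%} of the estimated genome")
print(f"Sum of progenitor genomes: {progenitor_sum_mb} Mb")
```

Under these assumptions, both assemblies span roughly 80% of the estimated genome, and the progenitor genomes together total 1,380 Mb, slightly more than the 1.3 Gb estimate for C. arabica.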