Dihaploid Coffea arabica genome sequencing and assembly [W180]

Alexandre de Kochko, Dominique Crouzillat, Michel Rigoreau, Maud Lepelley, L. Bellanger, Virginie Mérot-L’Anthoëne, Céline Vandecasteele, Romain Guyot, Valérie Poncet, Christine Tranchant-Dubreuil, Perla Hamon, Serge Hamon, Emmanuel Couturon, Patrick Descombes, Deborah Moine, Lukas A. Mueller, Susan R. Strickler, Alan A. Andrade, Luiz Filipe Protasio Pereira, Pierre Marraccini, Giovanni Giuliano, Alessia Fiore, Marco Pietrella, G. Aprea, Ray Ming, Jennifer Wai, Douglas Silva Domingues, Alexandre Rossi Paschoal, Gerrit Kühn, Jonas Korlach, Jason Chin, David Sankoff, Chunfang Zheng, Victor A. Albert

2015

Coffea arabica, which accounts for 70% of world coffee production, is an allotetraploid with a genome size of approximately 1.3 Gb, derived from the hybridization of C. canephora (710 Mb) and C. eugenioides (670 Mb). To elucidate the evolutionary history of C. arabica and to generate critical information for breeding programs, a sequencing project is underway to finalize a reference genome from a dihaploid line and to resequence a set of 30 C. arabica accessions. For the reference genome, we have generated two assemblies, one from Illumina data (>150x coverage) and a second from PacBio sequences (>50x coverage). The current assemblies span 1,031 Mb and 1,042 Mb, respectively. After further refinement with Illumina mate-pair data and optical mapping, the genome assemblies will be annotated using RNA-Seq data. Resequencing of C. eugenioides and C. canephora has been completed, and these data are being used to better assess homeologs in the two sub-genomes. In addition, 30 C. arabica accessions representing wild and cultivated genotypes are being resequenced with Illumina at 20x coverage. A C. arabica genetic map, currently comprising over 600 SSR markers that differentiate the two sub-genomes, is being used to anchor the assemblies, and newly identified SNP markers are being added to the map. The final goals of the project are to produce a high-quality reference genome, assess possible neo-diversification in the cultivated varieties, better understand the formation and evolution of the species, and develop tools that make the finished genome accessible and useful to breeders and researchers.
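As a rough check on the sequencing figures quoted in the abstract, the sketch below converts each stated coverage into the minimum volume of raw sequence it implies and expresses each assembly size as a share of the estimated genome size. It assumes that coverage means total sequenced bases divided by genome size, and it treats "approximately 1.3 Gb" as exactly 1,300 Mb; the function names are hypothetical and are not part of the project's pipeline.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
# Assumption: coverage C = total sequenced bases / genome size G,
# with G taken as exactly 1,300 Mb ("approximately 1.3 Gb").
# Function names are hypothetical, for illustration only.

GENOME_SIZE_MB = 1300  # assumed C. arabica genome size in Mb


def raw_bases_gb(coverage: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Minimum sequence volume (in Gb) implied by a mean coverage."""
    return coverage * genome_size_mb / 1000


def fraction_assembled(assembly_mb: float, genome_size_mb: float = GENOME_SIZE_MB) -> float:
    """Share of the estimated genome size spanned by an assembly."""
    return assembly_mb / genome_size_mb


if __name__ == "__main__":
    print(f"Illumina (>150x): >{raw_bases_gb(150):.0f} Gb")
    print(f"PacBio (>50x): >{raw_bases_gb(50):.0f} Gb")
    # 30 accessions resequenced at 20x each
    print(f"30 accessions at 20x: ~{30 * raw_bases_gb(20):.0f} Gb")
    print(f"Illumina assembly: {fraction_assembled(1031):.1%} of estimated size")
    print(f"PacBio assembly: {fraction_assembled(1042):.1%} of estimated size")
```

Under these assumptions, >150x Illumina coverage corresponds to more than 195 Gb of sequence, and both assemblies span roughly 79 to 80% of the estimated genome size. The remainder would typically reflect repetitive or otherwise hard-to-assemble regions, though the abstract does not break this down.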
