Use of a draft genome of coffee (Coffea arabica) to identify SNPs associated with caffeine content

2018 
Arabica coffee (Coffea arabica) has a small gene pool, which limits genetic improvement. Selection for caffeine content within this gene pool would be assisted by identifying the genes that control this important trait. DNA bulks from 18 genotypes with extremely high or low caffeine content, drawn from a population of 232 genotypes, were sequenced to identify linked polymorphisms. To obtain a reference genome, a whole-genome assembly of arabica coffee (variety K7) was produced from short-read (Illumina) and long-read (PacBio) sequencing. Assembly with a range of assembly tools resulted in 76,409 scaffolds with a scaffold N50 of 54,544 bp and a total scaffold length of 1,448 Mb. Validation with different tools showed that the assembly was highly complete: more than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. Annotation of the assembled genome with AUGUSTUS yielded 99,829 gene models. Using the draft arabica genome as the reference for mapping and variant calling allowed the detection of 1,444 non-synonymous SNPs associated with caffeine content. KEGG pathway-based analysis identified 65 caffeine-associated SNPs, 11 of which were associated with genes encoding enzymes that convert substrates participating in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee.
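The scaffold N50 quoted in the abstract is a standard contiguity statistic: the length of the scaffold at which the cumulative length, counted from the longest scaffold downward, first reaches half of the total assembly length. The following is a minimal sketch of that calculation. The function name `scaffold_n50` and the toy lengths are illustrative assumptions and are not taken from the K7 assembly or from the authors' pipeline.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    # Walk the scaffolds from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first scaffold that pushes the running sum to half
        # of the total length defines the N50.
        if running >= half_total:
            return length


# Toy lengths for illustration only (hypothetical, not real data).
toy_lengths = [100, 80, 60, 40, 20]
print(scaffold_n50(toy_lengths))  # prints 80
```

Under this definition, a scaffold N50 of 54,544 bp would mean that scaffolds of at least that length together account for at least half of the 1,448 Mb total.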