A versatile resource of 1500 diverse wild and cultivated soybean genomes for post-genomics research

2020 
With the advance of next-generation sequencing technologies, over 15 terabytes of raw soybean genome sequencing data have been generated and made publicly available. To develop a consolidated, diverse, and user-friendly genomic resource that facilitates post-genomic research, we sequenced 91 highly diverse wild soybean genomes representing the entire U.S. collection of wild soybean accessions, thereby increasing the genetic diversity of the sequenced genomes. After integrating and analyzing our sequencing data together with the public data, we identified and annotated 32 million single nucleotide polymorphisms (32mSNPs) with a resolution of 30 SNPs/kb and 12 non-synonymous SNPs/gene in 1,556 accessions (1.5K). Population structure analysis showed that the 1.5K accessions represent the genetic diversity of the 20,087 (20K) soybean accessions in the U.S. collection. Inclusion of wild soybean genomes significantly increased the genetic diversity and shortened the linkage disequilibrium distance in the panel of soybean accessions. To maximize the use of the genome sequences, we identified a collection of paired accessions sharing the highest genomic identity between the 1.5K and 20K accessions and designated them genomically "equivalent" accessions. We demonstrated that the 32mSNPs in the 1.5K accessions can be used effectively for in-silico genotyping, discovery of trait QTL and gene alleles/mutations, identification of germplasm containing beneficial alleles, and analysis of domestication selection on trait alleles. We made the 32mSNPs and the 1.5K accessions, with detailed annotation, available at SoyBase and Ag Data Commons. The dataset could serve as a versatile resource that releases the potential of the large volume of genome sequencing data for a variety of post-genomic research.
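The idea of genomically "equivalent" accessions can be illustrated with a minimal sketch. It assumes that genomic identity is measured as the fraction of jointly called SNPs at which two accessions carry the same genotype, and that each accession in the larger collection is paired with its most similar sequenced accession. The abstract does not specify the actual identity metric or pairing procedure, so the function names (`identity`, `best_matches`), the 0/1/2 genotype encoding, and the toy accession names below are all hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Assumed encoding: 0/1/2 = count of alternate alleles, None = missing call.
Genotype = Sequence[Optional[int]]


def identity(a: Genotype, b: Genotype) -> float:
    """Fraction of jointly called SNPs where both accessions share a genotype."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0  # no comparable sites
    return sum(x == y for x, y in shared) / len(shared)


def best_matches(
    sequenced: Dict[str, Genotype],
    collection: Dict[str, Genotype],
) -> Dict[str, Tuple[str, float]]:
    """For each collection accession, return the most similar sequenced one."""
    result = {}
    for name, geno in collection.items():
        best = max(sequenced, key=lambda s: identity(sequenced[s], geno))
        result[name] = (best, identity(sequenced[best], geno))
    return result


# Toy data (hypothetical): two sequenced accessions, two collection accessions.
sequenced = {"seq_A": [0, 2, 2, 0, None], "seq_B": [2, 2, 0, 0, 2]}
collection = {"col_1": [0, 2, 2, 0, 0], "col_2": [2, None, 0, 2, 2]}

for name, (match, score) in best_matches(sequenced, collection).items():
    print(f"{name} -> {match} (identity {score:.2f})")
```

In practice the comparison would be restricted to SNP positions genotyped in both panels, and a minimum identity threshold would likely be needed before calling a pair "equivalent"; both choices are left out of this sketch.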