Taxonomic classification method for metagenomics based on core protein families with Core-Kaiju

2020 
A growing number of studies recognize the importance of characterizing the diversity and composition of the bacterial communities hosted by biota, in systems that range from oceans to humans. This task is typically addressed using environmental sequencing data (“metagenomics”). Determining microbiome diversity, however, requires classifying the species composition of the sampled community, which is often done by assigning individual reads to taxa through comparison with a reference database. Although computational methods for identifying microbial taxa are available, it is well known that community profiles inferred from the same sample by different methods can vary widely, depending on the biases introduced at each step of the analysis. In this study, we compare different bioinformatics methods for taxonomic classification, based on amplicon sequencing of 16S ribosomal RNA and on whole-genome shotgun sequencing. We apply the methods to three mock bacterial communities of known composition. We show that 16S data allow the number of species to be detected reliably, but not their abundances, while standard methods based on shotgun data give a reliable estimate of the most abundant species but predict a large number of false-positive species. We therefore propose a novel approach that combines shotgun data with a classification based on core protein families (PFAM), and is hence similar in spirit to 16S. We show that this method reliably predicts both the number of species and the abundances in the bacterial mock communities.