UTOPIA: an automatically UpdaTed, cOmPlete and consistent ITS reference dAtabase

2019 
Taxonomic assignment in metabarcoding analysis is a critical and challenging step. As more organisms are sequenced, taxonomy evolves quickly, with multiple taxa rearrangements and thousands of new sequences uploaded each year. The internal transcribed spacer (ITS) is a ubiquitous sequence used as a barcode to identify fungal species in complex environmental samples. Currently used databases such as UNITE offer a good and reliable reference, but they are generally updated infrequently, and new strain sequences can take several years to be integrated. UTOPIA provides a workflow that produces an up-to-date ITS reference database directly from the NCBI GenBank and Taxonomy databases. Our workflow downloads all complete fungal ITS sequences from NCBI using a formatted esearch query. In-house scripts then extract the sequences together with their corresponding seven-rank taxonomy strings. Post-processing consists of sequence quality filtering, dereplication and clustering. The taxonomy of each cluster is checked for consistency, and incongruities are resolved by a customizable in-house script. Finally, the UTOPIA workflow generates two simple files: a FASTA file containing the sequences and a two-column tab-delimited file containing the corresponding taxonomy, which can be formatted for current assignment tools. On our real dataset of 11,000 ITS sequences, UTOPIA outperforms UNITE in terms of resolution and confidence on about 60% of sequences. When UNITE fails to assign sequences, UTOPIA annotates up to 25% of them. More interestingly, UTOPIA's taxonomy is an exact copy of NCBI's, making it possible to integrate the most recently sequenced fungal genomes.
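To illustrate the cluster consistency step, the following is a minimal sketch and not the UTOPIA script itself. It assumes that an incongruity is resolved by keeping the deepest ranks on which a chosen fraction of cluster members agree, and by marking the remaining ranks as "unclassified". The function names, the `min_agreement` parameter, the "unclassified" placeholder and the semicolon separator in the taxonomy column are all hypothetical choices made for this example.

```python
from collections import Counter

# The seven ranks of the taxonomy string described in the abstract.
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def consensus_taxonomy(lineages, min_agreement=1.0):
    """Return a seven-rank lineage for one cluster.

    lineages: list of per-sequence lineages, each a list of up to 7 names.
    min_agreement: fraction of cluster members that must share a name at a
    rank for that rank to be kept (hypothetical parameter).
    """
    consensus = []
    candidates = [tuple(lineage) for lineage in lineages]
    for depth in range(len(RANKS)):
        counts = Counter(l[depth] for l in candidates if len(l) > depth)
        if not counts:
            break
        name, n = counts.most_common(1)[0]
        # Stop at the first rank where agreement is too low.
        if n / len(lineages) < min_agreement:
            break
        consensus.append(name)
        # Only lineages matching the consensus so far stay in the running.
        candidates = [l for l in candidates if len(l) > depth and l[depth] == name]
    # Pad unresolved ranks so every row still has seven fields.
    return consensus + ["unclassified"] * (len(RANKS) - len(consensus))


def format_taxonomy_row(seq_id, lineage):
    """Build one line of the two-column tab-delimited taxonomy file."""
    return f"{seq_id}\t{';'.join(lineage)}"


if __name__ == "__main__":
    shared = ["Fungi", "Ascomycota", "Sordariomycetes", "Hypocreales",
              "Nectriaceae", "Fusarium"]
    cluster = [shared + ["Fusarium oxysporum"], shared + ["Fusarium solani"]]
    print(format_taxonomy_row("cluster_1", consensus_taxonomy(cluster)))
```

With strict agreement (`min_agreement=1.0`), a cluster whose members disagree only at species level is annotated down to genus. Lowering the threshold lets a majority name win instead, which is one way such a script could be made customizable.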