L-Regulon: A novel "soft-curation" approach supported by a semantic enriched reading for RegulonDB literature.

2020 
Manual curation is a bottleneck in processing the vast amount of knowledge present in the scientific literature and making it available in computational resources, e.g., structured databases. Furthermore, the extraction of content is by necessity limited to the pre-defined concepts, features and relationships that conform to the model inherent in any knowledgebase. These pre-defined elements contrast with the rich knowledge that natural language is capable of conveying. Here we present a novel experiment in what we call "soft curation", supported by a robust, ad hoc tuned natural language processing development that quantifies semantic similarity across all sentences of a given corpus of literature. This underlying machinery supports novel ways to navigate and read within individual papers as well as across the papers of a corpus. As a first proof-of-principle experiment, we applied this approach to more than 100 collections of papers, selected from RegulonDB, that support knowledge of the regulation of transcription initiation in E. coli K-12, resulting in L-Regulon (L for "linguistic") version 1.0. Furthermore, we have initiated the mapping of RegulonDB curated promoters to their evidence sentences in the given publication. We believe this is the first step in a novel approach for users and curators that increases the accessibility of knowledge in ways yet to be discovered.