Advancing NLP via a distributed-messaging approach

2016 
Natural Language Processing (NLP) is a fundamental component in the many domains where unstructured text is a predominant data source. Despite the keen interest of both industry and the research community in developing NLP tools, current industrial solutions still suffer from two main drawbacks. First, the architectures underlying existing systems do not satisfy the critical requirements of large-scale processing, completeness, and versatility. Second, the algorithms typically employed for entity recognition and disambiguation — a core task common to all modern NLP systems — are still not well-suited for deployment in a real industrial environment, owing to evident issues of efficiency and result interpretability. In this paper we present Hermes, a novel NLP tool that overcomes these two main limitations of existing solutions. By employing an efficient and extensible distributed-messaging architecture, Hermes achieves the critical requirements of large-scale processing, completeness, and versatility. Moreover, our tool includes an entity-disambiguation algorithm enhanced with a two-level hashing-based approximation technique that considerably improves efficiency, as well as a densest-subgraph-extraction method that increases result interpretability.
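The abstract does not detail which densest-subgraph method Hermes uses. A classic choice for this task is Charikar's greedy peeling algorithm, a 2-approximation for maximizing average degree: repeatedly remove the minimum-degree node and keep the intermediate subgraph with the highest edge-to-node ratio. The sketch below illustrates that generic technique (function name and edge-list input are illustrative assumptions, not from the paper):

```python
from collections import defaultdict

def densest_subgraph(edges):
    """Greedy peeling (Charikar, 2000): a 2-approximation for the
    subgraph maximizing density = |edges| / |nodes|. Illustrative
    sketch; Hermes' actual method may differ."""
    # Build an undirected adjacency structure from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = sum(len(neigh) for neigh in adj.values()) // 2  # edge count
    best_density, best_nodes = 0.0, set(nodes)
    while nodes:
        density = m / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        # Peel off the node of minimum current degree.
        u = min(nodes, key=lambda n: len(adj[n]))
        for v in adj[u]:
            adj[v].discard(u)
        m -= len(adj[u])
        del adj[u]
        nodes.remove(u)
    return best_nodes, best_density
```

In an entity-disambiguation setting, the nodes would be candidate entity mentions and the edges semantic-relatedness links; the densest subgraph then surfaces the mutually coherent candidates, which is what makes the result interpretable.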