Multi-level Modeling of Structural Elements of Natural Language Texts and Its Applications

2018 
Methods for extracting knowledge from large volumes of natural language text are relevant to many problems in the analysis and generation of textual information. These include text analysis for extracting data, facts and semantics; presenting extracted information in a form convenient for machine processing (for example, an ontology); classifying and clustering texts, including thematic modeling; information retrieval (including thematic search, search based on a user model, ontology-based models, and search based on sample documents); abstracting and annotating texts; developing intelligent question-answering systems; generating texts of different types (fiction, marketing, weather forecasts, etc.); and rewriting texts while preserving the meaning of the original in order to present it to different target audiences. For such methods to work, it is necessary to construct and use models that adequately describe the structural elements of a text at different levels (individual words, sentences, thematic text fragments), their characteristics and semantics, and the relations between them that allow higher-level structures to be formed. Such models should also take into account general characteristics of textual data: genre, purpose, target audience, scientific field and others. In this paper, the authors review three main approaches to text modeling (structural, statistical and hybrid), their characteristics, advantages and disadvantages, and their applicability at different stages (knowledge extraction, storage and text generation) of solving problems in the analysis and generation of textual information.