Recognition of Named Entities and Categories in Text using Stacked Embeddings

2020 
Named entities identify the key elements in a text, while sentence classification summarizes it. Together, sequence labeling and sentence classification enable deeper extraction of information from text. Embeddings trained on a domain-specific corpus tend to yield strong vector representations, which in turn support better classification models. We propose custom fastText embeddings trained on a large Indian English news corpus. Stacked with state-of-the-art Pooled Flair embeddings, they achieve an F1-score of 79 on a custom FIRE English NER dataset and an F1-score of 93.05 on a subset of the OntoNotes 5.0 dataset. The embeddings were also used for sentence classification over 20 news categories, where the best multi-class accuracy was 88.1%. We also propose two Indian news datasets: one based on the FIRE NER dataset, and a custom multi-class sentence classification dataset.
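
As a rough illustration of what "stacking" embeddings means, the sketch below concatenates each token's vector from two sources into one longer vector. It is a minimal toy under stated assumptions, not the authors' pipeline: the lookup tables, their values, the dimensions, and the helper name `stack_embeddings` are all hypothetical. Real fastText vectors are built from subword units and Pooled Flair embeddings are contextual, so neither is a plain dictionary lookup in practice.

```python
from typing import Dict, List

Vector = List[float]


def stack_embeddings(
    tokens: List[str],
    sources: List[Dict[str, Vector]],
    dims: List[int],
) -> List[Vector]:
    """Concatenate each token's vector from every source, in source order."""
    stacked: List[Vector] = []
    for token in tokens:
        combined: Vector = []
        for table, dim in zip(sources, dims):
            # Assumption of this sketch: tokens missing from a table
            # fall back to a zero vector of that table's dimension.
            combined.extend(table.get(token, [0.0] * dim))
        stacked.append(combined)
    return stacked


# Toy tables standing in for a domain-specific and a contextual embedding
# (made-up values, chosen only to make the concatenation visible).
domain_vectors = {"Delhi": [0.1, 0.2], "rains": [0.3, 0.4]}
contextual_vectors = {"Delhi": [0.5, 0.6, 0.7], "rains": [0.8, 0.9, 1.0]}

stacked = stack_embeddings(
    ["Delhi", "rains", "today"],
    [domain_vectors, contextual_vectors],
    [2, 3],
)
for vector in stacked:
    print(len(vector), vector)
```

Each stacked vector has the summed dimensionality of its sources (here 2 + 3 = 5), so a downstream tagger or classifier sees both representations side by side.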