Focus location extraction from political news reports with bias correction

2017 
Automatic identification of geolocations mentioned in online news articles provides vital information for understanding the associated events. While numerous open-source and commercial tools exist for geolocation extraction, they do not reliably identify fine-grained locations, i.e., they identify locations at the country level rather than at the city or locality level. The problem of location identification has been widely studied, yet most techniques either depend on an external knowledge base or treat the problem purely as Named Entity Recognition (NER), and so recover only country-level location information. In this paper, we focus on news articles that describe an event. The set of locations directly associated with the event are called focus locations. An event, however, can occur at only a single location. We therefore aim to extract this location from among the focus locations, and call it the primary focus location. We propose a mechanism that uses named entities to identify candidate sentences containing focus locations, and then applies a supervised classifier over sentence embeddings to predict the primary focus location. The main issue with such an approach is the unavailability of ground truth (i.e., whether the words in a sentence are focus or non-focus) for training a classifier. In practice, labels may be available for only a small number of news articles because manual labeling is costly. If these articles are not representative of news articles in the wild, the classifier may not perform well. We therefore use an adaptation mechanism to overcome sampling bias in the training data. In particular, we train a classifier on bias-corrected training data obtained from news articles published by one agency, and test it on news articles published by a different agency. Our empirical results on real-world datasets of news articles show superior performance compared to baseline approaches.
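
The abstract does not say which adaptation mechanism is used, so the following is only a minimal sketch of one common way to correct sampling bias under covariate shift: importance weighting with a train-versus-test domain classifier. It is not the authors' method. The one-dimensional features are a toy stand-in for sentence embeddings, the labels are synthetic, and every name here (`fit_logistic`, `importance_weights`, `X_train`, `X_test`) is hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, sample_weights=None, lr=0.1, epochs=500):
    """Weighted logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    if sample_weights is None:
        sample_weights = [1.0] * n
    total = sum(sample_weights)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi, si in zip(X, y, sample_weights):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += si * err * xi[j]
            grad_b += si * err
        w = [wj - lr * g / total for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / total
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def importance_weights(X_train, X_test):
    """Estimate p_test(x) / p_train(x) for each training point.

    A domain classifier separates training (label 0) from test (label 1)
    inputs; its odds, rescaled by the sample-size ratio, approximate the
    density ratio.
    """
    X = X_train + X_test
    y = [0] * len(X_train) + [1] * len(X_test)
    domain = fit_logistic(X, y)
    prior = len(X_train) / len(X_test)
    weights = []
    for x in X_train:
        p = min(max(predict_proba(domain, x), 1e-6), 1.0 - 1e-6)
        weights.append(prior * p / (1.0 - p))
    return weights


# Toy data (assumption): agency A supplies labeled training sentences,
# agency B supplies unlabeled test sentences drawn from a shifted range.
X_train = [[(i - 20) / 10] for i in range(41)]   # features in [-2, 2]
X_test = [[(i - 10) / 10] for i in range(41)]    # features in [-1, 3]
y_train = [1 if x[0] > 0 else 0 for x in X_train]  # 1 = focus, 0 = non-focus

# Up-weight training sentences that resemble the test agency's sentences,
# then fit the focus/non-focus classifier with those weights.
weights = importance_weights(X_train, X_test)
model = fit_logistic(X_train, y_train, sample_weights=weights)
```

In this sketch, training sentences whose features fall in the region where the test agency's sentences are denser receive larger weights, so the focus classifier is fitted mainly to the part of the training data that resembles the target source.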