Focus location extraction from political news reports with bias correction
2017
Automatic identification of the geolocations mentioned in online news articles provides vital information for understanding the associated events. While numerous open-source and commercial tools exist for geolocation extraction, they lack reliable identification of fine-grained locations; that is, they identify locations at the country level rather than at the finer city or locality level. The problem of location identification has been widely studied. Yet most techniques depend on an external knowledge base or view the problem only in terms of Named Entity Recognition (NER), and thus identify only country-level location information. In this paper, we focus on news articles describing an event. The locations directly associated with the event are called focus locations. However, an event can occur at only a single location. Therefore, we aim to extract this location from among the focus locations, and call it the primary focus location. We propose a mechanism that uses named entities to identify potential sentences containing focus locations, and then employs a supervised classification mechanism over sentence embeddings to predict the primary focus geolocation. The main issue with such an approach, however, is the unavailability of ground truth (i.e., whether the words in a sentence are focus or non-focus) for training a classifier. In practice, labels may be available for only a small number of news articles because of the high cost of manual labeling. If these articles are not a good representation of news articles in the wild, the classifier may not perform well. Therefore, we use an adaptation mechanism to overcome sampling bias in the training data. In particular, we train a classifier on bias-corrected training data obtained from news articles published by one agency, and test it on news articles published by a different agency. Our empirical results on real-world datasets of news articles show superior performance compared to baseline approaches.
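The abstract does not specify the NER tool, the sentence-embedding model, the classifier, or the exact adaptation mechanism. The following is therefore only a minimal sketch of the overall pipeline under stated assumptions. It assumes tokens arrive pre-tagged with hypothetical `LOC`/`GPE` labels and that sentence embeddings are precomputed. In place of the paper's unspecified bias correction, it substitutes one standard covariate-shift technique: importance weighting, with density ratios estimated by a source-vs-target domain classifier. It also uses a hypothetical rule for choosing the primary focus location. All function names here are illustrative, not the authors' code.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
TaggedSentence = List[Tuple[str, str]]  # (token, NER tag) pairs

# Hypothetical tag set: the abstract does not name its NER tool or labels.
LOCATION_TAGS = {"LOC", "GPE"}


def candidate_sentences(doc: List[TaggedSentence]) -> List[int]:
    """Indices of sentences that mention at least one location entity."""
    return [i for i, sent in enumerate(doc)
            if any(tag in LOCATION_TAGS for _, tag in sent)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Vector, x: Sequence[float]) -> float:
    # The last weight is the bias, so append a constant 1.0 feature.
    return sigmoid(sum(a * b for a, b in zip(w, list(x) + [1.0])))


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 sample_weight: Optional[Sequence[float]] = None,
                 lr: float = 0.5, epochs: int = 500, l2: float = 1e-3) -> Vector:
    """Weighted logistic regression fitted by full-batch gradient descent.

    The small L2 penalty is also applied to the bias, for simplicity."""
    sw = list(sample_weight) if sample_weight is not None else [1.0] * len(X)
    total = sum(sw)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    for _ in range(epochs):
        grad = [l2 * wj for wj in w]
        for xi, yi, si in zip(Xb, y, sw):
            err = sigmoid(sum(a * b for a, b in zip(w, xi))) - yi
            for j, xij in enumerate(xi):
                grad[j] += si * err * xij / total
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def importance_weights(X_source, X_target, clip: float = 10.0) -> Vector:
    """Estimate p_target(x) / p_source(x) for each source example.

    A domain classifier separates source (0) from target (1); by Bayes' rule
    p_t(x) / p_s(x) = [P(1|x) / P(0|x)] * (n_s / n_t). Clipping limits the
    variance introduced by a few extreme ratios."""
    X = list(X_source) + list(X_target)
    d = [0] * len(X_source) + [1] * len(X_target)
    domain_clf = train_logreg(X, d)
    prior_ratio = len(X_source) / len(X_target)
    weights = []
    for x in X_source:
        p = predict_proba(domain_clf, x)
        weights.append(min(clip, prior_ratio * p / max(1e-12, 1.0 - p)))
    return weights


def primary_focus_location(doc: List[TaggedSentence], embeddings: List[Vector],
                           focus_clf: Vector) -> Optional[str]:
    """Hypothetical decision rule: first location token of the candidate
    sentence that the focus classifier scores highest."""
    cands = candidate_sentences(doc)
    if not cands:
        return None
    best = max(cands, key=lambda i: predict_proba(focus_clf, embeddings[i]))
    return next(tok for tok, tag in doc[best] if tag in LOCATION_TAGS)


# Toy 1-D "sentence embeddings": the source agency skews negative and the
# target agency skews positive (covariate shift). Labels: 1 = focus sentence.
X_src, y_src = [[-2.0], [-1.0], [0.0], [1.0]], [0, 0, 1, 1]
X_tgt = [[0.5], [1.0], [1.5], [2.0]]
weights = importance_weights(X_src, X_tgt)
focus_clf = train_logreg(X_src, y_src, sample_weight=weights)

doc = [
    [("Talks", "O"), ("resumed", "O")],
    [("Protesters", "O"), ("gathered", "O"), ("in", "O"), ("Pune", "LOC")],
    [("Officials", "O"), ("in", "O"), ("Delhi", "LOC"), ("responded", "O")],
]
embeddings = [[0.0], [1.5], [-1.0]]  # hypothetical precomputed embeddings
print(primary_focus_location(doc, embeddings, focus_clf))
```

In this sketch, source examples that resemble the target agency's articles receive larger weights, so the focus classifier is fitted toward the target distribution. The discriminative density-ratio estimator and the weight clipping are choices made for this illustration. Other covariate-shift corrections, such as kernel mean matching, could be substituted.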