Building a large dictionary of abbreviations for named entity recognition in Portuguese historical corpora

2008 
Abbreviated forms pose a particular challenge in historical corpora: they are frequent and ambiguous, and they also vary graphically. The purpose of this paper is to present the process of building a large dictionary of historical Portuguese abbreviations whose entries contain the abbreviation, its expansion, and morphosyntactic and semantic information, the latter drawn from a predefined set of named entity (NE) categories. The dictionary was built in a hybrid fashion. It combines existing linguistic resources (such as a printed dictionary and lists of abbreviations) with abbreviations extracted from the corpus of the Historical Dictionary of Brazilian Portuguese (HDBP) using finite-state automata and regular expressions. Besides helping to disambiguate the abbreviations found in the HDBP corpus, the dictionary can be reused in other projects and tasks, chiefly NE recognition.
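
The abstract does not show the dictionary's entry format or the extraction rules. The Python sketch below is therefore only an illustration under stated assumptions. It shows an entry holding the four kinds of information listed above, an index keyed on a normalized form so that graphic variants share one entry, and a regular expression that proposes candidate abbreviations from running text.

The following are all hypothetical:

* the `Entry` fields;
* the tag and NE labels;
* the `normalize` rule;
* the candidate pattern.

Python's `re` module stands in for the authors' finite-state tools. The sample forms ("V. M.ce" for "Vossa Mercê", "Cap." for "Capitão" or "Capítulo") are illustrative and are not taken from the described dictionary.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entry:
    """One dictionary entry (hypothetical layout, not the paper's format)."""
    abbreviation: str        # attested abbreviated form
    expansion: str           # full form
    pos: str                 # morphosyntactic tag (illustrative tagset)
    ne_class: Optional[str]  # NE category, None if not an NE (illustrative labels)


def normalize(form: str) -> str:
    """Map graphic variants to one key: drop periods and whitespace, lowercase."""
    return re.sub(r"[.\s]", "", form).lower()


class AbbreviationDictionary:
    """Index entries by normalized form; one key may hold several expansions."""

    def __init__(self) -> None:
        self._index: dict[str, list[Entry]] = {}

    def add(self, entry: Entry) -> None:
        self._index.setdefault(normalize(entry.abbreviation), []).append(entry)

    def lookup(self, form: str) -> list[Entry]:
        # More than one result means the abbreviation is ambiguous.
        return list(self._index.get(normalize(form), []))


# Hypothetical candidate pattern: a capitalized token of at most five letters
# closed by a period, optionally followed by up to three lowercase letters
# (e.g. a flattened superscript ending, as in "M.ce").
_UPPER = "A-ZÁÀÂÃÉÊÍÓÔÕÚÇ"
_LOWER = "a-záàâãéêíóôõúç"
CANDIDATE = re.compile(rf"\b[{_UPPER}][{_LOWER}]{{0,4}}\.(?:[{_LOWER}]{{1,3}}\b)?")


def extract_candidates(text: str) -> list[str]:
    """Return candidate abbreviated forms in order of occurrence."""
    return CANDIDATE.findall(text)


if __name__ == "__main__":
    d = AbbreviationDictionary()
    d.add(Entry("V. M.ce", "Vossa Mercê", "PRON", "FORM_OF_ADDRESS"))
    d.add(Entry("Cap.", "Capitão", "N", "TITLE"))
    d.add(Entry("Cap.", "Capítulo", "N", None))

    print(extract_candidates("Peço a V. Mce. que avise o Cap. Mor."))
    # ['V.', 'Mce.', 'Cap.', 'Mor.']
    print([e.expansion for e in d.lookup("Cap.")])
    # ['Capitão', 'Capítulo']
    print(d.lookup("V.Mce.")[0].expansion)
    # Vossa Mercê
```

Keying on a normalized form lets variants such as "V. M.ce" and "V.Mce." reach the same entry instead of multiplying entries. A key that returns several entries ("Cap.") marks the point where a disambiguation step would have to choose from context. The pattern also splits the two-token form "V. Mce." into separate candidates, so multi-token abbreviations would need an additional matching step.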