A Diachronic Italian Corpus based on “L’Unità”

Pierpaolo Basile, Annalina Caputo, Tommaso Caselli, Pierluigi Cassotti, Rossella Varvara

2020

In this paper, we describe the creation of a diachronic corpus for Italian built from the digital archive of the newspaper “L’Unità”. We automatically clean the corpus and annotate it with PoS tags, lemmas, named entities and syntactic dependencies. Moreover, we compute frequency-based time series for tokens, lemmas and entities. We report corpus statistics that take the temporal dimension into account and describe examples of how the time series can be used.
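
To make the idea of a frequency-based time series concrete, the following is a minimal sketch in Python, not the authors' pipeline. It assumes the corpus is available as pairs of a publication year and a list of tokens (or lemmas, or entity strings), and that frequencies are normalized by the total number of tokens in each year. The abstract does not state the normalization, so that choice, along with the function name `frequency_time_series` and the toy data, is hypothetical.

```python
from collections import Counter, defaultdict


def frequency_time_series(docs, normalize=True):
    """Build one frequency series per term from (year, tokens) pairs.

    `docs` is an assumed input format: an iterable of (year, list_of_terms).
    Returns the sorted list of years and a dict mapping each term to a list
    of values aligned with those years.
    """
    counts = defaultdict(Counter)  # year -> term -> raw count
    totals = Counter()             # year -> total number of terms

    for year, tokens in docs:
        counts[year].update(tokens)
        totals[year] += len(tokens)

    years = sorted(totals)
    vocabulary = set()
    for year_counts in counts.values():
        vocabulary.update(year_counts)

    series = {}
    for term in vocabulary:
        values = []
        for year in years:
            raw = counts[year][term]
            if normalize:
                # Relative frequency; a year with no tokens contributes 0.0.
                values.append(raw / totals[year] if totals[year] else 0.0)
            else:
                values.append(raw)
        series[term] = values
    return years, series


# Toy data, invented for illustration only.
toy_docs = [
    (1948, ["pace", "lavoro", "pace"]),
    (1949, ["lavoro"]),
]
toy_years, toy_series = frequency_time_series(toy_docs)
```

Normalizing by yearly volume keeps years with more published text from dominating the series; passing `normalize=False` returns raw counts instead.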

Keywords:

Computational linguistics
Lexical semantics
Series (mathematics)
Computer science
corpus based
Artificial intelligence
Natural language processing
Newspaper
dimension

