Using Elasticsearch for Linguistic Analysis of Tweets in Time and Space

Adrien Barbaresi, Antonio Ruiz Tinoco

2018

The collection and analysis of microtexts is both straightforward from a computational viewpoint and complex from a scientific perspective, as such texts often feature non-standard data and are accompanied by a profusion of metadata. We address corpus construction and visualization issues in order to study spontaneous speech and variation through short messages. To this end, we introduce an experimental setting based on a generic NoSQL database (Elasticsearch) and its front-end (Kibana). We focus on Spanish and German and present concrete examples of faceted searches on short messages from the Twitter platform. The results are discussed with particular emphasis on the impact of querying and visualization techniques, first for longitudinal studies over time and second for results aggregated from a spatial perspective.
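
As an illustration of the kind of faceted search described above, the following minimal sketch builds an Elasticsearch request body that filters messages by language and search term, then aggregates the hits over time and over space. It is not taken from the paper: the field names (text, lang, created_at, coordinates), the function name, and the example search term are all assumptions made for this example, and the calendar_interval parameter presumes a recent Elasticsearch version (older releases used interval).

```python
import json


def build_faceted_query(term, lang, interval="month", precision=4):
    """Return an Elasticsearch search body as a plain dict.

    All field names below are hypothetical; adapt them to the actual
    index mapping. No connection to a cluster is made here.
    """
    return {
        # Only the aggregations are of interest, not individual hits.
        "size": 0,
        "query": {
            "bool": {
                # Full-text match on the message body (assumed field).
                "must": [{"match": {"text": term}}],
                # Exact filter on the language code (assumed field).
                "filter": [{"term": {"lang": lang}}],
            }
        },
        "aggs": {
            # Longitudinal facet: number of matching messages per period.
            "over_time": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": interval,
                }
            },
            # Spatial facet: matching messages bucketed by geohash cell.
            "by_area": {
                "geohash_grid": {
                    "field": "coordinates",
                    "precision": precision,
                }
            },
        },
    }


if __name__ == "__main__":
    # "vosotros" is an arbitrary example term, not a query from the paper.
    print(json.dumps(build_faceted_query("vosotros", "es"), indent=2))
```

The resulting dict could be sent to a search endpoint or pasted into a Kibana console; keeping it as plain data makes the temporal and spatial facets easy to adjust independently.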

Keywords:

Metadata
Computer science
Artificial intelligence
Visualization
Data mining
Natural language processing
Spacetime
Social media
NoSQL
linguistic analysis
Creative visualization
German
Information retrieval
