Linking Socio-economic and Demographic Characteristics to Twitter Topics

2015 
Social media data is now widely considered a viable source for market and social research. Every day, Twitter users generate large quantities of data through Tweets that express their thoughts and opinions and may also describe their activities, plans and locations. In raw form, textual data at this volume is hard to process and understand; however, the Tweets can be modelled into a small number of topics using generative probabilistic algorithms. This paper investigates how the content of Tweets may vary with socio-economic and demographic characteristics, using Tweets from Inner London sourced from the Twitter application programming interface (API). Earlier research successfully allocated over 1 million geo-located Tweets from Inner London in 2013 into a hierarchical classification of 20 groups and 100 subgroups created using a latent Dirichlet allocation (LDA) algorithm. The 20 groups consist of distinctive topics and uses of language, and each demonstrates unique spatial and temporal patterns across Inner London. The next stage of the analysis explores how the Twitter classification varies across the residential geography of Inner London. Assuming that most Tweets sent from residential buildings are likely to be sent by residents, the classification can be compared with socio-economic and other demographic characteristics from open data sources. In addition, some characteristics, such as gender and ethnicity, can be inferred from the names of Twitter users.
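The topic-modelling step can be illustrated with a toy sketch. This is not the authors' pipeline and does not reproduce their hierarchical 20-group and 100-subgroup classification. It is a minimal collapsed Gibbs sampler for latent Dirichlet allocation in plain Python. The example Tweets, the number of topics, the hyperparameters `alpha` and `beta`, the iteration count, and the rule of allocating each Tweet to the topic that holds most of its words are all illustrative assumptions, not settings reported in the paper.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns (doc-topic counts, topic-word counts). Hyperparameter
    defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    ndk = [[0] * n_topics for _ in docs]   # tokens per (document, topic)
    nkw = [Counter() for _ in range(n_topics)]  # tokens per (topic, word)
    nk = [0] * n_topics                    # tokens per topic

    # Give every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalised full conditional p(z = t | other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                # Resample the topic and add the token back to the counts.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


def dominant_topic(doc_topic_counts):
    """Hypothetical allocation rule: pick the topic with most tokens."""
    return [max(range(len(c)), key=c.__getitem__) for c in doc_topic_counts]


# Hypothetical toy Tweets, already lower-cased and split on whitespace.
tweets = [
    "coffee breakfast morning coffee",
    "morning coffee brunch",
    "brunch breakfast morning",
    "football match goal",
    "goal match arsenal football",
    "arsenal goal football",
]
docs = [t.split() for t in tweets]
ndk, nkw = lda_gibbs(docs, n_topics=2)
labels = dominant_topic(ndk)
print(labels)
```

With a fixed seed the run is reproducible. Topic indices are arbitrary, so only the grouping of Tweets is meaningful, not which number a group receives. On real data the vocabulary would first be cleaned (stop words, URLs, user mentions), and the number of topics would be chosen by the analyst.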