Characterizing User-Generated Text Content Mining: A Systematic Mapping Study of the Portuguese Language

Ellen Souza, Dayvid Castro, Douglas Vitório, Ingryd Teles, Adriano L. I. Oliveira, Cristine Gusmão


2016

Unstructured data accounts for more than 80% of enterprise data and is growing exponentially at an annual rate of 60%. Text mining refers to the process of discovering new, previously unknown, and potentially useful information from a variety of unstructured data, including user-generated text content (UGTC). Given that Portuguese is one of the most widely used languages in the world and the second most frequent language on Twitter, the goal of this work is to map the landscape of current studies that apply text mining to UGTC in the Portuguese language. The systematic mapping review method was applied to search for, select, and extract data from the included studies. Our manual and automated searches retrieved 6075 studies published up to 2014, of which 35 were included in the study. Text classification accounts for 79% of all text mining tasks, with Naive Bayes as the main classifier and Twitter as the main data source.
