Feature Impact on Sentiment Extraction of TEnglish Code-Mixed Movie Tweets

2021 
Sentiment extraction is a natural language processing task that deals with detecting and classifying sentiments in monolingual and bilingual texts. Automating sentiment extraction from social media text is a pertinent area of research, given the enormous volume of noisy multilingual content. This work focuses on extracting sentiments from code-mixed Telugu–English (TEnglish) bilingual movie tweets written in Roman script and collected using the Twitter API. Initially, every tweet in the dataset was annotated with the source language of each word and with the sentiment expressed in the code-mixed tweet. Sentiment extraction on the annotated data was then automated using a machine learning-based approach. Sentiment classification was carried out with features such as character N-grams, emoticons, repetitive characters, intensifiers, and negation words, using a support vector machine classifier with a radial basis function kernel, as it performs efficiently on high-dimensional feature vectors. The study aimed to identify which type of feature has the most impact on capturing sentiments. The results show that character N-grams, emoticons, and negation words are the features that affect accuracy the most.
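The five feature types named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the N-gram range (2 to 3), the rule that counts a run of three or more identical characters as repetition, and the small emoticon, intensifier, and negation lexicons are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical, illustrative lexicons (assumptions); the paper's actual
# word lists are not given in the abstract.
EMOTICONS = {":)", ":(", ":D", ":-)", ":-(", ";)"}
INTENSIFIERS = {"very", "chala", "super"}
NEGATIONS = {"not", "no", "kadu", "ledu"}


def char_ngrams(text, n_min=2, n_max=3):
    """Count character N-grams of length n_min..n_max in lowercased text."""
    text = text.lower()
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def extract_features(tweet):
    """Map one tweet to a sparse feature dictionary."""
    tokens = tweet.split()
    lowered = [t.lower() for t in tokens]
    # One feature per character N-gram, prefixed to avoid name clashes.
    feats = {f"ng={g}": c for g, c in char_ngrams(tweet).items()}
    feats["emoticons"] = sum(t in EMOTICONS for t in tokens)
    # A run of three or more identical characters, e.g. "sooooper".
    feats["repeated_chars"] = len(re.findall(r"(\w)\1{2,}", tweet))
    feats["intensifiers"] = sum(t in INTENSIFIERS for t in lowered)
    feats["negations"] = sum(t in NEGATIONS for t in lowered)
    return feats


if __name__ == "__main__":
    sample = "Movie chala baagundi :) sooooper"
    features = extract_features(sample)
    for name in ("emoticons", "repeated_chars", "intensifiers", "negations"):
        print(name, features[name])
```

Feature dictionaries of this kind could then be vectorized and passed to a support vector machine with a radial basis function kernel, for example with scikit-learn's `DictVectorizer` and `SVC(kernel="rbf")`. Dropping one feature group at a time and comparing accuracy is one plausible way to measure the impact of each feature type.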