Movie genre classification using TF-IDF and SVM

2019 
This paper studies the principle and process of classification with the SVM algorithm and applies it to text containing movie information in order to classify movies by genre. It focuses on the steps required in the classification process, such as word segmentation, feature engineering, and text representation, and designs and implements a movie classification system based on SVM. In addition, the following work has been done: i) For sample selection and processing, all sample data used in the experiments come from film profile information on Douban Film Network. To facilitate data acquisition and formatting, automatic crawling of film profile web pages is also implemented. ii) Text feature selection combines document frequency with word frequency, which effectively avoids the "low word frequency first" disadvantage of using document frequency alone. The experimental results show that word frequency also has some influence on the classification results when feature selection is carried out. iii) In the SVM parameter optimization stage, the principle of using kernel functions is analyzed, along with how models trained with different parameter values affect the classification results. Cross-validation based on grid search is used to find relatively optimal parameters. The results show that SVM is capable of movie classification with a relatively high F-score. The performance evaluation indicates that a non-linear kernel such as RBF does not outperform a linear kernel when dealing with a large number of features.
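The feature selection and text representation steps described above can be illustrated with a minimal sketch. The abstract does not give the exact selection rule or weighting formula, so everything here is an assumption: the function names `select_features` and `tfidf_vectors`, the `min_df` and `top_k` parameters, the rule "filter by document frequency, then rank by total word frequency", the smoothed IDF form `log(N / df) + 1`, and the toy token lists standing in for segmented film profiles.

```python
import math
from collections import Counter


def select_features(docs, min_df=2, top_k=5):
    """Hypothetical selection rule: keep terms that occur in at least
    min_df documents, then rank them by total word frequency."""
    df = Counter()  # number of documents containing each term
    tf = Counter()  # total occurrences of each term in the corpus
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    candidates = [t for t in df if df[t] >= min_df]
    # Highest word frequency first; ties broken alphabetically.
    candidates.sort(key=lambda t: (-tf[t], t))
    return candidates[:top_k]


def tfidf_vectors(docs, vocab):
    """Represent each document as a TF-IDF vector over vocab.
    Assumes every vocab term occurs in at least one document."""
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors


# Toy, already-segmented "film profiles" (illustrative data only).
docs = [
    ["space", "alien", "battle", "space"],
    ["love", "wedding", "love", "city"],
    ["space", "love", "robot"],
]
vocab = select_features(docs, min_df=2, top_k=5)
vectors = tfidf_vectors(docs, vocab)
```

In a full pipeline, vectors of this kind would be passed to an SVM classifier, with the kernel and its parameters chosen by grid search under cross-validation; that stage is usually handled by an existing SVM library rather than written by hand.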