Tibetan Text Classification Method Based on BiLSTM Model

2020 
Text classification is a key technology in information retrieval and data mining. It helps reduce information clutter and locate relevant information. This paper proposes a Tibetan text representation that merges Word2vec with a TF-IDF algorithm based on class frequency variance, and uses a BiLSTM network model to classify Tibetan text on top of that representation. First, the Tibetan text is pre-processed: word segmentation, construction of a basic stop word list, and calculation of word frequency. Then the text is represented by merging Word2vec with the TF-IDF algorithm based on class frequency variance, which takes into account both the importance of words and the distribution of words. Finally, the word vectors are fed to the classification model to train the Tibetan text classifier, and the trained classifier is used to classify unclassified Tibetan text. The experimental results show that the representation combining Word2vec with TF-IDF based on class frequency variance effectively improves text classification. The BiLSTM-based Tibetan text classifier reaches an accuracy of 89.03%, which is significantly better than RNN and LSTM.
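The abstract does not give the exact weighting formula, so the following is only a minimal sketch of how a TF-IDF weight adjusted by class frequency variance might be merged with word vectors before they reach a sequence model. The function names, the smoothed IDF, the `(1 + variance)` factor, and the toy tokens and embeddings are all assumptions made for illustration, not the paper's definitions.

```python
import math
from collections import Counter, defaultdict


def tfidf_cfv_weights(docs, labels):
    """Return one {term: weight} dict per document.

    weight = tf * smoothed idf * (1 + class frequency variance).
    The (1 + variance) factor is an illustrative assumption.
    """
    n_docs = len(docs)
    classes = sorted(set(labels))
    class_sizes = Counter(labels)

    doc_freq = Counter()                   # documents containing each term
    class_doc_freq = defaultdict(Counter)  # same count, split by class
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq[term] += 1
            class_doc_freq[term][label] += 1

    def class_frequency_variance(term):
        # Fraction of each class's documents that contain the term.
        rates = [class_doc_freq[term][c] / class_sizes[c] for c in classes]
        mean = sum(rates) / len(rates)
        return sum((r - mean) ** 2 for r in rates) / len(rates)

    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        doc_weights = {}
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            doc_weights[term] = tf * idf * (1 + class_frequency_variance(term))
        weights.append(doc_weights)
    return weights


def weighted_sequence(tokens, term_weights, embeddings):
    """Scale each token's vector by its weight, keeping token order.

    `embeddings` stands in for a trained Word2vec lookup table;
    tokens without a vector are skipped.
    """
    return [
        [term_weights[t] * x for x in embeddings[t]]
        for t in tokens
        if t in embeddings
    ]


# Toy data: placeholder tokens stand in for segmented Tibetan words.
docs = [["the", "goal"], ["the", "goal"], ["the", "vote"], ["the", "vote"]]
labels = ["sports", "sports", "politics", "politics"]
embeddings = {"the": [1.0, 0.0], "goal": [0.0, 2.0], "vote": [2.0, 0.0]}

weights = tfidf_cfv_weights(docs, labels)
sequence = weighted_sequence(docs[0], weights[0], embeddings)
```

In this toy corpus "the" appears equally often in both classes, so its variance is zero and it receives a lower weight than "goal", which occurs in only one class. The resulting ordered list of weighted vectors is the kind of input a BiLSTM classifier would consume.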