Multilingual Short Text Classification via Convolutional Neural Network

2018 
As the volume of multilingual text grows, the analysis of multilingual data plays a crucial role in statistical translation models, cross-language information retrieval, the construction of parallel corpora, bilingual information extraction, and other fields. In this paper, we introduce a convolutional neural network and propose an auto-associative memory for fusing multilingual data in order to classify multilingual short text. First, the open-source tool word2vec is used to extract word vectors for textual representation. Then, the auto-associative memory relationship extracts the semantics of multilingual documents, which requires calculating the statistical relevance of word vectors between different languages. A critical problem is the domain adaptation of classifiers across languages, and we solve it by transforming multilingual text features. To fuse a dense combination of high-level features in multilingual text semantics, we introduce a convolutional neural network into the model, which outputs the classification predictions. Experiments show that the convolutional neural network combined with auto-associative memory improves classification accuracy by 2 to 6% in multilingual text classification compared with other classic models. Furthermore, the proposed model reduces the dependence of multilingual text classification on a parallel corpus and therefore extends well to additional multilingual data.
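To make the convolutional step concrete, the following is a minimal NumPy sketch of a generic text CNN forward pass over a matrix of word vectors: sliding filters, ReLU, max-over-time pooling, and a softmax output. It is an illustration under stated assumptions, not the paper's architecture. The function name `text_cnn_forward`, the filter width, the layer sizes, and the random weights are all hypothetical, and the auto-associative memory fusion described above is not modeled here; the input simply stands in for whatever fused word-vector matrix the model produces.

```python
import numpy as np

def text_cnn_forward(embeddings, filters, conv_bias, out_weights, out_bias):
    """Forward pass of a one-layer text CNN (hypothetical sketch).

    embeddings:  (seq_len, dim) matrix, one word vector per row
    filters:     (n_filters, width, dim) convolution filters
    conv_bias:   (n_filters,)
    out_weights: (n_filters, n_classes)
    out_bias:    (n_classes,)
    """
    seq_len, _ = embeddings.shape
    _, width, _ = filters.shape

    # Collect every window of `width` consecutive word vectors.
    windows = np.stack(
        [embeddings[i:i + width] for i in range(seq_len - width + 1)]
    )  # (n_windows, width, dim)

    # Apply each filter to each window, then a ReLU non-linearity.
    feature_maps = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
    feature_maps = np.maximum(feature_maps + conv_bias, 0.0)  # (n_windows, n_filters)

    # Max-over-time pooling keeps the strongest response of each filter.
    pooled = feature_maps.max(axis=0)  # (n_filters,)

    # Linear layer followed by a numerically stable softmax.
    logits = pooled @ out_weights + out_bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Assumed toy sizes: 7 words, 8-dimensional vectors, 4 filters of width 3, 3 classes.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(7, 8))
filters = rng.normal(size=(4, 3, 8))
probs = text_cnn_forward(
    embeddings,
    filters,
    conv_bias=np.zeros(4),
    out_weights=rng.normal(size=(4, 3)),
    out_bias=np.zeros(3),
)
print(probs)
```

In practice the rows of `embeddings` would come from trained word2vec vectors rather than random numbers, and the weights would be learned by backpropagation; the sketch only shows how windows of word vectors are reduced to a fixed-size feature vector for classification.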