Processing Multilingual Word Embeddings of Indian Languages with Semi-supervised Learning

2021 
Most current methods for learning bilingual word embeddings depend on large parallel corpora or dictionaries, which are hard to obtain for many languages; for Indian languages in particular, resources for the required language pairs are unavailable most of the time. Modern natural language processing algorithms, drawing on ideas from convolutional neural networks, can help overcome this hurdle. This paper adopts a simple self-learning approach that reduces the need for document-aligned corpora for the languages under test. The aim is to reduce the use of bilingual resources for mapping and to devise a self-learning technique that works together with a dictionary-based mapping algorithm. The system can work with as few as 30 word pairs and still perform on par with algorithms that use richer sources of bilingual data.
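The abstract does not spell out the mapping algorithm, so the following is only a minimal sketch of how a self-learning loop around a dictionary-based mapping could look. It assumes an orthogonal (Procrustes) mapping and nearest-neighbour dictionary induction by cosine similarity, neither of which is confirmed by the text; the function names `normalize`, `procrustes`, and `self_learn` are hypothetical.

```python
import numpy as np


def normalize(emb):
    """Scale every embedding (row) to unit length so dot products are cosines."""
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def procrustes(src, tgt):
    """Orthogonal W minimising ||src @ W - tgt||_F (assumed mapping step)."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def self_learn(src_emb, tgt_emb, seed_pairs, n_iter=10):
    """Alternate between learning a mapping and inducing a new dictionary.

    seed_pairs is a small list of (source_index, target_index) tuples,
    for example 30 known translation pairs.
    """
    src_emb, tgt_emb = normalize(src_emb), normalize(tgt_emb)
    pairs = list(seed_pairs)
    for _ in range(n_iter):
        s_idx = [s for s, _ in pairs]
        t_idx = [t for _, t in pairs]
        # Step 1: fit the mapping on the current dictionary.
        w = procrustes(src_emb[s_idx], tgt_emb[t_idx])
        # Step 2: induce a new dictionary by nearest neighbour in the mapped space.
        sims = (src_emb @ w) @ tgt_emb.T
        new_pairs = [(i, int(j)) for i, j in enumerate(sims.argmax(axis=1))]
        if new_pairs == pairs:  # dictionary stopped changing
            break
        pairs = new_pairs
    return w, pairs
```

Called with two embedding matrices and a seed list of about 30 index pairs, `self_learn` returns the learned mapping and the induced dictionary over the whole source vocabulary. Real embedding spaces are only approximately related by a rotation, so retrieval quality in practice depends on the embeddings and on the seed pairs chosen.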