Evaluation of Embeddings in Medication Domain for Spanish Language Using Joint Natural Language Understanding

2021 
Word embeddings have been widely used in Natural Language Processing as the input to neural networks. Such embeddings can help a model identify both the overall objective of a sentence and its keywords. In this work, we therefore study the impact of different word embeddings, trained on general and domain-specific corpora, on Joint Natural Language Understanding in a Spanish medication domain. We generate the training data using templates, and the model performs intent detection and slot-filling. We compare word2vec and fastText as word embeddings, and ELMo and BERT as language models. We use three different corpora to train the embeddings: the training data generated for this scenario, the Spanish Wikipedia as general-domain data, and the Spanish drug database as specialized data. The best result was obtained with the word2vec continuous bag-of-words (CBOW) model trained on the Spanish Wikipedia, which achieved a 71.77% F1-score and a 69.37% accuracy for intent detection, and a 74.36% F1-score for slot-filling.
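The abstract states that the training data is generated from templates for a joint model that predicts one intent per utterance and one slot label per token. A minimal Python sketch of that idea follows. It assumes a simple `{slot}` placeholder syntax and BIO slot tagging, and the templates, intent names (`tomar_medicamento`, `consultar_dosis`) and slot names (`medicamento`, `hora`) are hypothetical illustrations, not the templates used in the paper.

```python
import itertools
import re
from typing import Dict, List, Tuple

# Hypothetical templates: each pairs an intent label with a Spanish utterance
# containing {slot} placeholders. Names are illustrative, not from the paper.
TEMPLATES: List[Tuple[str, str]] = [
    ("tomar_medicamento", "recuérdame tomar {medicamento} a las {hora}"),
    ("consultar_dosis", "cuánto {medicamento} debo tomar"),
]

# Hypothetical slot values; multi-word values are allowed.
SLOT_VALUES: Dict[str, List[str]] = {
    "medicamento": ["ibuprofeno", "ácido acetilsalicílico"],
    "hora": ["ocho", "diez y media"],
}

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def expand(intent: str, template: str) -> List[Tuple[List[str], List[str], str]]:
    """Fill every combination of slot values into one template.

    Returns (tokens, BIO tags, intent) triples, a common input format for a
    joint intent-detection and slot-filling model. Placeholders must appear
    as standalone whitespace-separated tokens in the template.
    """
    slots = PLACEHOLDER.findall(template)
    examples = []
    # Cartesian product: one example per combination of slot values.
    for values in itertools.product(*(SLOT_VALUES[s] for s in slots)):
        filling = dict(zip(slots, values))
        tokens: List[str] = []
        tags: List[str] = []
        for piece in template.split():
            match = PLACEHOLDER.fullmatch(piece)
            if match:
                slot = match.group(1)
                words = filling[slot].split()
                tokens.extend(words)
                # First word of the value gets B-, the rest get I-.
                tags.extend([f"B-{slot}"] + [f"I-{slot}"] * (len(words) - 1))
            else:
                tokens.append(piece)
                tags.append("O")  # token outside any slot
        examples.append((tokens, tags, intent))
    return examples


dataset = [ex for intent, tpl in TEMPLATES for ex in expand(intent, tpl)]
for tokens, tags, intent in dataset[:2]:
    print(intent, list(zip(tokens, tags)))
```

Each template yields the Cartesian product of its slot values, so a few templates can produce many labelled utterances; the abstract does not specify the actual templates, intents, or slot inventory.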