Improving Biomedical Word Representation with Locally Linear Embedding

2021 
Abstract: Distributed word representations, usually computed from large corpora, are widely used for biomedical text because they capture word semantics effectively. High-quality, meaningful representations of biomedical words help doctors grasp the gist of information and knowledge quickly and make clinical decisions faster. Current distributed word representations, however, ignore how the geometric structure of the computed embeddings affects word semantics, so they cannot represent word information accurately, which degrades the representation of biomedical text. To address this problem, we propose a biomedical word embedding framework based on manifold learning. Our work offers a new perspective on biomedical word embeddings, a key component of biomedical natural language processing tasks. First, a distributed word representation model produces pretrained word embeddings; manifold learning is then used to re-embed them. To verify the validity of the proposed framework in the biomedical domain, we evaluate the algorithm on biomedical texts. Experimental results show that the proposed method improves results on electronic health record classification and semantic similarity.
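The two-step pipeline in the abstract (pretrained embeddings, then a manifold-learning re-embedding) can be illustrated with a minimal sketch of standard Locally Linear Embedding, the technique named in the title. The abstract does not give the paper's exact procedure, so everything here is an assumption made for illustration: the function name `lle_reembed`, the neighbor count, the output dimension, the regularization constant, and the random vectors that stand in for real pretrained biomedical embeddings.

```python
import numpy as np


def lle_reembed(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Textbook LLE on the rows of X (hypothetical helper, not the paper's code)."""
    n = X.shape[0]

    # Step 1: find each point's nearest neighbors by squared Euclidean distance.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: weights that best reconstruct each point from its neighbors,
    # constrained so that every row of W sums to 1.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]              # neighbors centered on x_i
        C = Z @ Z.T                        # local Gram matrix (k x k)
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # Step 3: new coordinates that preserve those weights are the bottom
    # eigenvectors of M = (I - W)^T (I - W), skipping the first one, which
    # corresponds to the near-zero eigenvalue of the constant vector.
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)         # eigenvalues in ascending order
    return eigvecs[:, 1:n_components + 1]


# Stand-in for pretrained embeddings: 40 "words", 10 dimensions (random data).
rng = np.random.default_rng(0)
pretrained = rng.normal(size=(40, 10))
reembedded = lle_reembed(pretrained, n_neighbors=8, n_components=3)
print(reembedded.shape)
```

In practice the rows of `pretrained` would come from a trained distributed representation model, and the re-embedded vectors would replace the originals in downstream tasks such as classification or similarity scoring. Whether the paper keeps the original dimensionality or reduces it is not stated in the abstract.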