A combined recall and rank framework with online negative sampling for Chinese procedure terminology normalization.

2021 
MOTIVATION Medical terminology normalization maps clinical mentions to terminologies from a knowledge base. It plays an important role in analyzing Electronic Health Records (EHRs) and in many downstream tasks. In this paper, we focus on Chinese procedure terminology normalization. Terminologies are expressed in varied ways, and one medical mention may be linked to multiple terminologies. Existing studies based on Learning To Rank (LTR) do not fully consider the quality of negative samples during model training or the importance of keywords in this domain-specific task.

RESULTS We propose a combined recall and rank framework to solve these problems. A pair-wise BERT model with deep metric learning is used to recall candidates. Previous methods train BERT either in a point-wise way or as a multi-class classification problem, which may lead to serious efficiency problems or may not be effective enough. During model training, we design a novel online negative sampling algorithm to activate the pair-wise method. To deal with multi-implication scenarios, we train implication number prediction together with the recall task in a multi-task learning setting, since these two tasks are highly complementary. In the rank step, we propose a keyword-attentive mechanism to focus on domain-specific information such as procedure sites and procedure types. Finally, a fusion block merges the results of the recall and rank models. Detailed experimental analysis shows that our proposed framework achieves a remarkable improvement in both performance and efficiency.

AVAILABILITY The source code will be available at https://github.com/sxthunder/CMTN upon publication.
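The abstract does not spell out the online negative sampling algorithm, so the following is only a minimal sketch of one common form of the idea: for each mention, pick the hardest non-gold terminology among the candidates under the current embeddings, then apply a triplet margin loss. The toy 2-D vectors stand in for BERT embeddings, and the names `cosine`, `hardest_negative`, `triplet_loss`, and the `margin` value are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def hardest_negative(mention_vec, term_vecs, gold_ids):
    """Return the id of the non-gold term most similar to the mention.

    gold_ids is a set, so a mention linked to several terminologies
    (the multi-implication case) never has a gold term sampled as a negative.
    """
    best_id, best_sim = None, -2.0  # cosine similarity is always >= -1
    for term_id, vec in term_vecs.items():
        if term_id in gold_ids:
            continue
        sim = cosine(mention_vec, vec)
        if sim > best_sim:
            best_id, best_sim = term_id, sim
    return best_id


def triplet_loss(mention_vec, pos_vec, neg_vec, margin=0.5):
    """Hinge loss that pushes the positive closer than the negative by `margin`."""
    return max(0.0, margin - cosine(mention_vec, pos_vec) + cosine(mention_vec, neg_vec))


# Toy data (hypothetical): one mention, one gold term, two other candidates.
mention = [1.0, 0.0]
terms = {"gold": [0.8, 0.6], "near": [0.6, 0.8], "far": [0.0, 1.0]}

neg_id = hardest_negative(mention, terms, gold_ids={"gold"})
loss = triplet_loss(mention, terms["gold"], terms[neg_id])
print(neg_id, loss)
```

Selecting the negative from the current embeddings at each step, rather than from a fixed list, is what makes the sampling "online" in this sketch: an easy negative such as `far` contributes zero loss, while the hardest one still yields a training signal.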