Abstractive Summarization with Word Embedding Prediction and Scheduled Sampling

2021 
Abstractive summarization models based on the encoder-decoder framework have made great advances in recent years. Because most summarization datasets provide only a single reference summary for each article, and encoder-decoder models are typically trained with the negative log-likelihood loss, a predicted synonym of the target word is penalized as heavily as semantically dissimilar words. To mitigate this problem, we train the summarization model to additionally predict the word embedding of the target word; a loss computed from the distance between the predicted embedding and the target embedding is then added to the training loss. In addition, ground-truth words are provided during training but are unavailable during inference, when the model must rely on its own predictions instead. This discrepancy can cause errors that accumulate quickly along the generated summary. To bridge the gap, we apply a scheduled sampling strategy that partially feeds the generated words back to the model during training. Experiments on the mainstream CNN/Daily Mail dataset demonstrate that word embedding prediction and scheduled sampling consistently improve over the pointer-generator baseline.
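A minimal PyTorch-style sketch of the two ideas described above, combining an auxiliary embedding-distance loss with scheduled sampling in a toy decoder. The architecture, cosine-distance metric, loss weight, and sampling probability are illustrative assumptions for exposition, not the paper's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingPredictionDecoder(nn.Module):
    """Toy GRU decoder illustrating (1) an auxiliary loss on a predicted
    word embedding and (2) scheduled sampling during training.
    All sizes and module names are illustrative, not taken from the paper."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.cell = nn.GRUCell(emb_dim, hid_dim)
        self.vocab_proj = nn.Linear(hid_dim, vocab_size)  # logits for the NLL term
        self.emb_proj = nn.Linear(hid_dim, emb_dim)       # predicted target embedding

    def forward(self, targets, h, sampling_prob=0.25, emb_loss_weight=0.5):
        """targets: (batch, seq_len) gold summary ids, first token is <bos>;
        h: (batch, hid_dim) initial decoder state from the encoder."""
        batch, seq_len = targets.size()
        inp = targets[:, 0]
        nll_total, emb_total = 0.0, 0.0
        for t in range(1, seq_len):
            h = self.cell(self.embed(inp), h)
            logits = self.vocab_proj(h)
            gold = targets[:, t]

            # Standard negative log-likelihood term on the target word.
            nll_total = nll_total + F.cross_entropy(logits, gold)

            # Auxiliary term: distance between the predicted embedding and
            # the embedding of the target word (cosine distance here).
            pred_emb = self.emb_proj(h)
            gold_emb = self.embed(gold).detach()
            emb_total = emb_total + (
                1.0 - F.cosine_similarity(pred_emb, gold_emb, dim=-1)
            ).mean()

            # Scheduled sampling: with some probability, feed the model's own
            # prediction as the next input instead of the ground-truth word.
            use_model = torch.rand(batch, device=targets.device) < sampling_prob
            inp = torch.where(use_model, logits.argmax(dim=-1), gold)

        steps = seq_len - 1
        return (nll_total + emb_loss_weight * emb_total) / steps
```

In practice the sampling probability would be increased gradually over training (the "schedule" in scheduled sampling), so the model is exposed to its own predictions more often as it improves.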