VNDS: A Vietnamese Dataset for Summarization

Van-Hau Nguyen, Thanh-Chinh Nguyen, Minh-Tien Nguyen, Nguyen Xuan Hoai

2019

Text summarization has seen many interesting developments in recent years. While numerous summarization approaches have been widely studied and applied across various domains in English, research in Vietnamese is still at an early stage, with few papers and systems and a lack of benchmark datasets. To help advance Vietnamese language research, in this paper we first create a standard dataset for document summarization. To the best of our knowledge, we are the first to formally publish a large benchmark dataset for Vietnamese summarization. Second, we compare traditional and state-of-the-art extractive and abstractive summarization methods on our dataset. We believe that the results of our work will facilitate future studies of text summarization in Vietnamese.

Keywords:

Natural language processing
Artificial intelligence
Vietnamese
Automatic summarization
Supervised learning
Support vector machine
Benchmark (computing)
Document summarization
Computer science
Publication
