C-CLUE: A Benchmark of Classical Chinese Based on a Crowdsourcing System for Knowledge Graph Construction

2021 
Knowledge Graph Construction (KGC) aims to organize and visualize knowledge and builds on two tasks: Named Entity Recognition (NER) and Relation Extraction (RE). However, classical Chinese differs from modern Chinese in grammar and semantics, which makes the text hard to comprehend and makes entity and relation annotation of classical Chinese corpora time-consuming and labour-intensive. In this paper, we design a novel crowdsourcing annotation system that gathers collective intelligence and utilizes domain knowledge to achieve efficient annotation and obtain fine-grained, high-quality datasets. More specifically, we assess each user's professionalism through online tests and take it into account when integrating annotation results and assigning rewards, which plays a vital role in improving annotation accuracy. Moreover, we evaluate several pre-trained language models, the state-of-the-art methods in Natural Language Processing (NLP), on the benchmark datasets obtained by the system for the NER and RE tasks. Benchmark datasets, implementation details, and evaluation processes are available at https://github.com/jizijing/C-CLUE. The crowdsourcing annotation system can be accessed at http://152.136.45.252:60002/pages/login.html.
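The abstract does not specify how user professionalism enters the integration of annotation results. As a purely illustrative sketch, one plausible reading is a professionalism-weighted vote over candidate labels. The function name `integrate_labels`, the weight range, the zero default for unknown users, and the tie-breaking rule are all assumptions made for this example and are not taken from the paper or its repository.

```python
from collections import defaultdict


def integrate_labels(annotations, professionalism):
    """Pick one label for an item by professionalism-weighted voting.

    annotations: list of (user_id, label) pairs for a single item.
    professionalism: dict mapping user_id to a weight (assumed to lie in [0, 1],
        e.g. a score derived from an online test).
    Returns the label with the highest total weight, or None if there are
    no annotations. Ties go to the alphabetically smallest label (an
    arbitrary choice made here for determinism).
    """
    scores = defaultdict(float)
    for user_id, label in annotations:
        # Assumption: users without a test score contribute no weight.
        scores[label] += professionalism.get(user_id, 0.0)
    if not scores:
        return None
    return max(sorted(scores), key=scores.get)


# Hypothetical example: one highly rated annotator outweighs two
# lower-rated annotators who agree with each other.
annotations = [("u1", "PER"), ("u2", "LOC"), ("u3", "LOC")]
professionalism = {"u1": 0.9, "u2": 0.3, "u3": 0.4}
result = integrate_labels(annotations, professionalism)
```

With these hypothetical weights, the single vote for "PER" (0.9) outweighs the two votes for "LOC" (0.7 in total), whereas an unweighted majority vote would have chosen "LOC".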