Annotated corpora and tools of the PARSEME Shared Task on Semi-Supervised Identification of Verbal Multiword Expressions (edition 1.2)

Carlos Ramisch, Bruno Guillaume, Agata Savary, Jakub Waszczuk, Marie Candito, Ashwini Vaidya, Verginica Barbu Mititelu, Archna Bhatia, Uxoa Iñurrieta, Voula Giouli, Tunga Güngör, Menghan Jiang, Timm Lichte, Chaya Liebeskind, Johanna Monti, Renata Ramisch, Sara Stymme, Abigail Walsh, Hongzhi Xu, Emilia Palka-Binkiewicz, Rafael Ehren, Sara Stymne, Matthieu Constant, Caroline Pasquer, Yannick Parmentier, Jean-Yves Antoine, Carola Carlino, Valeria Caruso, Maria Pia di Buono, Antonio Pascucci, Annalisa Raffone, Anna Riccio, Federico Sangati, Giulia Speranza, Silvio Ricardo Cordeiro, Helena de Medeiros Caseli, Isaac Miranda, Alexandre Rademaker, Oto Vale, Aline Villavicencio, Gabriela Wick Pedro, Rodrigo Wilkens, Leonardo Zilio, Monica Mihaela Rizea, Mihaela Ionescu, Mihaela Onofrei, Jia Chen, Xiaomin Ge, Fangyuan Hu, Sha Hu, Minli Li, Siyuan Liu, Zhenzhen Qin, Ruilong Sun, Chenweng Wang, Huangyang Xiao, Peiyi Yan, Tsy Yih, Ke Yu, Songping Yu, Si Zeng, Yongchen Zhang, Yun Zhao, Vassiliki Foufi, Aggeliki Fotopoulou, Stella Markantonatou, Stella Papadelli, Sevasti Louizou, Itziar Aduriz, Ainara Estarrona, Itziar González, Antton Gurrutxaga, Larraitz Uria, Ruben Urizar, Jennifer Foster, Teresa Lynn, Hevi Elyovitch, Yaakov Ha-Cohen Kerner, Ruth Malka, Kanishka Jain, Vandana Puri, Shraddha Ratori, Vishakha Shukla, Shubham Srivastava, Gozde Berk, Berna Erden, Zeynep Yirmibesoglu

2020

This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated, gathered on the occasion of edition 1.2 of the PARSEME Shared Task on Semi-Supervised Identification of Verbal MWEs (2020). VMWEs include idioms (let the cat out of the bag), light-verb constructions (make a decision), verb-particle constructions (give up), inherently reflexive verbs (help oneself), and multi-verb constructions (make do). For edition 1.2 of the shared task, the data covers 14 languages, for which VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information (not necessarily using UD tagsets), including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). This item contains training, development and test data, as well as the evaluation tools used in the PARSEME Shared Task 1.2 (2020). The annotation guidelines are available online: http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.2
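
To give a feel for how the cupt files might be consumed, here is a minimal Python sketch that collects the annotated VMWEs of each sentence. It is not part of the released tools. It rests on assumptions not stated in the description above: that each token line is tab-separated as in CoNLL-U, that the VMWE annotation sits in the last column, that `*` and `_` mean "no annotation", and that codes look like `1:LVC.full` on the first token of an expression and `1` on the following ones, with `;` separating several codes. The function names and the sample sentence are hypothetical.

```python
from collections import defaultdict


def extract(rows):
    """Group the tokens of one sentence by VMWE identifier.

    Assumption: the last column holds codes such as "1:LVC.full" (first
    token of expression 1, with its category) or "1" (a later token).
    """
    categories = {}
    tokens = defaultdict(list)
    for cols in rows:
        form, mwe = cols[1], cols[-1]
        if mwe in ("*", "_"):  # assumed markers for "not part of a VMWE"
            continue
        for code in mwe.split(";"):  # a token may belong to several VMWEs
            mwe_id, _, category = code.partition(":")
            if category:
                categories[mwe_id] = category
            tokens[mwe_id].append(form)
    return {i: (categories.get(i), forms) for i, forms in tokens.items()}


def read_vmwes(lines):
    """Yield one {id: (category, [forms])} dict per sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):  # comment and metadata lines
            continue
        if not line.strip():  # a blank line ends the sentence
            if sentence:
                yield extract(sentence)
                sentence = []
            continue
        sentence.append(line.split("\t"))
    if sentence:
        yield extract(sentence)


# Hypothetical sample sentence, written by hand for illustration only.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*",
    "4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1",
    "",
])

for vmwes in read_vmwes(SAMPLE.splitlines()):
    print(vmwes)
```

On real data the same generator can be fed an open file handle, for example `read_vmwes(open(path, encoding="utf-8"))`, where `path` is a placeholder for one of the released files. The official evaluation tools shipped with the resource remain the reference for how the format is interpreted.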
