IGC-Social-21.10 (The Icelandic Gigaword Corpus - Social media)

2021 
IGC-Social is part of the IGC project (Íslenska risamálheildin, the Icelandic Gigaword Corpus), which aims to collect as much Icelandic text as possible that can be published under an open or restricted licence. IGC-Social contains texts from three blog sites, two forums, and Twitter. The corpus comes in two formats. One contains the texts untokenized and untagged, with each paragraph enclosed in a tag; the other has been tokenized, POS-tagged, and lemmatized.