Experiments on a Guarani Corpus of News and Social Media

Santiago Góngora, Nicolás Giossa, Luis Chiruzzo

2021

Although Guarani is widely spoken in South America, obtaining large amounts of Guarani text from the web is difficult. We describe how we built a Guarani corpus composed of a parallel Guarani-Spanish set of news articles and a monolingual set of tweets. We run word embedding experiments to evaluate the quality of the Guarani portion of the corpus. The results are encouraging, but they suggest that more diverse text domains may be needed for further improvement.

Keywords:

Artificial intelligence
diversity
building process
Natural language processing
Set (abstract data type)
quality
Word (computer architecture)
Social media
Computer science

