COMPARATIVE ANALYSIS OF DIFFERENT CLASSIFIERS FOR CASE BASED MODEL IN PUNJABI WORD SENSE DISAMBIGUATION

Himdweep Walia, Ajay Rana, Vineet Kansal


2020

ABSTRACT
Research is ongoing to enable machines to better resolve ambiguous words. Most of the work done on Punjabi, a regional language of India and one of the ten most spoken languages in the world, is limited to knowledge-based techniques. Applying a Case Based Model to disambiguate Punjabi words is new, so the results reported here can serve as a useful exemplar for Punjabi Word Sense Disambiguation research. Each sentence is vectorized so that a minimal set of features can be used to find the right context of the given ambiguous word. Four different measuring functions are used to measure the nearness of a given sample to the stored samples, thereby applying the concept of case-based reasoning. The collected samples are then subjected to four different classifiers, namely Naive Bayes, k-Nearest Neighbor, Decision Tree and Artificial Neural Network, to find the closest context. The experiments show that the results vary with the size of the vector.

KEYWORDS: Natural Language Processing, Word Sense Disambiguation, Punjabi language, Case Based Reasoning, Classifiers, Similarity Function.

MSC: 68T50
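The retrieval step described in the abstract (vectorize the context, then measure nearness to stored cases) can be illustrated with a minimal sketch. The abstract does not name its four measuring functions, so cosine similarity is used here purely as an assumed stand-in. The helper names (`vectorize`, `cosine_similarity`, `nearest_case`), the toy English vocabulary and the sense labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vectorize(tokens, vocabulary):
    """Turn a list of context tokens into a count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_case(query_vec, case_base):
    """Return the stored (vector, sense) pair most similar to the query vector."""
    return max(case_base, key=lambda case: cosine_similarity(query_vec, case[0]))


# Toy case base for the ambiguous English word "bank" (illustrative only).
vocabulary = ["river", "water", "money", "loan"]
case_base = [
    (vectorize(["river", "water"], vocabulary), "river_bank"),
    (vectorize(["money", "loan"], vocabulary), "financial_bank"),
]

# Tokens outside the vocabulary ("interest") are simply ignored.
query = vectorize(["loan", "money", "interest"], vocabulary)
_, sense = nearest_case(query, case_base)
print(sense)
```

In this sketch the vocabulary length plays the role of the vector size, which the abstract reports as the factor that changes the results. The retrieved sense here comes from a single nearest case, whereas the paper passes the collected samples to four classifiers.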
