A term weighting approach for text categorization

2005 
It is common that representative words in a document are identified and discriminated by their statistical distribution of their frequency statistics. We assume that evaluating the confidence measure of terms through content-based document analysis leads to a better performance than the parametric assumptions of the standard frequency-based method. In this paper, we propose a new approach of term weighting method that replaces the frequency-based probabilistic methods. Experiments on Naive Bayesian classifiers showed that our approach achieved an improvement compared to the frequency-based method on each point of the evaluation.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    16
    References
    2
    Citations
    NaN
    KQI
    []