Generation of a pseudothesaurus for information retrieval based on cooccurrences and fuzzy set operations

1983 
A thesaurus in bibliographic information retrieval is a list of technical terms with relations among them, enabling generic retrieval of documents that have different but related keywords. Since constructing a thesaurus is resource-consuming, a method for automatically generating a thesaurus-like structure is needed. A set-theoretical model of an abstract thesaurus is developed which is related to an automatic generation method based on cooccurrences of terms in a set of texts. Replacing a basis set in the model and transforming cooccurrence frequencies into fuzzy sets enable the transition from the abstract mathematical model to an actual procedure of automatic generation. The generated structure is called a pseudothesaurus. An algorithm to generate the pseudothesaurus from a large amount of data is developed. Moreover, two examples are given, one based on a dictionary of scientific usage and one on an actual bibliographic database.
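
The abstract does not state the formulas used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes each term is represented as a fuzzy set over documents, with membership equal to the term's frequency in a document divided by that document's peak term frequency, and it uses the standard fuzzy operations (min for intersection, max for union, sum of memberships for cardinality). The function names `term_fuzzy_sets`, `relatedness`, and `narrowness`, and the two ratio measures, are illustrative choices and are not taken from the paper.

```python
from collections import Counter


def term_fuzzy_sets(docs):
    """Map each term to a fuzzy set over document ids.

    Assumption: membership = term count / highest term count in that document.
    """
    fuzzy = {}
    for doc_id, tokens in enumerate(docs):
        counts = Counter(tokens)
        if not counts:  # skip empty documents
            continue
        peak = max(counts.values())
        for term, n in counts.items():
            fuzzy.setdefault(term, {})[doc_id] = n / peak
    return fuzzy


def cardinality(fs):
    """Sigma-count of a fuzzy set: the sum of its membership grades."""
    return sum(fs.values())


def intersection(a, b):
    """Fuzzy intersection: pointwise minimum over shared documents."""
    return {k: min(a[k], b[k]) for k in a.keys() & b.keys()}


def union(a, b):
    """Fuzzy union: pointwise maximum over all documents in either set."""
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in a.keys() | b.keys()}


def relatedness(a, b):
    """Symmetric measure (hypothetical): |a AND b| / |a OR b|."""
    u = cardinality(union(a, b))
    return cardinality(intersection(a, b)) / u if u else 0.0


def narrowness(a, b):
    """Asymmetric measure (hypothetical): |a AND b| / |a|.

    A value near 1 suggests that a occurs mostly where b occurs.
    """
    c = cardinality(a)
    return cardinality(intersection(a, b)) / c if c else 0.0


# Toy corpus: each document is a list of keyword tokens.
docs = [["fuzzy", "set", "fuzzy"], ["fuzzy", "retrieval"], ["set"]]
sets = term_fuzzy_sets(docs)
print(relatedness(sets["fuzzy"], sets["set"]))
print(narrowness(sets["retrieval"], sets["fuzzy"]))
```

In this sketch the symmetric measure plays the role of a "related term" link and the asymmetric one a "narrower term" link; thresholding either would yield a thesaurus-like structure, though the thresholds and the exact measures in the paper may differ.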