MULCE: Multi-level Canonicalization with Embeddings of Open Knowledge Bases

2020 
An open knowledge base (OKB) is a repository of facts, which are typically represented in the form of \(\langle \)subject; relation; object\(\rangle \) triples. The problem of canonicalizing OKB triples is to map different names mentioned in the triples that refer to the same entity into a basic canonical form. We propose the algorithm Multi-Level Canonicalization with Embeddings (MULCE) to perform canonicalization. MULCE executes in two steps. The first step performs word-level canonicalization to coarsely group subject names based on their GloVe vectors into semantically similar clusters. The second step performs sentence-level canonicalization to refine the clusters by employing BERT embedding to model relation and object information. Our experimental results show that MULCE outperforms state-of-the-art methods.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    0
    References
    0
    Citations
    NaN
    KQI
    []