MULCE: Multi-level Canonicalization with Embeddings of Open Knowledge Bases
2020
An open knowledge base (OKB) is a repository of facts, which are typically represented in the form of \(\langle \)subject; relation; object\(\rangle \) triples. The problem of canonicalizing OKB triples is to map different names mentioned in the triples that refer to the same entity into a basic canonical form. We propose the algorithm Multi-Level Canonicalization with Embeddings (MULCE) to perform canonicalization. MULCE executes in two steps. The first step performs word-level canonicalization to coarsely group subject names based on their GloVe vectors into semantically similar clusters. The second step performs sentence-level canonicalization to refine the clusters by employing BERT embedding to model relation and object information. Our experimental results show that MULCE outperforms state-of-the-art methods.
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
0
References
0
Citations
NaN
KQI