When Entity Resolution Meets Deep Learning, Is Similarity Measure Necessary?

2021 
In Entity Resolution (ER), the growing volume of unstructured records poses a challenge to traditional similarity-based approaches, since existing similarity metrics are designed for structured records. Given that similarity is hard to measure for unstructured records, can pairwise matching be done without a similarity measure? To answer this question, this research uses deep learning to learn the underlying record-matching patterns directly, rather than first measuring the similarity between records and then making a linking decision based on that measure. On the representation side, word embeddings take token order into account, whereas Bag-of-Words representations (Count and TF-IDF) do not; on the model side, the multilayer perceptron (MLP), convolutional neural network (CNN), and long short-term memory (LSTM) are examined. Our experiments on both synthetic and real-world data demonstrate that, surprisingly, the simplest representation (Count) combined with the simplest model (MLP) achieves the best results in both effectiveness and efficiency. An F-measure as high as 1.00 on the pairwise matching task shows the potential for applying deep learning to other ER tasks such as blocking.
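
To make the representation contrast concrete, the following is a minimal sketch of Count and TF-IDF Bag-of-Words vectors for a record pair. It is an illustration under stated assumptions, not the paper's implementation: the whitespace tokenizer, the unsmoothed idf formula, the concatenation of the two record vectors into one model input, the example records, and all function names are hypothetical.

```python
from collections import Counter
import math

def tokenize(record):
    # Lowercase whitespace tokenization (assumed; not the paper's preprocessing).
    return record.lower().split()

def build_vocab(records):
    # Sorted vocabulary so that vector positions are deterministic.
    return sorted({tok for r in records for tok in tokenize(r)})

def count_vector(record, vocab):
    # Count representation: one term frequency per vocabulary token.
    counts = Counter(tokenize(record))
    return [counts[tok] for tok in vocab]

def tfidf_vector(record, vocab, records):
    # Plain TF-IDF with idf = log(N / df); libraries often use smoothed variants.
    n = len(records)
    token_sets = [set(tokenize(r)) for r in records]
    tf = Counter(tokenize(record))
    vec = []
    for tok in vocab:
        df = sum(1 for s in token_sets if tok in s)
        vec.append(tf[tok] * math.log(n / df))
    return vec

def pair_features(a, b, vocab):
    # Assumed pair encoding: concatenate both Count vectors.
    # No similarity score is computed; a classifier would consume this directly.
    return count_vector(a, vocab) + count_vector(b, vocab)

# Made-up unstructured records; the first two differ only in token order.
records = [
    "john smith 12 main st",
    "smith john main st 12",
    "mary jones 4 oak ave",
]
vocab = build_vocab(records)

# Bag-of-Words discards token order, so the reordered record maps to the same vector.
same_counts = count_vector(records[0], vocab) == count_vector(records[1], vocab)
features = pair_features(records[0], records[2], vocab)
```

A vector such as `features` could then be fed to an MLP that is trained on labeled matched and unmatched pairs, so that the linking decision is learned rather than derived from a hand-chosen similarity metric.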