GenERes: A Genealogical Entity Resolution System

2018 
Entity resolution is the problem of identifying and linking different manifestations of the same real-world object. This is an important step for many databases, both to keep the data clean and to leverage information from multiple views of the same entity. At Ancestry, we have many manifestations of the same person in our databases. For example, the same person may appear in multiple family trees, or may be referenced by multiple types of records, such as birth, marriage, or death records. The ability to resolve entities helps us unlock powerful genealogical discoveries for our users. To resolve these entities, we have developed a robust, scalable machine learning method that works across many different types of genealogical content as well as across time and place. We find substantial improvements over a previous rule-based system and demonstrate how a machine learning approach can also allow for interpretability. While we focus our example on genealogical data and historical records, we provide a model architecture and some lessons learned that should be applicable to many entity resolution domains.