Matching user accounts across social networks based on username and display name

2019 
Matching user accounts across social networks helps build more complete user profiles, which has practical significance for many applications, and the problem has attracted considerable scholarly attention. Existing work relies mainly on rich online profiles or activity data. However, because of privacy settings or other specific purposes, such rich data is often unavailable, incomplete, or unreliable, which prevents existing schemes from working properly. Users often make their display names and/or usernames public on different social networks. Names belonging to the same user often contain substantial redundant information, which provides an opportunity to address the matching problem. In this paper, we focus on matching user accounts across social networks based solely on username and display name. The problem is two-fold: 1) how to characterize the information redundancies contained in usernames or display names; 2) how to match user accounts based on these redundancies. To address this problem, we propose User Identification across Social Network based on Username and Display name (UISN-UD), which consists of three key components: 1) extracting features that exploit the information redundancies among names based on user naming habits; 2) training a two-stage classification framework to tackle the user identification problem using the extracted features; 3) employing the Gale-Shapley algorithm to eliminate the one-to-many or many-to-many relationships that exist in the identification results. We perform experiments on real social network datasets, and the results show that the proposed method performs well, with F1 values of 90% or higher. From a computational point of view, comparing display names and/or usernames is also more convenient than comparing the rich online profile attributes or activities of two accounts.
This work shows that user accounts can be matched using a small amount of highly accessible online data.
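
The third component, eliminating one-to-many and many-to-many relationships with the Gale-Shapley algorithm, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `stable_match`, the `scores` dictionary (assumed to hold a classifier confidence for each candidate pair of accounts), and the `threshold` parameter are all hypothetical and introduced here only for illustration.

```python
from typing import Dict, List, Tuple


def stable_match(
    scores: Dict[Tuple[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Dict[str, str]:
    """Reduce candidate pairs to a one-to-one matching via Gale-Shapley.

    scores maps (account_on_A, account_on_B) to an assumed confidence
    that both accounts belong to the same user.
    """
    # Each account on network A ranks its candidates on B by descending score.
    prefs: Dict[str, List[str]] = {}
    for (a, b), s in scores.items():
        if s >= threshold:
            prefs.setdefault(a, []).append(b)
    for a in prefs:
        prefs[a].sort(key=lambda b: scores[(a, b)], reverse=True)

    next_idx = {a: 0 for a in prefs}  # next candidate each A account proposes to
    engaged: Dict[str, str] = {}      # B account -> currently accepted A account
    free = list(prefs)

    while free:
        a = free.pop()
        if next_idx[a] >= len(prefs[a]):
            continue  # a has exhausted its candidates and stays unmatched
        b = prefs[a][next_idx[a]]
        next_idx[a] += 1
        current = engaged.get(b)
        if current is None:
            engaged[b] = a
        elif scores[(a, b)] > scores[(current, b)]:
            engaged[b] = a        # b prefers the new proposer
            free.append(current)  # the displaced account proposes again later
        else:
            free.append(a)        # rejected; a tries its next candidate

    return {a: b for b, a in engaged.items()}
```

Given candidate pairs in which two accounts on one network both score highly against the same account on the other, the procedure keeps the stronger pair and moves the displaced account to its next-best candidate, so every account appears in at most one match.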