Expanding Taxonomies with Implicit Edge Semantics

2020 
Curated taxonomies enhance the performance of machine-learning systems via high-quality structured knowledge. However, manually curating a large and rapidly evolving taxonomy is infeasible. In this work, we propose Arborist, an approach to automatically expand textual taxonomies by predicting the parents of new taxonomy nodes. Unlike previous work, Arborist handles the more challenging scenario of taxonomies with heterogeneous edge semantics that are unobserved. Arborist learns latent representations of the edge semantics along with embeddings of the taxonomy nodes to measure taxonomic relatedness between node pairs. Arborist is then trained by optimizing a large-margin ranking loss with a dynamic margin function. We propose a principled formulation of the margin function, which theoretically guarantees that Arborist minimizes an upper bound on the shortest-path distance between the predicted parents and actual parents in the taxonomy. Via extensive evaluation on a curated taxonomy at Pinterest and several public datasets, we demonstrate that Arborist outperforms the state of the art, achieving up to 59% in mean reciprocal rank and 83% in recall at 15. We also explore the ability of Arborist to infer nodes’ taxonomic roles, without explicit supervision on this task.
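The large-margin ranking loss with a dynamic margin can be illustrated with a minimal sketch. The abstract does not give the exact form of the margin, so the code below assumes the margin for a negative candidate equals its shortest-path distance to the true parent in the taxonomy, which is one natural reading of the stated guarantee. The names `shortest_path_lengths`, `dynamic_margin_loss`, `toy_adj`, and `toy_scores` are hypothetical, and the lookup-table score stands in for the learned relatedness measure, which is not reproduced here.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first-search distances from `source` in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def dynamic_margin_loss(score, child, true_parent, negatives, adj):
    """Hinge ranking loss whose margin grows with taxonomy distance.

    Assumption: margin(true_parent, negative) is the shortest-path
    distance between the two nodes, so distant wrong parents must be
    scored lower than nearby wrong parents.
    """
    dist = shortest_path_lengths(adj, true_parent)
    positive = score(child, true_parent)
    total = 0.0
    for negative in negatives:
        margin = dist[negative]
        total += max(0.0, score(child, negative) + margin - positive)
    return total


# Toy taxonomy (undirected view): root has children a and b; a has child a1.
toy_adj = {
    "root": ["a", "b"],
    "a": ["root", "a1"],
    "b": ["root"],
    "a1": ["a"],
}

# Placeholder relatedness scores for a new node "x" against each candidate parent.
toy_scores = {"a": 2.0, "root": 1.5, "b": 0.5, "a1": 0.0}


def toy_score(child, parent):
    return toy_scores[parent]


loss = dynamic_margin_loss(toy_score, "x", "a", ["root", "b", "a1"], toy_adj)
print(loss)
```

In this toy setting the candidate `b` is two hops from the true parent `a`, so it must be outscored by a margin of 2, while `root` and `a1` only need a margin of 1. A fixed margin would treat all three wrong parents identically.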