PathEmb: Random Walk based Document Embedding for Global Pathway Similarity Search

2019 
Pathway analysis is a cornerstone of systems biology. In particular, pathway similarity search plays a key role in establishing structural, functional, and evolutionary relationships between different biological entities. Given a query pathway and a database, pathway similarity search aims to identify novel pathways that are homologous to the query pathway. Unfortunately, pathway similarity search is computationally expensive because it involves graph matching, and the related subgraph isomorphism problem is NP-complete. In this study, we introduce a novel algorithmic framework for pathway similarity search, named PathEmb (Pathway Embedding), which is analogous to the Skip-gram model and represents each pathway as a "document". PathEmb uses a second-order random walk strategy to explore diverse pathway patterns. Each signaling path traversed by a random walk is treated as a "sentence", and the sentences of a pathway are assembled into a "document". The "document" for each individual pathway is then mapped into a low-dimensional feature space for downstream tasks. Furthermore, PathEmb is topology-free, so it can handle pathways of arbitrary structure. We have extensively evaluated PathEmb against other state-of-the-art methods on three pathway datasets. The experimental results demonstrate that PathEmb outperforms the existing methods in terms of both computational efficiency and search accuracy. The source code of PathEmb is freely available at https://github.com/zhangjiaobxy/PathEmb.
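To make the "sentence" and "document" analogy concrete, the following is a minimal sketch of how second-order random walks could turn one pathway into a document. It is not the PathEmb implementation. The node2vec-style return and in-out parameters `p` and `q`, the function names, and the toy pathway with its gene names are all assumptions made for illustration; the abstract does not specify how the walk is biased. The final step, mapping the document into a low-dimensional space, is not shown.

```python
import random


def second_order_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Sample one walk whose next step depends on the current AND previous node.

    Assumption: node2vec-style bias. Returning to the previous node has
    weight 1/p, moving to a node adjacent to the previous node has weight 1,
    and moving further away has weight 1/q.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def pathway_to_document(graph, walks_per_node=5, walk_length=4,
                        p=1.0, q=0.5, seed=0):
    """Collect walks from every node: each walk is a "sentence",
    and the list of walks is the pathway's "document"."""
    rng = random.Random(seed)
    document = []
    for node in sorted(graph):
        for _ in range(walks_per_node):
            document.append(
                second_order_walk(graph, node, walk_length, p, q, rng)
            )
    return document


# Hypothetical toy pathway as an undirected adjacency map (illustrative only).
toy_pathway = {
    "EGFR": {"GRB2"},
    "GRB2": {"EGFR", "SOS1"},
    "SOS1": {"GRB2", "KRAS"},
    "KRAS": {"SOS1", "RAF1"},
    "RAF1": {"KRAS"},
}

document = pathway_to_document(toy_pathway)
```

Each element of `document` is a list of node names that can be fed to a paragraph-vector style model as a token sequence. Because the walks only follow existing edges, the same procedure applies to a pathway of any shape, which is the sense in which such an approach is topology-free.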