Deep Learning of CTCF-Mediated Chromatin Loops in 3D Genome Organization

2019 
The three-dimensional organization of the human genome is of crucial importance for gene regulation. Results from high-throughput chromosome conformation capture techniques show that the CCCTC-binding factor (CTCF) plays an important role in chromatin interactions, and CTCF-mediated chromatin loops mostly occur between convergent CTCF-binding sites. However, it is still unclear whether and what sequence patterns in addition to the convergent CTCF motifs contribute to the formation of chromatin loops. To discover the complex sequence patterns for chromatin loop formation, we have developed a deep learning model, called DeepCTCFLoop, to predict whether a chromatin loop can be formed between a pair of convergent CTCF motifs using only the DNA sequences of the motifs and their flanking regions. Our results suggest that DeepCTCFLoop can accurately distinguish the convergent CTCF motif pairs forming chromatin loops from the ones not forming loops. It significantly outperforms CTCF-MP, a machine learning model based on word2vec and boosted trees, when using DNA sequences only. Moreover, we show that DNA motifs binding to ASCL1, SP2 and ZNF384 may facilitate the formation of chromatin loops in addition to convergent CTCF motifs. To our knowledge, this is the first published study of using deep learning techniques to discover the sequence motif patterns underlying CTCF-mediated chromatin loop formation. Our results provide useful information for understanding the mechanism of 3D genome organization. The source code and datasets used in this study for model construction are freely available at https://github.com/BioDataLearning/DeepCTCFLoop.
    • Correction
    • Source
    • Cite
    • Save
    • Machine Reading By IdeaReader
    36
    References
    0
    Citations
    NaN
    KQI
    []