Computational Prediction of G-Quadruplex Formation

2015 
Guanine-rich regions of genomic DNA can spontaneously fold into secondary structures called G-quadruplexes (GQs). Akin to tiny switches, GQs regulate genetic processes through their folding and unfolding. Their relevance to basic science, as well as their potential as therapeutic targets for human diseases, has motivated the creation of computational tools for their prediction. Currently, GQ folding predictors are based on results from structural and biochemical studies of GQs formed in single-stranded (ss) DNA. As a result, existing tools perform poorly when applied to the prediction of GQ formation in double-stranded (ds) DNA, the native context within which genomic GQs are found. Here, we present a probabilistic model of GQ formation, which is learned from large-scale human genomic pull-down experiments and applied to the analysis of gene ontological data. Advances in the characterization of GQs in dsDNA have enabled us to integrate results from small-molecule binding assays and single-molecule FRET microscopy into our model. To obtain training sets of sequences, we identified nearly 700,000 unique, potential GQs and categorized them according to pull-down experiment outcomes. The model parameters learned from these training sets agree with experimental evidence, and the resulting model outperforms existing models of GQ folding when predicting the folding of dsDNA GQ sequences. To further explore the model's utility, we screened potential GQ drug targets by selecting high-scoring sequences for gene ontological analysis. Our results indicate that high-scoring sequences are preferentially located near cancer-related genes. This tool can be applied to genomic sequences to locate the most strongly forming GQs, revealing valuable information for the design of GQ-targeting therapies, and represents the next step toward the practical, widespread use of GQs in medicine and technology.
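
As a rough illustration of what identifying potential GQs in a sequence can look like, the sketch below scans a DNA string for the widely used canonical motif of four guanine runs (three or more G each) separated by loops of 1 to 7 nucleotides. The abstract does not state which motif definition was used, so the pattern, the loop-length limits, and the function name `find_putative_gqs` are assumptions for illustration only; the sketch covers candidate detection on one strand and does not attempt the probabilistic scoring described above.

```python
import re

# Assumed motif (not taken from the abstract): four runs of >= 3 guanines
# separated by three loops of 1-7 nucleotides each.
GQ_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_putative_gqs(seq):
    """Return (start, end, motif) for each non-overlapping match on the given strand.

    `start` is 0-based and `end` is exclusive, following Python slicing.
    Only the strand passed in is scanned; the complementary strand would
    need a separate pass over the reverse complement.
    """
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in GQ_PATTERN.finditer(seq)]


if __name__ == "__main__":
    example = "TTGGGAGGGTGGGAGGGTT"
    for start, end, motif in find_putative_gqs(example):
        print(start, end, motif)
```

Candidates found this way would still need to be scored or labeled, for example against pull-down outcomes, before any claim about folding in dsDNA could be made.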