Sequence-level instructions direct transcription at polyT short tandem repeats.

2019 
Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. To determine whether these unconventional TSSs, sometimes referred to as 'transcriptional noise' or 'junk', are relevant nonetheless, we look for novel and conserved regulatory motifs located in their vicinity. We show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines. Biochemical and genetic evidence further demonstrate that several of these CAGE peaks correspond to TSSs of mostly sense and intronic non-coding RNAs, whose transcription rate can be predicted with ~81% accuracy by a sequence-based deep learning model. Excitingly, our model further predicts that genetic variants linked to human diseases affect this STR-associated transcription. Together, our results extend the repertoire of non-coding transcription and provide a valuable resource for future studies of complex traits.
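As an illustration of what a homopolymer of thymidines looks like at the sequence level, the following minimal sketch scans a DNA string for runs of consecutive T. It is not the authors' pipeline: the function name `find_polyt_runs` and the `min_length` threshold of 9 are hypothetical choices made for this example, and the abstract does not state which repeat length defines a polyT STR.

```python
import re


def find_polyt_runs(sequence, min_length=9):
    """Locate runs of consecutive T in a DNA sequence.

    Returns a list of (start, end, length) tuples using 0-based,
    half-open coordinates. The min_length default is an assumption
    for illustration, not a value taken from the paper.
    """
    # Match min_length or more consecutive T characters.
    pattern = re.compile(r"T{%d,}" % min_length)
    # Upper-case first so soft-masked (lower-case) sequence is handled.
    return [
        (m.start(), m.end(), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    example = "ACG" + "T" * 10 + "GCA"
    for start, end, length in find_polyt_runs(example):
        print(f"polyT run at [{start}, {end}) of length {length}")
```

Intersecting such intervals with CAGE peak coordinates would be one way to ask whether a peak initiates at a polyT tract, although the exact overlap criterion is left open here.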