
Long non-coding RNA

Long non-coding RNAs (long ncRNAs, lncRNAs) are a type of RNA, defined as transcripts longer than 200 nucleotides that are not translated into protein. This somewhat arbitrary limit distinguishes long ncRNAs from small non-coding RNAs such as microRNAs (miRNAs), small interfering RNAs (siRNAs), Piwi-interacting RNAs (piRNAs), small nucleolar RNAs (snoRNAs), and other short RNAs. Long intervening/intergenic noncoding RNAs (lincRNAs) are lncRNAs whose sequences do not overlap protein-coding genes.
In 2007, a study found that only one-fifth of transcription across the human genome is associated with protein-coding genes, indicating at least four times more long non-coding than coding RNA sequences. However, it is large-scale complementary DNA (cDNA) sequencing projects such as FANTOM (Functional Annotation of Mammalian cDNA) that reveal the complexity of this transcription. The FANTOM3 project identified ~35,000 non-coding transcripts from ~10,000 distinct loci that bear many signatures of mRNAs, including 5' capping, splicing, and polyadenylation, but have little or no open reading frame (ORF). While the abundance of long ncRNAs was unanticipated, this number represents a conservative lower estimate, since it omitted many singleton transcripts and non-polyadenylated transcripts (tiling array data show that more than 40% of transcripts are non-polyadenylated).
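The length-based definition and the lincRNA distinction above can be expressed as a simple filter. The sketch below is illustrative only: it assumes each transcript is supplied as a sense-strand sequence plus a genomic interval, it uses a crude longest-ORF scan as a stand-in for "little or no ORF", and the 100-codon cutoff (`MAX_ORF_CODONS`), the labels and all function names are hypothetical choices, not the criteria used by FANTOM or any annotation project.

```python
from dataclasses import dataclass

MIN_LNCRNA_LENGTH = 200  # nucleotides: the conventional lncRNA length cutoff
MAX_ORF_CODONS = 100     # hypothetical cutoff standing in for "little or no ORF"
STOP_CODONS = {"TAA", "TAG", "TGA"}


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def longest_orf_codons(seq: str) -> int:
    """Codon count (stop excluded) of the longest ATG-to-stop ORF in the three
    forward frames of a sense-strand transcript. ORFs lacking a stop are ignored."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def classify_transcript(seq: str, locus: Interval, coding_genes: list[Interval]) -> str:
    """Return a hypothetical label: 'short', 'possibly_coding', 'lncRNA' or 'lincRNA'."""
    if len(seq) <= MIN_LNCRNA_LENGTH:
        return "short"            # 200 nt or less: outside the lncRNA definition
    if longest_orf_codons(seq) >= MAX_ORF_CODONS:
        return "possibly_coding"  # a long ORF suggests the transcript may encode protein
    if any(overlaps(locus, gene) for gene in coding_genes):
        return "lncRNA"           # non-coding, but overlaps a protein-coding gene
    return "lincRNA"              # non-coding and not overlapping any coding gene
```

A lincRNA is still an lncRNA; the function simply reports the most specific label. Strand is ignored in the overlap test, and real annotation pipelines generally rely on more elaborate coding-potential assessments than a fixed ORF length.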
However, unambiguously identifying ncRNAs within these cDNA libraries is challenging, since it can be difficult to distinguish protein-coding transcripts from non-coding ones. Multiple studies have suggested that the testis and neural tissues express the greatest amount of long non-coding RNAs of any tissue type. Using FANTOM5, 27,919 long ncRNAs have been identified in various human sources. Quantitatively, lncRNAs show ~10-fold lower abundance than mRNAs in a population of cells, which has been explained by higher cell-to-cell variation in the expression levels of lncRNA genes in individual cells compared with protein-coding genes. In general, the majority (~78%) of lncRNAs are characterized as tissue-specific, compared with only ~19% of mRNAs. In addition to higher tissue specificity, lncRNAs show higher developmental-stage specificity and cell-subtype specificity in heterogeneous tissues such as the human neocortex. In 2018, a comprehensive integration of lncRNAs from existing databases, published literature and novel RNA assemblies based on RNA-seq data analysis revealed that there are 270,044 lncRNA transcripts in humans.
Considerable effort has gone into investigating lncRNAs in plant species, where they remain far less studied than in mammals. An extensive study of 37 higher plant species and six algae, published at the end of 2015, identified ~200,000 non-coding transcripts using an in silico approach. This study led to the creation of the Green Non-Coding Database (GreeNC), a repository of plant lncRNAs.
In 2005, the landscape of the mammalian genome was described as numerous 'foci' of transcription separated by long stretches of intergenic space.
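The passage on abundance does not say how tissue specificity was measured for the ~78% (lncRNA) and ~19% (mRNA) figures. One widely used score is the tau index, which is 0 for a gene expressed equally in all tissues and 1 for a gene expressed in a single tissue. The sketch below assumes one non-negative expression value per tissue for each gene, and the 0.8 cutoff is a hypothetical choice; it is not necessarily the method behind the figures quoted in the text.

```python
from collections.abc import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tissue-specificity index tau: 0 = uniform expression, 1 = one tissue only.
    expression holds one non-negative value per tissue (often log-transformed first)."""
    n = len(expression)
    if n < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Average shortfall of each tissue relative to the peak tissue
    return sum(1 - x / peak for x in expression) / (n - 1)


def fraction_tissue_specific(profiles: Sequence[Sequence[float]], cutoff: float = 0.8) -> float:
    """Share of genes whose tau exceeds a (hypothetical) tissue-specificity cutoff."""
    scores = [tau(profile) for profile in profiles]
    return sum(score > cutoff for score in scores) / len(scores)


# Hypothetical expression profiles across four tissues
profiles = [[0, 0, 10, 0], [5, 5, 5, 5]]
print(tau(profiles[0]), tau(profiles[1]))  # 1.0 0.0
print(fraction_tissue_specific(profiles))  # 0.5
```

Applying the same cutoff to lncRNA and mRNA profiles from one dataset is what makes the two fractions comparable; the choice of cutoff and any log transformation both shift the resulting percentages.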
Although some long ncRNAs are located and transcribed within intergenic stretches, the majority are transcribed as complex, interlaced networks of overlapping sense and antisense transcripts that often include protein-coding genes, giving rise to a complex hierarchy of overlapping isoforms. Genomic sequences within these transcriptional foci are often shared among a number of different coding and non-coding transcripts in the sense and antisense directions. For example, 3,012 of 8,961 cDNAs previously annotated as truncated coding sequences within FANTOM2 were later designated as genuine ncRNA variants of protein-coding cDNAs. While the abundance and conservation of these interleaved arrangements suggest biological relevance, the complexity of these foci frustrates easy evaluation. The GENCODE consortium has collated and analysed a comprehensive set of human lncRNA annotations together with their genomic organisation, modifications, cellular locations and tissue expression profiles. Their analysis indicates that human lncRNAs show a bias toward two-exon transcripts.
There has been considerable debate about whether lncRNAs have been misannotated and in fact encode proteins. Several lncRNAs have been found to encode peptides with biologically significant functions. Ribosome profiling studies have suggested that anywhere from 40% to 90% of annotated lncRNAs are translated, although there is disagreement about the correct method for analyzing ribosome profiling data. Additionally, many of the peptides produced from lncRNAs are thought to be highly unstable and without biological function.
Initial studies of lncRNA conservation noted that, as a class, lncRNAs were enriched for conserved sequence elements, depleted in substitution and insertion/deletion rates, and depleted in rare-frequency variants, indicative of purifying selection maintaining lncRNA function.
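The interlaced sense and antisense transcription described earlier in this passage can be made concrete by testing transcript coordinates for overlap. The sketch below assumes each transcript is given as a chromosome, a half-open interval and a strand; the `Transcript` type and `overlapping_pairs` function are hypothetical, and the all-pairs comparison only suits small examples (genome-scale work would use a sorted sweep or an interval tree).

```python
from dataclasses import dataclass
from itertools import combinations

Pair = tuple[str, str]


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlapping_pairs(transcripts: list[Transcript]) -> tuple[list[Pair], list[Pair]]:
    """Split overlapping transcript pairs into same-strand overlaps and
    opposite-strand (sense/antisense) overlaps."""
    same_strand: list[Pair] = []
    opposite_strand: list[Pair] = []
    for a, b in combinations(transcripts, 2):
        if a.chrom != b.chrom or a.end <= b.start or b.end <= a.start:
            continue  # different chromosome or no shared bases
        pair = (a.name, b.name)
        if a.strand == b.strand:
            same_strand.append(pair)
        else:
            opposite_strand.append(pair)
    return same_strand, opposite_strand


# Hypothetical transcripts around one transcriptional focus
transcripts = [
    Transcript("T1", "chr1", 100, 500, "+"),
    Transcript("T2", "chr1", 400, 900, "-"),
    Transcript("T3", "chr1", 450, 600, "+"),
    Transcript("T4", "chr2", 100, 500, "-"),
]
same_strand, opposite_strand = overlapping_pairs(transcripts)
print(same_strand)      # [('T1', 'T3')]
print(opposite_strand)  # [('T1', 'T2'), ('T2', 'T3')]
```

Even this toy locus shows how a single region can be shared by several transcripts on both strands, which is what makes such foci hard to evaluate.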
However, further investigations into vertebrate lncRNAs revealed that although lncRNAs are conserved in sequence, they are not conserved in transcription. In other words, even when the sequence of a human lncRNA is conserved in another vertebrate species, there is often no lncRNA transcription in the orthologous genomic region. Some argue that these observations suggest that the majority of lncRNAs are non-functional, while others argue that they may reflect rapid species-specific adaptive selection. Although the turnover of lncRNA transcription is much higher than initially expected, hundreds of lncRNAs are nonetheless conserved at the sequence level. There have been several attempts to delineate the categories of selection signatures seen among lncRNAs, including: lncRNAs with strong sequence conservation across the entire length of the gene; lncRNAs in which only a portion of the transcript (e.g. the 5' end or splice sites) is conserved; and lncRNAs that are transcribed from syntenic regions of the genome but have no recognizable sequence similarity. Additionally, there have been attempts to identify conserved secondary structures in lncRNAs, though these studies have so far yielded conflicting results.
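The three categories of selection signature listed above amount to a simple decision rule. The sketch below is a hypothetical illustration: it assumes a per-base flag marking whether each position of the lncRNA is conserved in another species (for example, derived from a whole-genome alignment) and a separate flag for transcription from a syntenic region, and the 80% threshold for "entire length" is an arbitrary choice. It does not check which part of the transcript (5' end, splice sites) is conserved.

```python
from collections.abc import Sequence

FULL_LENGTH_FRACTION = 0.8  # hypothetical: share of conserved bases counted as "entire length"


def selection_signature(conserved: Sequence[bool], syntenic: bool) -> str:
    """Assign one of the three categories described in the text, plus a fallback.
    conserved: one flag per transcript base, True if that base is conserved.
    syntenic: True if the orthologous, syntenic region is also transcribed."""
    if not conserved:
        raise ValueError("conservation flags are required for at least one base")
    fraction = sum(conserved) / len(conserved)
    if fraction >= FULL_LENGTH_FRACTION:
        return "full-length sequence conservation"
    if fraction > 0:
        return "partial sequence conservation"
    if syntenic:
        return "syntenic transcription without sequence similarity"
    return "no detectable conservation"


print(selection_signature([True] * 9 + [False], syntenic=True))
print(selection_signature([True, True] + [False] * 8, syntenic=True))
print(selection_signature([False] * 10, syntenic=True))
```

The order of the checks encodes a priority: sequence conservation is tested first, so synteny only decides the label when no conserved bases are found.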
