Accurate identification of A-to-I RNA editing in human by transcriptome sequencing

2012 
RNA editing is a post-transcriptional process that alters RNA sequences by base modifications, insertions, and deletions, thereby enhancing the diversity of gene products (for reviews, see Gott and Emeson 2000; Bass 2002; Maydanovych and Beal 2006; Farajollahi and Maas 2010; Nishikura 2010). The most prevalent type of known RNA editing in higher eukaryotes is A-to-I editing, where adenosine (A) residues are converted into inosine (I). The ADAR (adenosine deaminase acting on RNA) enzymes are the main players known to mediate A-to-I editing by binding to double-stranded RNAs (dsRNAs), which serve as the substrate for editing (Bass 2002; Nishikura 2010). However, target recognition by ADARs and the mechanisms of substrate interaction are not well understood. Since I is interpreted as guanosine during translation, A-to-I changes in protein-coding sequences may lead to codon changes and altered functional properties of the proteins (Maas 2010). In addition, A-to-I editing can play important roles in regulating gene expression (Maas 2010), such as by altering alternative splicing (Rueter et al. 1999; Laurencikiene et al. 2006; Schoft et al. 2007), miRNA sequences (Kawahara et al. 2007, 2008; Reid et al. 2008; Dupuis and Maas 2010), or miRNA target sites in the mRNA (Liang and Landweber 2007; Borchert et al. 2009). Other types of putative RNA editing events are also known, for example, C-to-U editing and U-to-C and G-to-A conversions (Nutt et al. 1994; Sharma et al. 1994; Villegas et al. 2002; Klimek-Tomczak et al. 2006), but they are much less prevalent.

To identify RNA editing sites on a genome-wide scale, new approaches built upon bioinformatic analyses and high-throughput sequencing methods were developed in recent years (Wulff et al. 2010). Bioinformatic methods were often used to identify disparities between DNA and RNA sequences (likely due to RNA editing) by analyzing cDNA, expressed sequence tag (EST), and genomic sequences (Athanasiadis et al. 2004; Kim et al. 2004; Levanon et al. 2004; Gommans et al. 2008; Zaranek et al. 2010). To reduce false positives due to sequencing errors or somatic mutations, it was often necessary to use a priori knowledge of editing patterns to constrain the search, such as the known clustering of putative editing sites or the presence of dsRNA structure. However, incorporating such constraints often limits the results to editing sites with the corresponding characteristics. Taking advantage of the recently available high-throughput sequencing technology, Li et al. (2009a) developed an approach to verify 36,000 editing-site candidates by designing padlock probes to amplify the corresponding cDNA and genomic DNA (gDNA) regions, followed by sequencing of the amplification products. Others designed similar approaches in which editing-site candidates were specifically amplified and sequenced (Wahlstedt et al. 2009; Abbas et al. 2010). The above approaches depend on a priori knowledge of editing-related features or candidate editing sites.

Another desirable feature that is not afforded by some of the methods is the estimation of RNA editing levels. RNA editing levels (or editing ratios) represent the proportion of edited RNA molecules among all RNA molecules of a particular gene. Knowledge of editing levels can have profound biological significance.

Recently, de novo identification of editing sites was made possible by whole-transcriptome sequencing (RNA-seq) (Picardi et al. 2010; Rosenberg et al. 2010; Ju et al. 2011; Li et al. 2011). Quantitative estimation of editing levels may be achieved by sequencing a large number of reads via high-throughput sequencing. In analyzing RNA-seq data, a significant challenge lies in the mapping of the sequencing reads. At an RNA editing site, some or all RNA-seq reads contain a nucleotide that differs from the one in the reference genome. Mapping of such reads via commonly used approaches can suffer from a bias favoring reads harboring the reference base, a problem similar to that previously reported for read mapping in the presence of expressed single nucleotide polymorphisms (SNPs) (Degner et al. 2009; Heap et al. 2009).

Here, we developed new mapping and analysis strategies to study RNA editing based on RNA-seq. We show that this approach is associated with a false-discovery rate of ∼5%, much lower than those reported by previous methods (Wulff et al. 2010). In addition, our method allows relatively accurate estimation of editing levels that correlate well with those derived by the traditional clonal sequencing method. Enabled by the large number of events identified in our study, we conducted a detailed characterization of sequence, evolutionary, and structural features related to A-to-I editing, and revealed novel insights into potential regulatory mechanisms and functional roles of editing.
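The editing level defined above (the proportion of edited RNA molecules among all RNA molecules at a site) can be illustrated with a minimal sketch. It is not the pipeline developed in this study. It assumes that the base calls of the reads covering one candidate A-to-I site are already available as a string, that inosine appears as G in the sequenced cDNA, and that reads showing any other base are sequencing errors to be ignored. The function name `editing_level` and the toy pileup are hypothetical.

```python
from collections import Counter


def editing_level(bases, ref_base="A", edited_base="G"):
    """Estimate the editing level at one site from read base calls.

    Hypothetical helper for illustration only. Assumes the site is on the
    forward strand, so unedited reads show A and edited reads show G.
    Base calls other than ref_base or edited_base are ignored.
    Returns None when no informative read covers the site.
    """
    counts = Counter(b.upper() for b in bases)
    unedited = counts[ref_base]
    edited = counts[edited_base]
    total = unedited + edited
    if total == 0:
        return None
    # Proportion of edited reads among all informative reads.
    return edited / total


# Toy pileup: 6 reads with A (unedited) and 4 reads with G (edited).
pileup = "AAGAGGAAAG"
print(editing_level(pileup))
```

If read mapping favors reads that carry the reference base, as described above, the edited count is undercounted and an estimate of this kind is biased toward lower editing levels.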