Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data

2014 
Motivation: RNA-sequencing technologies provide a powerful tool for expression analysis at the gene and isoform levels, but accurate estimation of isoform abundance is still a challenge. The standard assumption of uniform read intensity yields biased estimates when the read intensity is in fact non-uniform. The difficulty is that, without strong assumptions, the read intensity pattern is not identifiable from data observed in a single sample.

Results: We develop a joint statistical model that accounts for non-uniform isoform-specific read distributions while estimating gene isoform expression. The main challenge is dealing with the large number of isoform-specific read distributions, which can potentially be as many as the number of splice variants in the genome. A statistical regularization via a smoothing penalty is imposed to control the estimation. In addition, for identifiability, the method uses information across samples from the same region. We develop a fast and robust computational procedure based on the iterated weighted least-squares algorithm and apply it to simulated data and to two real RNA-Seq datasets with RT-PCR validation. Empirical tests show that our model outperforms existing methods by increasing the precision of isoform-level estimates.

Availability: We have implemented our method in an R package called Sequgio, as a pipeline for fast processing of RNA-Seq data.
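To illustrate why multiple samples plus a smoothing penalty make a non-uniform read intensity estimable, here is a minimal toy sketch, not the Sequgio implementation or API. It assumes a single isoform whose counts follow y[s, j] ~ Poisson(exp(a[s] + f[j])), where a[s] is a per-sample expression level and f is a log read-intensity profile shared across samples. The profile is regularized with a second-difference penalty and fitted by penalized iterated weighted least squares. The function name `fit_shared_profile`, the penalty form and the parameter `lam` are hypothetical choices for this example.

```python
import numpy as np

def second_diff_penalty(J):
    """Return P = D'D, where D is the (J-2) x J second-order difference matrix."""
    D = np.diff(np.eye(J), n=2, axis=0)
    return D.T @ D

def fit_shared_profile(y, lam=10.0, n_iter=50):
    """Toy fit of y[s, j] ~ Poisson(exp(a[s] + f[j])) with a smoothness penalty on f.

    y   : (S, J) array of read counts (S samples x J positions), each row sum > 0
    lam : weight of the hypothetical second-difference penalty on f
    Returns (a, f), with f centred to mean zero because constants are
    absorbed by the sample levels a[s] (the identifiability constraint).
    """
    y = np.asarray(y, dtype=float)
    S, J = y.shape
    P = second_diff_penalty(J)
    f = np.zeros(J)
    for _ in range(n_iter):
        # Closed-form Poisson MLE of each sample's level, given the profile f
        a = np.log(y.sum(axis=1) / np.exp(f).sum())
        mu = np.exp(a[:, None] + f[None, :])
        # Fisher weights for f[j], pooled over samples
        w = mu.sum(axis=0)
        # IWLS working response for the log link
        z = f + (y - mu).sum(axis=0) / w
        # Penalized weighted least-squares step: (W + lam * P) f = W z
        f = np.linalg.solve(np.diag(w) + lam * P, w * z)
        f -= f.mean()
    a = np.log(y.sum(axis=1) / np.exp(f).sum())
    return a, f

# Usage on simulated data: three samples share one non-uniform profile
rng = np.random.default_rng(0)
J = 50
true_f = 0.8 * np.sin(np.linspace(0, 2 * np.pi, J))
true_a = np.log([20.0, 40.0, 60.0])
y = rng.poisson(np.exp(true_a[:, None] + true_f[None, :]))
a_hat, f_hat = fit_shared_profile(y)
print(np.corrcoef(f_hat, true_f)[0, 1])
```

With a single sample, a[s] and the level of f could not be separated. Pooling samples, centring f and penalizing its roughness gives each IWLS step a unique solution. The real model additionally handles several overlapping isoforms per gene.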