Joint estimation of isoform expression and isoform-specific read distribution using multisample RNA-Seq data

Chen Suo, Stefano Calza, Agus Salim, Yudi Pawitan

2014

Motivation: RNA-sequencing technologies provide a powerful tool for expression analysis at the gene and isoform levels, but accurate estimation of isoform abundance is still a challenge. The standard assumption of uniform read intensity yields biased estimates when the read intensity is in fact non-uniform. The problem is that, without strong assumptions, the read intensity pattern is not identifiable from data observed in a single sample.

Results: We develop a joint statistical model that estimates gene isoform expression while accounting for non-uniform, isoform-specific read distributions. The main challenge is dealing with the large number of isoform-specific read distributions, which are potentially as many as the number of splice variants in the genome. A statistical regularization via a smoothing penalty is imposed to control the estimation. In addition, for identifiability reasons, the method uses information across samples from the same region. We develop a fast and robust computational procedure based on the iterated-weighted least-squares algorithm, and apply it to simulated data and two real RNA-Seq datasets with RT-PCR validation. Empirical tests show that our model yields more precise isoform-level estimates than existing methods.

Availability: We have implemented our method in an R package called Sequgio as a pipeline for fast processing of RNA-Seq data.
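To make the combination of a smoothing penalty and iterated weighted least squares more concrete, the following is a minimal toy sketch in Python. It is not the Sequgio implementation and not the joint multisample model described in the abstract: it only smooths a single vector of read counts along positions under a Poisson model with a log link and a second-difference roughness penalty. The function name `smooth_poisson_iwls`, the log link, the penalty form, and the default `lam` are all illustrative assumptions.

```python
import numpy as np


def smooth_poisson_iwls(counts, lam=10.0, n_iter=50, tol=1e-8):
    """Toy penalized Poisson smoother fitted by iterated weighted least squares.

    Assumed model (illustrative only): counts[j] ~ Poisson(exp(eta[j])),
    with penalty lam * ||D eta||^2, where D takes second differences.
    """
    y = np.asarray(counts, dtype=float)
    n = y.size

    # Second-difference matrix, shape (n - 2, n); rows look like [1, -2, 1, 0, ...].
    D = np.diff(np.eye(n), n=2, axis=0)
    P = lam * D.T @ D

    # Start from the log of the counts, shifted to avoid log(0).
    eta = np.log(y + 1.0)

    for _ in range(n_iter):
        mu = np.exp(eta)            # current fitted means; also the IWLS weights
        z = eta + (y - mu) / mu     # working response for the log link
        # Penalized weighted least-squares step: (W + P) eta = W z, with W = diag(mu).
        eta_new = np.linalg.solve(np.diag(mu) + P, mu * z)
        converged = np.max(np.abs(eta_new - eta)) < tol
        eta = eta_new
        if converged:
            break

    return np.exp(eta)


if __name__ == "__main__":
    reads = [3, 7, 2, 9, 4, 12, 6, 8, 5, 10]  # made-up counts along a region
    print(smooth_poisson_iwls(reads, lam=10.0))
```

A larger `lam` pulls the fitted intensity toward a log-linear trend, while `lam = 0` reproduces the raw counts wherever they are positive. Because the penalty ignores constant shifts, the fitted means sum to the observed total at convergence.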
