Finite mixtures of matrix-variate Poisson-log normal distributions for three-way count data
2018
Three-way data structures, characterized by three entities, the units, the variables and the occasions, are frequent in biological studies. In RNA sequencing, three-way data structures are obtained when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix-variate distributions offer a natural way to model three-way data and mixtures of matrix-variate distributions can be used to cluster three-way data. Clustering of gene expression data is carried out as means to discovering gene co-expression networks. In this work, a mixture of matrix-variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By considering the matrix-variate structure, full information on the conditions and occasions of the RNA sequencing dataset is simultaneously considered, and the number of covariance parameters to be estimated is reduced. A Markov chain Monte Carlo expectation-maximization algorithm is used for parameter estimation and information criteria are used for model selection. The models are applied to both real and simulated data, giving favourable clustering results.
Keywords:
- Correction
- Source
- Cite
- Save
- Machine Reading By IdeaReader
27
References
0
Citations
NaN
KQI