Finite mixtures of matrix-variate Poisson-log normal distributions for three-way count data

2018 
Three-way data structures, characterized by three entities (the units, the variables, and the occasions), are frequent in biological studies. In RNA sequencing, three-way data structures arise when high-throughput transcriptome sequencing data are collected for n genes across p conditions at r occasions. Matrix-variate distributions offer a natural way to model three-way data, and mixtures of matrix-variate distributions can be used to cluster such data. Clustering of gene expression data is carried out as a means of discovering gene co-expression networks. In this work, a mixture of matrix-variate Poisson-log normal distributions is proposed for clustering read counts from RNA sequencing. By exploiting the matrix-variate structure, the full information on the conditions and occasions of the RNA sequencing dataset is considered simultaneously, and the number of covariance parameters to be estimated is reduced. A Markov chain Monte Carlo expectation-maximization algorithm is used for parameter estimation, and information criteria are used for model selection. The models are applied to both real and simulated data, giving favourable clustering results.
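To make the matrix-variate structure and the claimed reduction in covariance parameters concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual hierarchical form of a Poisson-log normal model: each gene has an r × p latent matrix drawn from a matrix normal distribution with an r × r occasion covariance `Phi` and a p × p condition covariance `Omega`, and the counts are Poisson with mean equal to the exponential of the latent value. The function names, the absence of a library-size normalization term, and the "minus one" identifiability adjustment in the parameter count are all assumptions made for illustration.

```python
import numpy as np

def sample_mvpln(M, Phi, Omega, rng):
    """Draw one r x p count matrix from an assumed matrix-variate
    Poisson-log normal model (hypothetical helper, illustrative only)."""
    r, p = M.shape
    A = np.linalg.cholesky(Phi)    # A @ A.T == Phi   (occasion covariance, r x r)
    B = np.linalg.cholesky(Omega)  # B @ B.T == Omega (condition covariance, p x p)
    Z = rng.standard_normal((r, p))
    theta = M + A @ Z @ B.T        # latent matrix-normal draw on the log scale
    return rng.poisson(np.exp(theta))

def n_cov_params_unstructured(r, p):
    """Free covariance parameters if the r*p counts were treated as one vector."""
    d = r * p
    return d * (d + 1) // 2

def n_cov_params_matrix_variate(r, p):
    """Free covariance parameters with separate occasion and condition matrices.
    One is subtracted because Phi and Omega are only identified up to a scale
    factor (an assumed convention for this sketch)."""
    return r * (r + 1) // 2 + p * (p + 1) // 2 - 1

rng = np.random.default_rng(0)
r, p = 3, 2                        # 3 occasions, 2 conditions (toy sizes)
M = np.full((r, p), 2.0)           # latent mean on the log scale
Phi = 0.1 * np.eye(r) + 0.05       # toy positive-definite covariances
Omega = 0.1 * np.eye(p) + 0.05
Y = sample_mvpln(M, Phi, Omega, rng)

print(Y.shape)
print(n_cov_params_unstructured(r, p), n_cov_params_matrix_variate(r, p))
```

For these toy sizes the unstructured covariance has 21 free parameters while the matrix-variate form has 8, and the gap widens quickly as r and p grow.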