Weighted correlation network analysis

Weighted correlation network analysis, also known as weighted gene co-expression network analysis (WGCNA), is a widely used data mining method, especially for studying biological networks based on pairwise correlations between variables. While it can be applied to most high-dimensional data sets, it has been most widely used in genomic applications. It allows one to define modules (clusters), intramodular hubs, and network nodes with regard to module membership, to study the relationships between co-expression modules, and to compare the network topology of different networks (differential network analysis). WGCNA can be used as a data reduction technique (related to oblique factor analysis), as a clustering method (fuzzy clustering), as a feature selection method (e.g. as a gene screening method), as a framework for integrating complementary (genomic) data (based on weighted correlations between quantitative variables), and as an exploratory data analysis technique. Although WGCNA incorporates traditional exploratory data analysis techniques, its intuitive network language and analysis framework transcend any standard analysis technique. Since it uses network methodology and is well suited for integrating complementary genomic data sets, it can be interpreted as a systems biologic or systems genetic data analysis method. By selecting intramodular hubs in consensus modules, WGCNA also gives rise to network-based meta-analysis techniques.
The WGCNA method was developed by Steve Horvath, a professor of human genetics at the David Geffen School of Medicine at UCLA and of biostatistics at the UCLA Fielding School of Public Health, together with his colleagues at UCLA and (former) lab members (in particular Peter Langfelder, Bin Zhang, and Jun Dong). Much of the work arose from collaborations with applied researchers. In particular, weighted correlation networks were developed in joint discussions with cancer researchers Paul Mischel and Stanley F. Nelson and neuroscientists Daniel H. Geschwind and Michael C. Oldham (according to the acknowledgement section in). There is a vast literature on dependency networks, scale-free networks, and co-expression networks. A weighted correlation network can be interpreted as a special case of a weighted network, dependency network, or correlation network.
Weighted correlation network analysis can be attractive for the following reasons:
To construct the network, one first defines a gene co-expression similarity measure. We denote the co-expression similarity of a pair of genes $i$ and $j$ by $s_{ij}$. Many co-expression studies use the absolute value of the correlation as an unsigned co-expression similarity measure, $s_{ij}^{\text{unsigned}} = |\operatorname{cor}(x_i, x_j)|$, where the gene expression profiles $x_i$ and $x_j$ consist of the expression of genes $i$ and $j$ across multiple samples. However, using the absolute value of the correlation may obscure biologically relevant information, since no distinction is made between gene repression and activation. In contrast, in signed networks the similarity between genes reflects the sign of the correlation of their expression profiles. To define a signed co-expression measure between gene expression profiles $x_i$ and $x_j$, one can use a simple transformation of the correlation: $s_{ij}^{\text{signed}} = 0.5 + 0.5\,\operatorname{cor}(x_i, x_j)$. Like the unsigned measure $s_{ij}^{\text{unsigned}}$, the signed similarity $s_{ij}^{\text{signed}}$ takes a value between 0 and 1. Note that for two oppositely expressed genes ($\operatorname{cor}(x_i, x_j) = -1$) the unsigned similarity equals 1, while the signed similarity equals 0. Similarly, for two genes with zero correlation the unsigned similarity is 0, while the signed similarity equals 0.5. Next, an adjacency matrix (network), $A = [a_{ij}]$, is used to quantify how strongly genes are connected to one another.
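The unsigned and signed similarity measures defined above can be illustrated with a minimal Python sketch. It assumes NumPy and a samples-by-genes expression array; the function name `coexpression_similarity` and the toy data are hypothetical and are not part of any WGCNA software.

```python
import numpy as np

def coexpression_similarity(expr, signed=False):
    """Return a genes-by-genes similarity matrix with entries in [0, 1].

    expr is assumed to be a (samples x genes) array of expression values.
    """
    # Pearson correlation between gene columns
    cor = np.corrcoef(expr, rowvar=False)
    if signed:
        # s_ij = 0.5 + 0.5 * cor(x_i, x_j)
        return 0.5 + 0.5 * cor
    # s_ij = |cor(x_i, x_j)|
    return np.abs(cor)

# Toy data (illustrative only): gene 2 is perfectly anti-correlated
# with gene 1, and gene 3 is uncorrelated with gene 1.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = -x1
x3 = np.array([1.0, -1.0, 0.0, -1.0, 1.0])
expr = np.column_stack([x1, x2, x3])

s_unsigned = coexpression_similarity(expr, signed=False)
s_signed = coexpression_similarity(expr, signed=True)
```

In this toy example the anti-correlated pair receives the maximal unsigned similarity but the minimal signed similarity, while the uncorrelated pair sits at 0 (unsigned) and 0.5 (signed), matching the cases discussed in the text.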
$A$ is defined by thresholding the co-expression similarity matrix $S = [s_{ij}]$. "Hard" thresholding (dichotomizing) the similarity measure $S$ results in an unweighted gene co-expression network. Specifically, the unweighted network adjacency $a_{ij}$ is defined to be 1 if $s_{ij} > \tau$ and 0 otherwise, where $\tau$ is the threshold. Because hard thresholding encodes gene connections in a binary fashion, it can be sensitive to the choice of the threshold and can result in the loss of co-expression information. The continuous nature of the co-expression information can be preserved by employing soft thresholding, which results in a weighted network. Specifically, WGCNA uses a power function of the co-expression similarity to assess the connection strength between genes.
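The difference between hard and soft thresholding can be sketched as follows. The power function itself is not written out in the text here; the sketch assumes the form commonly given in the WGCNA literature, $a_{ij} = s_{ij}^{\beta}$ with a soft-thresholding power $\beta \geq 1$. The function names, the similarity matrix, and the values $\tau = 0.8$ and $\beta = 6$ are illustrative assumptions only.

```python
import numpy as np

def hard_threshold_adjacency(similarity, tau):
    """Unweighted network: a_ij = 1 if s_ij > tau, else 0."""
    return (similarity > tau).astype(float)

def soft_threshold_adjacency(similarity, beta):
    """Weighted network, assuming the power form a_ij = s_ij ** beta."""
    return similarity ** beta

# Hypothetical similarity matrix for three genes (values in [0, 1])
S = np.array([
    [1.00, 0.90, 0.50],
    [0.90, 1.00, 0.79],
    [0.50, 0.79, 1.00],
])

hard = hard_threshold_adjacency(S, tau=0.8)   # illustrative threshold
soft = soft_threshold_adjacency(S, beta=6)    # illustrative power
```

With these numbers, hard thresholding treats the pairs with similarity 0.79 and 0.50 identically (both become 0), even though 0.79 lies just below the threshold. Soft thresholding keeps them distinct, which is the loss of information the text refers to.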
