
Canonical correlation

In statistics, canonical-correlation analysis (CCA), also called canonical variates analysis, is a way of inferring information from cross-covariance matrices. If we have two vectors X = (X1, ..., Xn) and Y = (Y1, ..., Ym) of random variables, and there are correlations among the variables, then canonical-correlation analysis will find linear combinations of X and Y which have maximum correlation with each other. T. R. Knapp notes that 'virtually all of the commonly encountered parametric tests of significance can be treated as special cases of canonical-correlation analysis, which is the general procedure for investigating the relationships between two sets of variables.' The method was first introduced by Harold Hotelling in 1936, although in the context of angles between flats the mathematical concept was published by Jordan in 1875.
Given two column vectors $X = (x_1, \dots, x_n)'$ and $Y = (y_1, \dots, y_m)'$ of random variables with finite second moments, one may define the cross-covariance $\Sigma_{XY} = \operatorname{cov}(X, Y)$ to be the $n \times m$ matrix whose $(i, j)$ entry is the covariance $\operatorname{cov}(x_i, y_j)$. In practice, we would estimate the covariance matrix from sampled data for $X$ and $Y$ (i.e. from a pair of data matrices).

Canonical-correlation analysis seeks vectors $a \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ such that the random variables $a^T X$ and $b^T Y$ maximize the correlation $\rho = \operatorname{corr}(a^T X, b^T Y)$. The random variables $U = a^T X$ and $V = b^T Y$ are the first pair of canonical variables. One then seeks vectors maximizing the same correlation, subject to the constraint that they be uncorrelated with the first pair of canonical variables; this gives the second pair of canonical variables. The procedure may be continued up to $\min\{m, n\}$ times.

Let $\Sigma_{XX} = \operatorname{cov}(X, X)$ and $\Sigma_{YY} = \operatorname{cov}(Y, Y)$. The quantity to maximize is

$$\rho = \frac{a^T \Sigma_{XY} b}{\sqrt{a^T \Sigma_{XX} a}\,\sqrt{b^T \Sigma_{YY} b}}.$$

The first step is a change of basis: define $c = \Sigma_{XX}^{1/2} a$ and $d = \Sigma_{YY}^{1/2} b$.
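The procedure above can be sketched numerically: after the change of basis, the canonical correlations are the singular values of the whitened cross-covariance matrix $\Sigma_{XX}^{-1/2} \Sigma_{XY} \Sigma_{YY}^{-1/2}$. Below is a minimal NumPy sketch under that formulation, not a reference implementation; the simulated data, the `inv_sqrt` helper, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated example: a shared latent signal z drives one variable in each view.
n_samples = 500
z = rng.normal(size=(n_samples, 1))
X = np.hstack([z + 0.5 * rng.normal(size=(n_samples, 1)),
               rng.normal(size=(n_samples, 2))])   # n = 3 variables
Y = np.hstack([z + 0.5 * rng.normal(size=(n_samples, 1)),
               rng.normal(size=(n_samples, 1))])   # m = 2 variables

# Center the data and estimate the covariance blocks.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx = Xc.T @ Xc / (n_samples - 1)
Syy = Yc.T @ Yc / (n_samples - 1)
Sxy = Xc.T @ Yc / (n_samples - 1)

def inv_sqrt(S):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(w ** -0.5) @ V.T

# Singular values of the whitened cross-covariance are the canonical
# correlations; there are at most min(n, m) = 2 of them here.
M = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, s, Vt = np.linalg.svd(M)

# Map the leading singular vectors back to canonical weight vectors a, b.
a = inv_sqrt(Sxx) @ U[:, 0]
b = inv_sqrt(Syy) @ Vt[0, :]

# The correlation of the first canonical pair equals the top singular value.
u = Xc @ a
v = Yc @ b
rho = np.corrcoef(u, v)[0, 1]
```

Because the shared signal links only one variable in each view, the first canonical correlation should be large while the second stays near zero.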
