Spearman's rank correlation coefficient

In statistics, Spearman's rank correlation coefficient or Spearman's rho, named after Charles Spearman and often denoted by the Greek letter $\rho$ (rho) or as $r_s$, is a nonparametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be described using a monotonic function.

The Spearman correlation between two variables is equal to the Pearson correlation between the rank values of those two variables; while Pearson's correlation assesses linear relationships, Spearman's correlation assesses monotonic relationships (whether linear or not). If there are no repeated data values, a perfect Spearman correlation of +1 or −1 occurs when each of the variables is a perfect monotone function of the other.

Intuitively, the Spearman correlation between two variables will be high when observations have a similar rank between the two variables (identical ranks give a correlation of 1), and low when observations have a dissimilar rank (fully opposed ranks give a correlation of −1). Here the rank of an observation is its relative position within the variable: 1st, 2nd, 3rd, etc.

Spearman's coefficient is appropriate for both continuous and discrete ordinal variables. Both Spearman's $\rho$ and Kendall's $\tau$ can be formulated as special cases of a more general correlation coefficient.
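The equivalence between Spearman's coefficient and the Pearson correlation of the ranks can be illustrated with a minimal sketch. The helper names (`average_ranks`, `pearson`, `spearman`) and the sample data are assumptions made for this illustration, not part of any particular library.

```python
import math

def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative data: y is a monotone increasing but non-linear function of x.
x = [1, 2, 3, 4, 5]
y = [math.exp(v) for v in x]
y_reversed = [-v ** 3 for v in x]  # monotone decreasing function of x

print("Pearson  (x, y):", pearson(x, y))            # noticeably below 1
print("Spearman (x, y):", spearman(x, y))           # 1 up to rounding
print("Spearman (x, y_reversed):", spearman(x, y_reversed))  # -1 up to rounding
```

Because the exponential is monotone but not linear, the Pearson coefficient falls short of 1 while the Spearman coefficient reaches it; reversing the order of the ranks gives −1.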
The Spearman correlation coefficient is defined as the Pearson correlation coefficient between the rank variables. For a sample of size $n$, the $n$ raw scores $X_i, Y_i$ are converted to ranks $\operatorname{rg} X_i, \operatorname{rg} Y_i$, and $r_s$ is computed from:

$$r_s = \rho_{\operatorname{rg}_X, \operatorname{rg}_Y} = \frac{\operatorname{cov}(\operatorname{rg}_X, \operatorname{rg}_Y)}{\sigma_{\operatorname{rg}_X} \sigma_{\operatorname{rg}_Y}}$$

where $\rho$ denotes the usual Pearson correlation coefficient applied to the rank variables, $\operatorname{cov}(\operatorname{rg}_X, \operatorname{rg}_Y)$ is the covariance of the rank variables, and $\sigma_{\operatorname{rg}_X}$ and $\sigma_{\operatorname{rg}_Y}$ are their standard deviations.

Only if all $n$ ranks are distinct integers can it be computed using the popular formula

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$

where $d_i = \operatorname{rg} X_i - \operatorname{rg} Y_i$ is the difference between the two ranks of each observation.

Identical values are usually each assigned fractional ranks equal to the average of their positions in the ascending order of the values, which is equivalent to averaging over all possible permutations. If ties are present in the data set, the simplified formula above yields incorrect results, because only if all ranks are distinct in both variables does

$$\sigma_{\operatorname{rg}_X} \sigma_{\operatorname{rg}_Y} = \operatorname{Var} \operatorname{rg}_X = \operatorname{Var} \operatorname{rg}_Y = \frac{n^2 - 1}{12}$$

hold (calculated using the biased variance). The first equation, which normalizes by the standard deviations, may be used even when ranks are normalized to [0, 1] ("relative ranks"), because it is insensitive both to translation and linear scaling.
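The effect of ties on the simplified formula can be checked numerically. The following is a minimal sketch under stated assumptions: the function names (`average_ranks`, `biased_variance`, `rho_by_definition`, `rho_simplified`) and the two small data sets are hypothetical and chosen only for illustration.

```python
import math

def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def biased_variance(values):
    """Variance with divisor n (the biased estimator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def rho_by_definition(x, y):
    """Covariance of the ranks divided by the product of their standard deviations."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry)) / n
    return cov / math.sqrt(biased_variance(rx) * biased_variance(ry))

def rho_simplified(x, y):
    """1 - 6 * sum(d_i^2) / (n (n^2 - 1)); only valid when all ranks are distinct."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Illustrative data without ties: both formulas should agree.
x_distinct, y_distinct = [10, 20, 30, 40, 50], [1, 3, 2, 5, 4]
# Illustrative data with a three-way tie in x: the formulas should disagree.
x_tied, y_tied = [1, 1, 1, 2], [1, 2, 3, 4]

for label, (x, y) in {"distinct": (x_distinct, y_distinct), "tied": (x_tied, y_tied)}.items():
    n = len(x)
    print(label,
          "definition:", rho_by_definition(x, y),
          "simplified:", rho_simplified(x, y),
          "Var(rg_X):", biased_variance(average_ranks(x)),
          "(n^2 - 1)/12:", (n * n - 1) / 12)
```

With distinct ranks the biased variance of the ranks equals $(n^2 - 1)/12$ and the two formulas coincide. With the tied data the rank variance drops below that value, so the simplified formula no longer matches the definition.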

[ "Statistics", "Correlation", "Machine learning", "Correlation ratio", "Kendall's W", "Fisher transformation" ]