
Coefficient of determination

In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s). It is a statistic used in the context of statistical models whose main purpose is either the prediction of future outcomes or the testing of hypotheses on the basis of other related information. It provides a measure of how well observed outcomes are replicated by the model, based on the proportion of total variation of outcomes explained by the model.

There are several definitions of R² that are only sometimes equivalent. One class of such cases includes that of simple linear regression, where r² is used instead of R². When an intercept is included, r² is simply the square of the sample correlation coefficient (i.e., r) between the observed outcomes and the observed predictor values. If additional regressors are included, R² is the square of the coefficient of multiple correlation. In both such cases, the coefficient of determination normally ranges from 0 to 1.

There are cases where the computational definition of R² can yield negative values, depending on the definition used. This can arise when the predictions that are being compared to the corresponding outcomes have not been derived from a model-fitting procedure using those data. Even if a model-fitting procedure has been used, R² may still be negative, for example when linear regression is conducted without including an intercept, or when a non-linear function is used to fit the data. In cases where negative values arise, the mean of the data provides a better fit to the outcomes than do the fitted function values, according to this particular criterion. Since the most general definition of the coefficient of determination is also known as the Nash–Sutcliffe model efficiency coefficient, the latter name is preferred in many fields, because denoting a goodness-of-fit indicator that can vary from −∞ to 1 (i.e., one that can yield negative values) with a squared letter is confusing.

When evaluating the goodness of fit of simulated (Ypred) versus measured (Yobs) values, it is not appropriate to base this on the R² of the linear regression Yobs = m·Ypred + b. That R² quantifies the degree of any linear correlation between Yobs and Ypred, whereas for the goodness-of-fit evaluation only one specific linear relation should be taken into consideration: Yobs = 1·Ypred + 0 (i.e., the 1:1 line).

A data set has n values marked y1, ..., yn (collectively known as yi, or as a vector y = [y1, ..., yn]^T), each associated with a fitted (or modeled, or predicted) value f1, ..., fn (known as fi, or sometimes ŷi; collectively a vector f). Define the residuals as ei = yi − fi (forming a vector e). If ȳ = (1/n) Σ yi is the mean of the observed data, then the variability of the data set can be measured using three sums-of-squares formulas:

- the total sum of squares (proportional to the variance of the data): SStot = Σ (yi − ȳ)²
- the regression sum of squares, also called the explained sum of squares: SSreg = Σ (fi − ȳ)²
- the residual sum of squares: SSres = Σ (yi − fi)² = Σ ei²

where each sum runs over i = 1, ..., n. The most general definition of the coefficient of determination is then R² = 1 − SSres/SStot.
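These relationships can be checked numerically. The following is a minimal sketch in Python (assuming NumPy is available; the data values are made up purely for illustration): it fits a simple linear regression with an intercept, computes R² from SSres and SStot, and confirms that the result matches the squared sample correlation coefficient r², as stated above.

```python
import numpy as np

# Hypothetical data set (illustrative values, not from the article).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1])

# Ordinary least-squares fit y ≈ a·x + b, including an intercept.
a, b = np.polyfit(x, y, deg=1)
f = a * x + b                          # fitted values fi
e = y - f                              # residuals ei = yi − fi

ss_res = np.sum(e ** 2)                # residual sum of squares SSres
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares SStot
r2 = 1.0 - ss_res / ss_tot             # coefficient of determination

# With an intercept included, R² equals the squared sample
# correlation coefficient between the predictor and the outcomes.
r = np.corrcoef(x, y)[0, 1]
print(r2, r ** 2)                      # the two numbers agree
```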

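A negative value follows directly from the same definition whenever SSres exceeds SStot. The sketch below (again Python with NumPy, and again hypothetical numbers) compares outcomes against predictions that were not derived from fitting these data; the resulting R² falls well below zero, meaning that by this criterion the mean of the data is a better fit than the supplied predictions.

```python
import numpy as np

# Outcomes, and "predictions" that were NOT obtained by fitting
# a model to these data (e.g., values taken from elsewhere).
y = np.array([2.0, 3.0, 4.0, 5.0])
f = np.array([8.0, 1.0, 9.0, 0.0])

ss_res = np.sum((y - f) ** 2)           # 90.0
ss_tot = np.sum((y - y.mean()) ** 2)    # 5.0
r2 = 1.0 - ss_res / ss_tot
print(r2)  # -17.0: SSres > SStot, so the mean of y fits the
           # outcomes far better than the predictions f do
```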
[ "Statistics", "Machine learning", "Pinus maestrensis", "Fisher transformation" ]
Parent Topic
Child Topic
    No Parent Topic