Test validity

Test validity is the extent to which a test (such as a chemical, physical, or scholastic test) accurately measures what it is supposed to measure. In the fields of psychological testing and educational testing, 'validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests'. Although classical models divided the concept into various 'validities' (such as content validity, criterion validity, and construct validity), the currently dominant view is that validity is a single unitary construct. Validity is generally considered the most important issue in psychological and educational testing because it concerns the meaning placed on test results.
Though many textbooks present validity as a static construct, various models of validity have evolved since the first published recommendations for constructing psychological and educational tests. These models can be categorized into two primary groups: classical models, which include several types of validity, and modern models, which present validity as a single construct. The modern models reorganize classical 'validities' into either 'aspects' of validity or 'types' of validity-supporting evidence.
Test validity can itself be tested or validated using tests of inter-rater reliability, intra-rater reliability, repeatability (test-retest reliability), and other properties, usually via multiple runs of the test whose results are compared. Statistical analysis helps determine whether the differences between the various results are large enough to be a problem or are acceptably small.
Although psychologists and educators were aware of several facets of validity before World War II, their methods for establishing validity were commonly restricted to correlations of test scores with some known criterion. Under the direction of Lee Cronbach, the 1954 Technical Recommendations for Psychological Tests and Diagnostic Techniques attempted to clarify and broaden the scope of validity by dividing it into four parts: (a) concurrent validity, (b) predictive validity, (c) content validity, and (d) construct validity. Cronbach and Meehl's subsequent publication grouped predictive and concurrent validity into a 'criterion-orientation', which eventually became criterion validity. Over the next four decades, many theorists, including Cronbach himself, voiced their dissatisfaction with this three-in-one model of validity. Their arguments culminated in Samuel Messick's 1995 article that described validity as a single construct, composed of six 'aspects'. In his view, various inferences made from test scores may require different types of evidence, but not different validities. The 1999 Standards for Educational and Psychological Testing largely codified Messick's model. They describe five types of validity-supporting evidence that incorporate each of Messick's aspects, and make no mention of the classical models' content, criterion, and construct validities.
According to the 1999 Standards, validation is the process of gathering evidence to provide “a sound scientific basis” for interpreting the scores as proposed by the test developer and/or the test user. Validation therefore begins with a framework that defines the scope and aspects (in the case of multi-dimensional scales) of the proposed interpretation. The framework also includes a rational justification linking the interpretation to the test in question.
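As an illustration of comparing results across repeated runs of a test (test-retest reliability), the following minimal sketch computes a Pearson correlation between two administrations of the same test. The function name `pearson_r` and the score lists are hypothetical and chosen only for illustration. Pearson correlation is one common choice of statistic and is not prescribed by the Standards, and what counts as an acceptably high correlation depends on the intended use of the test.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each administration.
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary in both administrations")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for the same five examinees on two administrations.
first_run = [12, 15, 9, 20, 17]
second_run = [13, 14, 10, 19, 18]

retest_r = pearson_r(first_run, second_run)
print(f"test-retest correlation: {retest_r:.3f}")
```

A correlation near 1 indicates that examinees keep roughly the same rank order across the two runs. Consistency of this kind is only one input to a validity argument: it shows that scores are stable, not that the proposed interpretation of them is justified.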
