Differential item functioning

Differential item functioning (DIF) is a statistical characteristic of an item that shows the extent to which the item might be measuring different abilities for members of separate subgroups. Average item scores for subgroups having the same overall score on the test are compared to determine whether the item is measuring in essentially the same way for all subgroups. The presence of DIF requires review and judgment, and it does not necessarily indicate the presence of bias. DIF analysis provides an indication of unexpected behavior of items on a test. An item does not display DIF merely because people from different groups have different probabilities of giving a certain response; it displays DIF if and only if people from different groups with the same underlying true ability have different probabilities of giving a certain response. Common procedures for assessing DIF are Mantel-Haenszel, item response theory (IRT) based methods, and logistic regression.
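Of the procedures named above, Mantel-Haenszel is the easiest to sketch in a few lines. The version below is a minimal illustration, not a complete DIF workflow: it assumes dichotomous items scored 0/1, uses the total test score as the matching variable (a common choice, but an assumption here), and the function name `mantel_haenszel_odds_ratio` is hypothetical. Within each total-score stratum k it forms a 2×2 table of group by item response, with A_k and B_k the correct and incorrect counts in the reference group and C_k and D_k the same counts in the focal group, and pools the tables into the common odds ratio α_MH = Σ(A_k D_k / N_k) / Σ(B_k C_k / N_k). A value near 1 suggests the item behaves the same for matched examinees; values above 1 favor the reference group. It reports no significance test.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Pooled Mantel-Haenszel odds ratio for one dichotomous item.

    records: iterable of (group, total_score, item_response) tuples, where
    group is "r" (reference) or "f" (focal) and item_response is 0 or 1.
    Examinees are stratified (matched) on total_score.
    """
    # counts[score] = [A, B, C, D]
    # A: reference correct, B: reference incorrect,
    # C: focal correct,     D: focal incorrect
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, y in records:
        if group == "r":
            counts[score][0 if y == 1 else 1] += 1
        elif group == "f":
            counts[score][2 if y == 1 else 3] += 1
        else:
            raise ValueError(f"unknown group label: {group!r}")

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in counts.values():
        # A stratum containing only one group contributes zero to both sums.
        if a + b == 0 or c + d == 0:
            continue
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: B*C is zero in every stratum")
    return numerator / denominator


# Usage with made-up counts (hypothetical data, for illustration only).
def make_records(group, score, n_correct, n_incorrect):
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


no_dif = (make_records("r", 5, 8, 2) + make_records("f", 5, 4, 1)
          + make_records("r", 8, 9, 1) + make_records("f", 8, 9, 1))
print(mantel_haenszel_odds_ratio(no_dif))  # 1.0: equal odds within each stratum

dif = make_records("r", 5, 8, 2) + make_records("f", 5, 5, 5)
print(mantel_haenszel_odds_ratio(dif))  # 4.0: odds favor the reference group
```

Stratifying before pooling is what implements the matching requirement: comparing the two groups' overall proportions correct would confound DIF with any difference in the groups' ability distributions.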
DIF refers to differences in the functioning of items across groups, often demographic, that are matched on the latent trait or, more generally, on the attribute measured by the items or test. When examining items for DIF, the groups must be matched on the measured attribute; otherwise, DIF may be detected inaccurately. To build a general understanding of DIF, or measurement bias, consider the following example offered by Osterlind and Everson (2009). Here, Y refers to a response to a particular test item, which is determined by the latent construct being measured. The latent construct of interest is denoted theta (θ), and Y is an indicator of θ whose relationship to θ is expressed by the conditional probability distribution of Y given θ, $f(Y \mid \theta)$. Response Y is therefore conditional on the latent trait θ.
Because DIF examines differences in the conditional probabilities of Y between groups, let us label the groups the 'reference' and 'focal' groups. Although the designation does not matter, a typical practice in the literature is to designate as the reference group the group suspected of having an advantage, and as the focal group the group anticipated to be disadvantaged by the test. Given the functional relationship $f(Y \mid \theta)$, and under the assumption that the reference and focal groups have identical measurement error distributions, it can be concluded that under the null hypothesis
$$f(Y \mid \theta, G = r) = f(Y \mid \theta, G = f),$$
with G corresponding to the grouping variable, 'r' the reference group, and 'f' the focal group. This equation represents an instance where DIF is not present: the conditional probability distribution of Y does not depend on group membership. To illustrate, consider an item with response options 0 and 1, where Y = 0 indicates an incorrect response and Y = 1 indicates a correct response. The null hypothesis then becomes
$$P(Y = 1 \mid \theta, G = r) = P(Y = 1 \mid \theta, G = f).$$
In that case, the probability of responding correctly to the item is the same for members of either group at any given level of θ. This indicates that there is no DIF or item bias, because members of the reference and focal groups with the same underlying ability or attribute have the same probability of responding correctly; the item gives neither group an advantage over the other. Now consider the instance where the conditional probability of Y is not the same for the reference and focal groups, that is, $f(Y \mid \theta, G = r) \neq f(Y \mid \theta, G = f)$. In other words, members of different groups with the same trait or ability level have unequal probability distributions on Y. Once θ is controlled for, there is a clear dependency between group membership and performance on the item. For dichotomous items, this means that when the focal and reference groups are at the same location on θ, they have different probabilities of responding correctly to, or endorsing, the item. The group with the higher conditional probability of a correct response is the group advantaged by the item. The item therefore functions differently for the two groups, that is, it exhibits DIF; whether that difference reflects bias still requires review and judgment.
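The contrast between the no-DIF and DIF cases can be made concrete with an IRT-based sketch. It assumes a two-parameter logistic (2PL) response function, P(Y = 1 | θ) = 1 / (1 + exp(−a(θ − b))), with discrimination a and difficulty b; the parameter values, the discrete ability distributions, and the helper names (`conditional_p`, `marginal_p`, `advantaged_group`) are all hypothetical. In the `no_dif` scenario both groups share the same item parameters, so the conditional probabilities match at every θ even though the reference group, being more able on average, has a higher overall proportion correct. In the `uniform_dif` scenario the item is harder for the focal group (a pattern often called uniform DIF), so the reference group is advantaged at every matched θ.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (Y = 1) at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical (a, b) item parameters per group for two scenarios.
# "no_dif": identical parameters, so the response function ignores G.
# "uniform_dif": same discrimination, but the item is harder for the focal group.
SCENARIOS = {
    "no_dif": {"r": (1.2, 0.0), "f": (1.2, 0.0)},
    "uniform_dif": {"r": (1.2, 0.0), "f": (1.2, 0.5)},
}

# Hypothetical discrete ability distributions {theta: weight}; the
# reference group is, on average, more able than the focal group.
ABILITY = {
    "r": {-1.0: 0.2, 0.0: 0.3, 1.0: 0.5},
    "f": {-1.0: 0.5, 0.0: 0.3, 1.0: 0.2},
}


def conditional_p(scenario, group, theta):
    """P(Y = 1 | theta, G = group) under the scenario's item parameters."""
    a, b = SCENARIOS[scenario][group]
    return p_correct(theta, a, b)


def marginal_p(scenario, group):
    """P(Y = 1 | G = group): conditional probability averaged over abilities."""
    return sum(w * conditional_p(scenario, group, theta)
               for theta, w in ABILITY[group].items())


def advantaged_group(scenario, theta, tol=1e-12):
    """Group with the higher conditional probability at theta, or None if equal."""
    p_r = conditional_p(scenario, "r", theta)
    p_f = conditional_p(scenario, "f", theta)
    if abs(p_r - p_f) <= tol:
        return None
    return "r" if p_r > p_f else "f"


for scenario in SCENARIOS:
    print(scenario)
    for theta in (-1.0, 0.0, 1.0):
        print(f"  theta={theta:+.1f}"
              f"  P_r={conditional_p(scenario, 'r', theta):.3f}"
              f"  P_f={conditional_p(scenario, 'f', theta):.3f}"
              f"  advantaged={advantaged_group(scenario, theta)}")
    print(f"  marginal: P_r={marginal_p(scenario, 'r'):.3f}"
          f"  P_f={marginal_p(scenario, 'f'):.3f}")
```

In the `no_dif` scenario the marginal proportions correct come out at about 0.58 for the reference group and 0.42 for the focal group, yet `advantaged_group` returns `None` at every θ: a difference in overall performance alone is not DIF. Only in `uniform_dif`, where the conditional probabilities differ at the same θ, does the item exhibit DIF.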

Related topics: Item response theory, Cochran–Mantel–Haenszel statistics, item bias, item fit, Mantel–Haenszel statistic