Inter-rater reliability

In statistics, inter-rater reliability (also called inter-rater agreement, inter-rater concordance, or inter-observer reliability, among similar names) is the degree of agreement among raters. It is a score of how much homogeneity or consensus exists in the ratings given by various judges. In contrast, intra-rater reliability is a score of the consistency of ratings given by the same person across multiple instances.

Inter-rater and intra-rater reliability are aspects of test validity. Assessing them is useful for refining the tools given to human judges, for example by determining whether a particular scale is appropriate for measuring a particular variable. If the raters do not agree, either the scale is defective or the raters need to be re-trained.

A number of statistics can be used to determine inter-rater reliability, and different statistics are appropriate for different types of measurement. Options include the joint probability of agreement, Cohen's kappa, Scott's pi and the related Fleiss' kappa, inter-rater correlation, the concordance correlation coefficient, intra-class correlation, and Krippendorff's alpha.

There are several operational definitions of "inter-rater reliability", reflecting different viewpoints about what counts as reliable agreement between raters. Three operational definitions of agreement are distinguished.
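To make two of the statistics named above concrete, the following is a minimal sketch for the simplest case: two raters assigning nominal categories to the same items. The function names `joint_agreement` and `cohens_kappa` and the sample ratings are illustrative assumptions, not part of any particular library. The joint probability of agreement is the fraction of items on which the raters match, and Cohen's kappa rescales it as (p_o − p_e) / (1 − p_e), where p_e is the agreement expected by chance from each rater's category frequencies.

```python
from collections import Counter
from math import nan


def joint_agreement(ratings_a, ratings_b):
    """Fraction of items on which two raters gave the same category."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters and nominal categories."""
    n = len(ratings_a)
    p_o = joint_agreement(ratings_a, ratings_b)  # observed agreement

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over all categories either rater used.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical category throughout;
        # kappa is undefined (0/0), so this sketch returns NaN.
        return nan
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of ten items by two raters.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

print(joint_agreement(rater_a, rater_b))
print(cohens_kappa(rater_a, rater_b))
```

In this made-up example the raters match on 8 of 10 items, while their category frequencies alone would produce agreement on about half the items by chance, so kappa comes out noticeably lower than the raw agreement. This sketch covers only two raters and unordered categories; more raters, ordinal scales, or continuous measurements call for the other statistics listed above.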

Related topics: Physical therapy, Clinical psychology, Social psychology, Statistics, Machine learning, Rater training, Fleiss' kappa