The Conundrum of Kappa and Why Some Musculoskeletal Tests Appear Unreliable Despite High Agreement: A Comparison of Cohen Kappa and Gwet AC to Assess Observer Agreement When Using Nominal and Ordinal Data.

2021 
In clinical practice, physical therapists use many different tests and measures when assessing their patients. For therapists to have confidence in these tests and measures, the tests should demonstrate intratester and intertester reliability. Studies that assess reliability are studies of observer agreement, and many such studies have been published in the physical therapy literature. The most commonly used method for assessing observer agreement with nominal or ordinal data is the statistical approach suggested by Cohen and its corresponding reliability coefficient, Cohen kappa. Recently, Cohen kappa has come under scrutiny because of the so-called kappa paradox, which occurs when observer agreement is high but the resulting kappa value is low. A second paradox occurs when the raters' disagreements are distributed asymmetrically, which inflates the kappa value. The physical therapy literature contains numerous examples of these problems, which can lead to misinterpretation of the data. This Perspective examines how and why these problems occur and suggests an alternative method for assessing observer agreement.
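As a brief, hedged illustration of the kappa paradox described above, the following sketch computes percent agreement, Cohen kappa, and Gwet AC1 for a hypothetical 2 x 2 contingency table in which both raters place most subjects in the same, highly prevalent category. The counts are invented for illustration and are not drawn from the article; the formulas are the standard definitions of the two coefficients.

```python
# Hypothetical 2x2 contingency table for two raters classifying 100 patients
# on a binary test result ("positive"/"negative"). Counts are illustrative only.
#
#                    Rater B: positive   Rater B: negative
# Rater A: positive         90                  5
# Rater A: negative          5                  0

table = [
    [90, 5],
    [5, 0],
]

n = sum(sum(row) for row in table)   # total number of subjects
q = len(table)                       # number of categories

# Observed (percent) agreement: proportion of subjects on the diagonal.
p_o = sum(table[k][k] for k in range(q)) / n

# Marginal proportions for each rater.
row_marg = [sum(table[k]) / n for k in range(q)]                        # Rater A
col_marg = [sum(table[i][k] for i in range(q)) / n for k in range(q)]   # Rater B

# Cohen kappa: chance agreement from the product of the raters' marginals.
p_e_kappa = sum(row_marg[k] * col_marg[k] for k in range(q))
kappa = (p_o - p_e_kappa) / (1 - p_e_kappa)

# Gwet AC1: chance agreement based on the average marginal proportion pi_k.
pi = [(row_marg[k] + col_marg[k]) / 2 for k in range(q)]
p_e_ac1 = sum(pi_k * (1 - pi_k) for pi_k in pi) / (q - 1)
ac1 = (p_o - p_e_ac1) / (1 - p_e_ac1)

print(f"Observed agreement: {p_o:.3f}")    # 0.900
print(f"Cohen kappa:        {kappa:.3f}")  # about -0.053 (the 'paradox')
print(f"Gwet AC1:           {ac1:.3f}")    # about 0.890
```

With 90% observed agreement, kappa is near zero (here slightly negative) because the skewed marginals drive the chance-agreement term very high, whereas Gwet AC1 remains close to the observed agreement, consistent with the alternative approach the Perspective discusses.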