The Meaning of Kappa: Probabilistic Concepts of Reliability and Validity Revisited

A framework, the “agreement concept,” is developed to study the use of Cohen’s kappa as well as alternative measures of chance-corrected agreement in a unified manner. Focusing on intrarater consistency, it is demonstrated that for 2 × 2 tables an adequate choice between different measures of chance-corrected agreement can be made only if the characteristics of the observational setting are taken into account. In particular, a naive use of Cohen’s kappa may lead to strikingly overoptimistic estimates of chance-corrected agreement. Such bias can be overcome by more elaborate study designs that allow for an unrestricted estimation of the probabilities at issue. When Cohen’s kappa is appropriately applied as a measure of chance-corrected agreement, its values prove to be a linear, not a parabolic, function of true prevalence. It is further shown how the validity of ratings is influenced by lack of consistency. Depending on the design of a validity study, this may lead, on purely formal grounds, to prevalence-dependent estimates of sensitivity and specificity. Proposed formulas for “chance-corrected” validity indexes fail to adjust for this phenomenon.
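For readers unfamiliar with the statistic under discussion, the standard definition of Cohen’s kappa for a 2 × 2 agreement table is κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal totals. The following minimal sketch (the table values are illustrative, not taken from the article) shows the computation:

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table [[a, b], [c, d]],
    where rows are rater 1's categories and columns are rater 2's."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_o = (a + d) / n  # observed agreement (main diagonal)
    # chance agreement expected from the marginal totals,
    # assuming the two ratings are statistically independent
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2
    return (p_o - p_e) / (1 - p_e)


# Example: 80% raw agreement, symmetric marginals -> kappa = 0.6
print(cohens_kappa([[40, 10], [10, 40]]))
```

Note how kappa (0.6 here) is lower than the raw agreement (0.8): the correction removes the agreement attributable to chance, which is exactly the quantity whose estimation the abstract argues depends on the observational setting.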