Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems
As diagnostic classification models become more widely used in large-scale operational assessments, careful consideration must be given to the methods used to estimate and report reliability. Researchers must explore alternatives to traditional reliability methods that are consistent with the design, scoring, and reporting levels of diagnostic assessment systems. In this article, we describe and evaluate a method for simulating retests to summarize reliability evidence at multiple reporting levels. We compare the performance of reliability estimates from simulated retests with that of other measures of classification consistency and accuracy for diagnostic assessments that have been described in the literature but that limit the level at which reliability can be reported. Overall, the findings show that reliability estimates from simulated retests are an accurate measure of reliability and are consistent with other measures of reliability for diagnostic assessments. We then apply this method to real data from the Examination for the Certificate of Proficiency in English to demonstrate the method in practice and compare reliability estimates from observed data. Finally, we discuss implications for the field and possible future directions.
This is a pre-peer reviewed version of the following article: Thompson, W.J., Nash, B., Clark, A.K. and Hoover, J.C. (2023), Using Simulated Retests to Estimate the Reliability of Diagnostic Assessment Systems. Journal of Educational Measurement, which has been published in final form at http://doi.org/10.1111/jedm.12359. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.