Comparison of Item Response Theory Test Equating Methods for Mixed Format Tests

Author:

Year-Number: 2016-Volume 8, Issue 2

Abstract

This study investigates the performance of test equating methods extended to mixed-format tests within the framework of Item Response Theory (IRT). To this end, a simulation study was conducted to compare the equating errors of the mean/mean, mean/sigma, robust mean/sigma, Haebara, and Stocking-Lord methods under different conditions. Using 40-item tests, the effects of anchor length (10%, 20%, and 30%) and ability distribution (normal, negatively skewed, and positively skewed) were examined on a sample of 1000 participants. The common-item nonequivalent groups design was used. Dichotomous data were simulated under the three-parameter logistic model and polytomous data under the generalized partial credit model. The results revealed that the robust mean/sigma method generally had the highest equating errors. Across all conditions, the smallest equating error occurred with the Stocking-Lord method when the groups were positively skewed and the anchor test was long (30%). Moreover, groups with similar ability distributions (normal-normal, negatively skewed-negatively skewed, and positively skewed-positively skewed) produced smaller equating errors than groups with different ability distributions (negatively skewed-normal, positively skewed-normal, and positively skewed-negatively skewed).
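
As an illustration of the two simplest methods named above, the following minimal sketch computes mean/sigma and mean/mean linking coefficients from anchor-item parameter estimates. It assumes the usual linear scale transformation (difficulty on the base scale equals A times difficulty on the new scale plus B; discrimination on the base scale equals discrimination on the new scale divided by A). The function names and the anchor-item values are hypothetical and are not taken from the study, and the sketch covers only discrimination and difficulty parameters; how polytomous category parameters enter the mixed-format extensions is not specified in the abstract.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_base):
    """Mean/sigma coefficients from anchor-item difficulties.

    A is the ratio of the difficulty standard deviations (base / new);
    B aligns the difficulty means after rescaling by A.
    """
    A = pstdev(b_base) / pstdev(b_new)
    B = mean(b_base) - A * mean(b_new)
    return A, B


def mean_mean(a_new, a_base, b_new, b_base):
    """Mean/mean coefficients from anchor-item discriminations and difficulties.

    A is the ratio of the mean discriminations (new / base), because
    discriminations are divided by A when moved to the base scale.
    """
    A = mean(a_new) / mean(a_base)
    B = mean(b_base) - A * mean(b_new)
    return A, B


# Hypothetical anchor-item estimates on the new-form scale (illustration only).
b_new = [-1.0, -0.5, 0.0, 0.5, 1.0]
a_new = [1.0, 1.2, 0.8, 1.5, 1.1]

# Build error-free base-scale values from a known transformation,
# so that both methods should recover the same A and B.
true_A, true_B = 1.25, 0.5
b_base = [true_A * b + true_B for b in b_new]
a_base = [a / true_A for a in a_new]

A_ms, B_ms = mean_sigma(b_new, b_base)
A_mm, B_mm = mean_mean(a_new, a_base, b_new, b_base)
print(f"mean/sigma: A={A_ms:.3f}, B={B_ms:.3f}")
print(f"mean/mean:  A={A_mm:.3f}, B={B_mm:.3f}")
```

With error-free parameters the two methods agree; with estimated parameters they generally differ, which is one reason simulation comparisons of this kind are informative. The Haebara and Stocking-Lord methods instead choose A and B by minimizing differences between characteristic curves and are not shown here.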

Keywords
