A Comparison of Computerized Classification Testing Methods over Different Size Item Pools

Author :  

Year-Number: 2022, Volume 14, Issue 3
Publication Date: 2022-08-19
Language: English
Subject: Educational Sciences
Pages: 730-746

Abstract

The aim of this paper, which comprises two studies, is to examine the classification criteria, item selection methods, and ability estimation methods used in computerized classification testing in terms of average classification accuracy (ACA), average test length (ATL), and measurement precision, under conditions with and without the practical constraints of content balancing and item exposure control. In the first study, 48 simulation conditions were created for 1,000 examinees and item pools of 300, 600, and 900 items. In the second study, 16 conditions were created using 822 examinees' responses to a real 80-item paper-and-pencil test. The first study found similar and high ACA values across methods, while the expected a posteriori (EAP) estimator had a slight advantage over the sequential probability ratio test (SPRT) in terms of ATL; enlarging the item pool also increased test efficiency. When content balancing and item exposure control were imposed, test efficiency was more adversely affected under SPRT, under the cut-score-based maximum Fisher information (MFI-CB) item selection method, under EAP, or with the 300-item pool. In the second study, on the other hand, test efficiency was higher, particularly for the 98% confidence interval (CI) criterion, when the ability estimator was weighted likelihood estimation (WLE). A finding common to both studies was that SPRT was more useful than the confidence interval classification criterion for maximizing classification accuracy.
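Of the two termination rules compared above, the SPRT is the more procedural one, so a minimal sketch of Wald's SPRT decision rule for a two-category (mastery) decision is given below. This is an illustration only, not the simulation code used in the studies: the 3PL response model, the indifference-region half-width delta, and the nominal error rates alpha and beta are assumed values, and the function names (p_3pl, sprt_decision) are hypothetical.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def sprt_decision(responses, a, b, c, cut=0.0, delta=0.3, alpha=0.05, beta=0.05):
    """Wald's SPRT termination rule for a two-category classification test.

    responses   : 0/1 scores on the items administered so far
    a, b, c     : 3PL parameters of those items
    cut         : cut score on the theta scale
    delta       : half-width of the indifference region around the cut score
    alpha, beta : nominal error rates defining the decision bounds
    Returns "above", "below", or "continue".
    """
    u = np.asarray(responses, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    p1 = p_3pl(cut + delta, a, b, c)   # response probabilities if theta = cut + delta
    p0 = p_3pl(cut - delta, a, b, c)   # response probabilities if theta = cut - delta
    log_lr = np.sum(u * np.log(p1 / p0) + (1.0 - u) * np.log((1.0 - p1) / (1.0 - p0)))
    upper = np.log((1.0 - beta) / alpha)   # crossing it classifies the examinee above the cut
    lower = np.log(beta / (1.0 - alpha))   # crossing it classifies the examinee below the cut
    if log_lr >= upper:
        return "above"
    if log_lr <= lower:
        return "below"
    return "continue"  # administer another item, up to the maximum test length

# Example: ten administered items with illustrative item parameters
rng = np.random.default_rng(0)
decision = sprt_decision(
    responses=[1, 1, 0, 1, 1, 1, 0, 1, 1, 1],
    a=rng.uniform(0.8, 2.0, 10),
    b=rng.uniform(-1.0, 1.0, 10),
    c=np.full(10, 0.2),
)
print(decision)
```

Under this rule, items keep being administered (for example, selected by maximum Fisher information at the cut score, as in MFI-CB) until the log likelihood ratio crosses one of the two bounds or a maximum test length is reached; the confidence interval criterion instead stops once the interval around the current ability estimate (EAP or WLE) no longer contains the cut score.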

Keywords


