Invariance of Scores from Student Evaluation of Teaching Forms

Author:

Year: 2015, Volume 7, Issue 4

Abstract

In the present study, the comparability of scores from student evaluation of teaching forms was investigated. Checking comparability is important because the scores given by students are used for decision making in higher education institutions. To this end, three course-related variables (grade level, course type, and credit) were used to define student subgroups. A confirmatory factor analysis approach was then used to assess the invariance of the factorial structure, factor loadings, and factor means across groups. It was found that although a common factor structure held across groups, invariant factor loadings were observed only across instructors teaching different course types (elective versus compulsory). For the other groupings, only partial invariance was obtained. The analyses also revealed that none of the subgroups had invariant factor means. The results indicate that comparisons of instructors based on student ratings may not be as reliable as is commonly assumed.
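
The abstract follows the standard nested sequence of multi-group confirmatory factor analysis (MG-CFA) models. As a point of reference only (textbook notation from the measurement-invariance literature, not the article's own symbols or parameterization), the tested hypotheses can be written as

\[ x^{(g)} = \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)}, \qquad g = 1, \dots, G, \]

where \(x^{(g)}\) collects the observed ratings in subgroup \(g\), \(\Lambda^{(g)}\) holds the factor loadings, and \(\xi^{(g)}\) are the latent evaluation factors with means \(\kappa^{(g)}\). The nested constraints are

\begin{align*}
\text{Configural:} &\quad \text{the same pattern of fixed and free loadings in every } \Lambda^{(g)} \\
\text{Metric:} &\quad \Lambda^{(1)} = \Lambda^{(2)} = \cdots = \Lambda^{(G)} \\
\text{Factor means:} &\quad \kappa^{(1)} = \kappa^{(2)} = \cdots = \kappa^{(G)}
\end{align*}

Each step adds equality constraints to the previous model, and a significant loss of fit under a chi-square difference test rejects the added constraints; the "partial invariance" reported above corresponds to releasing the equality constraint on a subset of loadings.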


