Comparison of Various Methods Used in Solving Missing Data Problems in Terms of Psychometric Features of Scales and Measurement Results Under Different Missing Data Conditions


Year-Number: 2015-Volume 7, Issue 4

Abstract

In this research, five methods used to handle missing data (listwise deletion, series mean, mean of nearby points, multiple imputation and regression imputation) were compared under a missing completely at random (MCAR) mechanism, normal distribution, unidimensionality, two sample sizes (n=150; n=650) and three missing data rates (5%; 10%; 20%). The comparisons were made in terms of the psychometric features (eigenvalue, explained variance, Cronbach's alpha) of the scale used as the data collection tool and the measurement results (score distribution, mean and standard deviation) obtained from the scale. In line with the objective of the study, data were deleted from the complete data sets (n=150; n=650) at the three rates (5%; 10%; 20%), and the resulting incomplete data sets were turned into new complete data sets with each of the five methods. The psychometric features and measurement results obtained from the new complete data sets were compared with those obtained from the original complete data sets, and inferences were drawn about which methods are more applicable under which conditions. For the comparisons, descriptive statistics were used for eigenvalues, explained variance and score distribution; Fisher’s z test was used for Cronbach's alpha, the t test for means, and Levene’s test for equality of variances for standard deviations. The findings show that, under all conditions within the scope of the research, multiple imputation and regression imputation yielded values equal or nearest to those obtained from the complete data sets. Listwise deletion gave the values farthest from those of the complete data sets, but the differences were negligible.
In the comparisons made with Fisher’s z test, the t test and Levene’s test for equality of variances, no statistically significant differences were found between the values estimated with the missing data methods and those of the complete data sets. Therefore, it was concluded that there are no considerable differences among the methods in terms of applicability to solving the missing data problem.
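
The procedure described in the abstract (start from a complete data set, delete values under MCAR at a given rate, rebuild a complete data set, then compare a psychometric index with the original) can be sketched roughly as follows. This is only an illustrative sketch, not the study's actual analysis: the simulated one-factor data, the loading of 0.7, the 10 items, the random seed and all function names are assumptions introduced here, and only two of the five methods (listwise deletion and series mean) are shown.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) array without NaNs."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def delete_mcar(data, rate, rng):
    """Set a fraction `rate` of all cells to NaN, chosen completely at random."""
    out = np.array(data, dtype=float)
    n_missing = int(round(rate * out.size))
    idx = rng.choice(out.size, size=n_missing, replace=False)
    out.flat[idx] = np.nan
    return out

def listwise_deletion(data):
    """Keep only respondents with no missing item."""
    return data[~np.isnan(data).any(axis=1)]

def series_mean(data):
    """Replace each missing value with the mean of its item (column)."""
    out = data.copy()
    col_means = np.nanmean(out, axis=0)
    rows, cols = np.where(np.isnan(out))
    out[rows, cols] = col_means[cols]
    return out

# Hypothetical unidimensional, normally distributed data (assumed, not the study's scale).
rng = np.random.default_rng(0)
n, k, loading = 150, 10, 0.7
theta = rng.normal(size=(n, 1))
complete = loading * theta + np.sqrt(1 - loading**2) * rng.normal(size=(n, k))

alpha_complete = cronbach_alpha(complete)
print(f"complete data: alpha = {alpha_complete:.3f}")

for rate in (0.05, 0.10, 0.20):
    incomplete = delete_mcar(complete, rate, rng)
    lw = listwise_deletion(incomplete)
    sm = series_mean(incomplete)
    print(f"rate {rate:.0%}: listwise n = {lw.shape[0]}, "
          f"alpha = {cronbach_alpha(lw):.3f}; "
          f"series mean alpha = {cronbach_alpha(sm):.3f}")
```

In a sketch like this, listwise deletion can discard a large share of respondents at the higher missing rates, because a single missing item removes the whole row. The significance tests named in the abstract (Fisher’s z, t test, Levene’s test) are left out here, since the abstract does not specify how they were set up.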

