The purpose of this study was to examine differential item functioning (DIF) in verbal ability test items by gender (male vs. female) and country (Oman vs. the other Gulf countries) using the Mantel-Haenszel (MH) and likelihood ratio test (LRT) methods, since the presence of DIF bears directly on the accuracy of test results. The sample comprised 2,688 students in grades five and six. To achieve the study's objectives, MH was applied using SPSS and LRT using BILOG-MG. The classification stability coefficient kappa (κ) was calculated to quantify the agreement between the two methods in detecting differential functioning. The MH results showed that 16.7% of items exhibited DIF with respect to gender and 33.3% with respect to country. The LRT results showed DIF in 10% of items with respect to gender and 30% with respect to country. Agreement between the MH and LRT methods was quite high for both gender (κ = 0.725) and country (κ = 0.655). The study recommends further research to investigate the causes of the differential functioning of some verbal ability test items.
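The agreement analysis described above can be illustrated with a short sketch. Given each method's per-item DIF flags (1 = flagged, 0 = not flagged), Cohen's kappa corrects the observed agreement rate for the agreement expected by chance. The item flags below are hypothetical, invented for illustration only; they are not the study's data.

```python
# Cohen's kappa between two methods' binary DIF classifications.
# Hypothetical example, not the study's actual item flags.
def cohens_kappa(a, b):
    """Cohen's kappa for two parallel categorical classifications."""
    assert len(a) == len(b)
    n = len(a)
    categories = set(a) | set(b)
    # Observed proportion of items on which the methods agree.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Expected agreement under independence of the two classifications.
    pe = sum((a.count(c) / n) * (b.count(c) / n) for c in categories)
    return (po - pe) / (1 - pe)

# Hypothetical DIF flags for a 10-item test.
mh_flags  = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # Mantel-Haenszel
lrt_flags = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]  # likelihood ratio test
print(round(cohens_kappa(mh_flags, lrt_flags), 3))
```

A kappa near 0.7, as in this toy example and in the reported gender comparison (κ = 0.725), falls in the "substantial agreement" band of the Landis and Koch benchmarks.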