The Association of Language Testers in Europe (ALTE) has established a set of common standards for language exams that cover all stages of the language testing process. The international ECL language examination system works to meet these so-called ‘minimum standards’.
In the course of this quality assurance process, special attention is paid to the quantitative and qualitative analysis of the test results. After each exam, ECL experts carry out statistical analyses to compute the difficulty, discrimination and reliability of the task items. For this purpose, the experts use the measurement methods of Classical Test Theory on the one hand and the Rasch model on the other. Unlike the statistics of Classical Test Theory, the Rasch measurement model flags misfits, that is, test items that are of poorer quality than the rest of the items and therefore do not fit the model.
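The three Classical Test Theory statistics named above can be illustrated with a minimal sketch. It assumes dichotomously scored items (1 = correct, 0 = incorrect) and uses three common textbook definitions: proportion correct for difficulty, the corrected item-total correlation for discrimination, and KR-20 for reliability. The response matrix and all function names are hypothetical, and the sketch is not meant to reproduce the procedures or software actually used by the ECL experts.

```python
from statistics import mean, pvariance

# Hypothetical scored responses: one row per candidate, one column per item.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def item_difficulty(responses, item):
    """Proportion of candidates who answered the item correctly."""
    return mean(row[item] for row in responses)


def item_discrimination(responses, item):
    """Correlation between the item score and the total score on the
    remaining items (corrected item-total correlation)."""
    x = [row[item] for row in responses]
    rest = [sum(row) - row[item] for row in responses]
    mx, mr = mean(x), mean(rest)
    cov = mean((a - mx) * (b - mr) for a, b in zip(x, rest))
    sx, sr = pvariance(x) ** 0.5, pvariance(rest) ** 0.5
    if sx == 0 or sr == 0:
        return float("nan")  # undefined when a variable has no variance
    return cov / (sx * sr)


def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomous items."""
    k = len(responses[0])
    var_total = pvariance([sum(row) for row in responses])
    pq = 0.0
    for i in range(k):
        p = item_difficulty(responses, i)
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / var_total)


for i in range(len(responses[0])):
    print(i, item_difficulty(responses, i), item_discrimination(responses, i))
print("KR-20:", kr20(responses))
```

In this sketch a higher difficulty value means an easier item, and an item with a low or negative discrimination value would be a candidate for closer qualitative inspection.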
The present paper deals with the issue of misfit items in German language tests measuring listening skills at level B2 according to the CEFR. The items flagged as misfits by the Rasch model over the course of ten exam terms are examined closely, and the reasons for their ‘malfunction’ are investigated. The findings of these analyses are intended to be applied by item writers in future test development.
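How a Rasch analysis can flag a misfit item may be sketched as follows. The example assumes the dichotomous Rasch model and assumes that person abilities and the item difficulty (in logits) have already been estimated by dedicated software; it only computes the infit and outfit mean-square statistics from those estimates. All names and data are hypothetical, and the bounds 0.7 and 1.3 are placeholder values chosen for illustration, not the thresholds applied in the ECL analyses.

```python
import math


def rasch_probability(theta, b):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(scores, abilities, difficulty):
    """Return (infit, outfit) mean squares for a single item.

    scores:     0/1 responses of each candidate to the item
    abilities:  estimated ability of each candidate, in logits
    difficulty: estimated difficulty of the item, in logits
    """
    squared_residuals = 0.0
    variances = 0.0
    standardized_squares = 0.0
    for x, theta in zip(scores, abilities):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)              # model variance of the response
        r2 = (x - p) ** 2              # squared raw residual
        squared_residuals += r2
        variances += w
        standardized_squares += r2 / w
    infit = squared_residuals / variances          # information-weighted
    outfit = standardized_squares / len(scores)    # unweighted mean
    return infit, outfit


def is_misfit(infit, outfit, lower=0.7, upper=1.3):
    """Flag the item when either statistic leaves the (assumed) range."""
    return not (lower <= infit <= upper and lower <= outfit <= upper)


# Hypothetical ability estimates for five candidates and one item at 0 logits.
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
reversed_pattern = [1, 1, 0, 0, 0]   # weaker candidates succeed, stronger fail

infit, outfit = item_fit(reversed_pattern, abilities, 0.0)
print(infit, outfit, is_misfit(infit, outfit))
```

Mean-square values well above 1 indicate responses that are less predictable than the model expects, as in the reversed pattern above, while values well below 1 indicate responses that are more predictable than expected.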
Alderson et al. 1995: Alderson, J. Charles, Caroline Clapham and Dianne Wall. Language Test Construction and Evaluation. Cambridge: Cambridge University Press.
ALTE 2007: 17 Mindeststandards zur Sicherstellung von Qualität in den Prüfungen der ALTE. http://www.alte.org/attachments/files/minimum_standards_de.pdf (last accessed: 21. 03. 2017).
Bond and Fox 2015: Bond, Trevor G. and Christine M. Fox. Applying the Rasch Model. Fundamental Measurement in the Human Sciences. Third Edition. New York/London: Routledge.
Crocker and Algina 2006: Crocker, Linda and James Algina. Introduction to classical and modern test theory. Mason, OH: Cengage Learning.
ECL Exam System. http://eclexam.eu/deutsch/ (last accessed: 22. 03. 2017).
Fulcher 2010: Fulcher, Glenn. Practical Language Testing. London: Hodder Education.
Mackey and Gass 2005: Mackey, Alison and Susan M. Gass. Second language research: Methodology and design. Mahwah, NJ: Lawrence Erlbaum.
NYAK 2017: Akkreditációs Kézikönyv. http://www.nyak.hu/nyat/doc/ak2017/Akkreditacios_Kezikonyv_2017_V5.pdf (last accessed: 22. 03. 2017).
Trim et al. 2001: Trim, John, Brian North and Daniel Coste. Gemeinsamer europäischer Referenzrahmen für Sprachen: lernen, lehren, beurteilen. Berlin / München: Langenscheidt.
Wright and Linacre 1994: Wright, Benjamin Drake and John Michael Linacre. Reasonable mean-square fit values. Rasch Measurement Transactions, 8 (3), 369-370.