On proficiency scales and errors of measurement for educational tests

Svend Kreiner & Jeppe Bundsgaard

Session 4B, 11:20 - 12:05, VIA

Educational tests are used for two main reasons: 1) to give teachers insight into the abilities of their students, and 2) to give administrators, researchers, the public and politicians knowledge of the status, progression and relative level of a group of students (compared to another group).

These two reasons put different demands on the tests. In the first case, the teacher wants to know with some confidence what an individual student is capable of, knows and understands, and the teacher welcomes suggestions on how to help the student reach the next goals. In the second case, administrators and other stakeholders want to know whether progression was made and whether certain thresholds were reached for a specific population.

Combinations of the two goals are possible, but hard to attain. In international large-scale assessments like PISA and the IEA assessments (PIRLS, TIMSS, ICILS, etc.), it is in principle possible to provide teachers with scores on proficiency scales defined by subject-matter arguments relating to student progression, but results on individual students are never reported. At classroom and student levels, test results are often collected at set points in time so that results can be aggregated to provide information to administrators at higher levels, whether or not it is convenient and/or useful for the teacher to have information on the class and the student at the time when administrators need it. Since test results are collected at specific points in time during the school year, it is possible to report not only simple transformed raw scores but also percentile scores that compare the test results of individual students to the complete student population. Even so, it is rare to find examples where test results at student level are more than simple transformed raw scores and percentile scores.
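To make concrete what percentile reporting involves, here is a minimal sketch, not taken from any of the tests discussed; the data and function names are hypothetical. It converts a raw score into a percentile score against a reference population, using the common midpoint rule for ties:

```python
# A minimal sketch (hypothetical data): raw score -> percentile score
# against a reference population, with ties given half weight.
import numpy as np

def percentile_score(raw_score, population_scores):
    """Percentile rank: percentage of the population scoring below the
    raw score, counting tied scores at half weight."""
    population_scores = np.asarray(population_scores)
    below = np.mean(population_scores < raw_score)
    equal = np.mean(population_scores == raw_score)
    return 100 * (below + 0.5 * equal)

# Hypothetical population of raw scores on a 40-item test.
rng = np.random.default_rng(0)
population = rng.binomial(n=40, p=0.6, size=10_000)
print(percentile_score(27, population))  # one student's percentile score
```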

This paper is an argument for the development and use of proficiency scores in applications of educational tests at classroom and student level. We will discuss different ways to interpret test scores and different ways to construct informative proficiency scores (e.g. Fraillon et al., 2015; OECD, 2014; Draney & Wilson, 2011; Wilson & Santelices, 2017) that provide more useful information than transformed raw scores and percentile scores, and we will show how to assess the measurement error of proficiency scores. The methods will be illustrated with data on proficiency scales for a test measuring 21st Century Skills (Bundsgaard, 2018; Bundsgaard, in review), with data from the Danish National Test (DNT), and with data from the International Computer and Information Literacy Study (ICILS 2013) (Fraillon, Ainley, Schulz, Friedman, & Gebhardt, 2014).
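As a hedged illustration of what assessing the measurement error of a proficiency score can involve, the sketch below assumes a dichotomous Rasch model with known item difficulties (an assumption for illustration; it is not necessarily the model used in the paper). It computes the maximum-likelihood ability estimate from a raw score and its standard error of measurement, SEM(θ̂) = 1/√I(θ̂), where I(θ) = Σ P_i(θ)(1 − P_i(θ)) is the test information:

```python
# A minimal sketch, assuming a dichotomous Rasch model with known
# (hypothetical) item difficulties.
import numpy as np

def rasch_prob(theta, difficulties):
    """P(correct) for each item under the Rasch model."""
    return 1.0 / (1.0 + np.exp(-(theta - difficulties)))

def ml_theta_and_sem(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Newton-Raphson ML estimate of ability from a raw score, plus its
    standard error of measurement 1/sqrt(test information)."""
    theta = 0.0
    for _ in range(max_iter):
        p = rasch_prob(theta, difficulties)
        info = np.sum(p * (1 - p))              # Fisher information I(theta)
        step = (raw_score - np.sum(p)) / info   # score function / information
        theta += step
        if abs(step) < tol:
            break
    p = rasch_prob(theta, difficulties)
    sem = 1.0 / np.sqrt(np.sum(p * (1 - p)))
    return theta, sem

# Hypothetical 20-item test; ML estimates exist for interior scores 1..19.
b = np.linspace(-2, 2, 20)
theta_hat, sem = ml_theta_and_sem(raw_score=14, difficulties=b)
print(f"theta = {theta_hat:.2f}, SEM = {sem:.2f}")
```

Note that the ML estimate and its SEM exist only for raw scores strictly between zero and the maximum; perfect and zero scores require other conventions (e.g. Warm's weighted likelihood estimator).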
