The MIMIC pure anchor method for DIF: Detecting psychological impact, not bias

Maryam Alqassab & Gavin T. L. Brown

Session 3A, 9:45 - 11:15, HAGEN 2

Differential item functioning (DIF) indicates that a construct-irrelevant factor (e.g., age, sex, or ethnicity) systematically affects responses to items. DIF studies are usually carried out with demographic groups rather than with psychological grouping variables, which might not be construct-irrelevant. When DIF is consistent with a construct that is relevant to the phenomenon of interest, it suggests impact rather than bias (Zumbo, 1999).

When items are correlated (i.e., they load on common factors), DIF may be inflated by the collinearity of the items. The pure-anchor technique within multiple-indicator, multiple-cause (MIMIC) analysis (M-PA; Shih & Wang, 2009) uses a DIF-free-then-DIF procedure that fixes one item with no DIF as an anchor to reduce the probability of Type I errors in detecting DIF (Wang & Shih, 2010). The iterative MIMIC procedure (M-IT) tests each item within a construct individually and sets as the pure anchor the item that generated the lowest DIF index (Shih & Wang, 2009).
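The anchor-selection step of the iterative procedure can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names and the DIF index values are hypothetical, and it assumes that "lowest DIF index" means the smallest absolute magnitude.

```python
def select_pure_anchor(dif_index):
    """Return the item whose DIF index is closest to zero.

    Assumption: "lowest DIF index" is read as smallest absolute value.
    """
    return min(dif_index, key=lambda item: abs(dif_index[item]))


def split_anchor_and_tested(dif_index):
    """Fix one item as the pure anchor; the rest are tested for DIF."""
    anchor = select_pure_anchor(dif_index)
    tested = [item for item in dif_index if item != anchor]
    return anchor, tested


# Hypothetical DIF indices for one four-item factor (illustrative values,
# not results from the study).
example_indices = {"item1": 0.42, "item2": -0.05, "item3": 0.31, "item4": -0.18}

anchor, tested = split_anchor_and_tested(example_indices)
print(anchor, tested)
```

In an actual analysis the indices would come from fitting one MIMIC model per item; here they are simply supplied as a dictionary.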

This study uses the Student Conceptions of Assessment inventory (SCoA, version VI; Brown, 2011), a multidimensional research inventory with 4 factors and 33 items, together with a brief inventory of student interest and self-efficacy in either reading or mathematics. Higher test scores have been associated with the SCoA factor that assessment is for improvement (Brown, Peterson, & Irving, 2009) and with greater student interest or self-efficacy (‘Otunuku & Brown, 2007). Hence, DIF in favour of students with higher self-efficacy or interest may indicate impact rather than bias.

Participants (N = 799) were Year 9 and 10 high school students in New Zealand. Interest and self-efficacy in reading and mathematics were used as DIF grouping variables. Participants were grouped by interest (high vs. low), self-efficacy (high vs. low), and test subject (mathematics vs. reading comprehension), resulting in small reference and focal groups (n = 180). DIF analysis by interest and self-efficacy was conducted using M-PA for the four SCoA factors in each subject separately. Only one item was used as an anchor, and the analysis used the WLSMV estimator (Muthén & Muthén, 2010).

Of the 29 items tested after fixing the pure anchor, five in mathematics and eight in reading had statistically significant DIF magnitudes on the Wald test. This compared favourably with the standard MIMIC DIF analysis, which found 18 of 33 items with statistically significant DIF in mathematics and 17 in reading. A Monte Carlo simulation study of 10,000 replications and two groups of 200, using population parameter values (i.e., number of items per factor ranging from 4 to 10, loadings set at either 0.80 or 0.60) akin to the range of regression weights seen in studies with the SCoA, found that, except for expected loadings of 0.80 and either 4 or 10 items per factor, the bias in parameter estimation was much greater than 10% (M = 47.88, SD = 37.78). This indicates that the observed DIF values are highly likely to be over-estimated, even using the M-PA approach. Items with statistically significant DIF aligned with the known effects of self-efficacy and interest on academic achievement, supporting the interpretation that impact, not bias, was present. Further work with the promising M-PA procedure is warranted.
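The 10% criterion applied to the simulation results can be illustrated with a short sketch. It assumes the conventional definition of percentage parameter bias (mean estimate across replications minus the population value, divided by the population value); the abstract does not state the exact formula used, and the function names and numbers below are hypothetical.

```python
from statistics import mean


def percent_bias(estimates, population_value):
    """Percentage bias of the mean estimate relative to the population value.

    Assumption: bias = 100 * (mean estimate - population) / population.
    """
    return 100.0 * (mean(estimates) - population_value) / population_value


def exceeds_threshold(estimates, population_value, threshold=10.0):
    """True when the absolute percentage bias is above the threshold."""
    return abs(percent_bias(estimates, population_value)) > threshold


# Hypothetical estimates from four replications of a parameter whose
# population value is 0.80 (illustrative values, not study results).
estimates = [0.90, 1.00, 0.95, 1.05]
population_value = 0.80

bias = percent_bias(estimates, population_value)
print(round(bias, 3), exceeds_threshold(estimates, population_value))
```

A real study would average over all 10,000 replications per parameter; the four values here only keep the arithmetic easy to follow.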

Published Sep. 5, 2018 1:43 PM - Last modified Sep. 5, 2018 1:43 PM