ESRA 2019 Draft Programme at a Glance


Contemporary issues in the assessment of measurement invariance 2

Session Organisers Dr Daniel Seddig (University of Cologne & University of Zurich)
Professor Eldad Davidov (University of Cologne & University of Zurich)
Professor Peter Schmidt (University of Giessen)
Time Wednesday 17th July, 14:00 - 15:00
Room D22

The assessment of the comparability of cross-national and longitudinal survey data is a prerequisite for meaningful and valid comparisons of substantive constructs across contexts and time. A powerful tool to test the equivalence of measurements is multiple-group confirmatory factor analysis (MGCFA). Although measurement invariance (MI) testing procedures are increasingly used by applied researchers, several issues remain under discussion and are still unresolved. For example:

(1) Can we trust models with small deviations (approximate MI)? Is partial MI sufficient? How should one deal with the lack of scalar MI, as is the case in many large-scale cross-national surveys?
(2) How should one decide whether a model with a higher level of MI should be preferred over a model with a lower level of MI? Which fit indices should be used?
(3) Is MI needed at all, and would it be better to start with a robustness calculation first?

Recent approaches have tackled the issues subsumed under (1) and aimed at relaxing certain requirements when testing for measurement invariance, either through Bayesian approximate MI (Muthén and Asparouhov 2012; van de Schoot et al. 2013) or through the alignment method (Asparouhov and Muthén 2014). Furthermore, researchers have addressed the issues subsumed under (2) and recommended the use of particular fit statistics (e.g., CFI, RMSEA, SRMR) to decide among competing models (Chen 2007). The question raised under (3) is a more general one and raises concerns about the contemporary uses of the concept of MI. Researchers (Welzel and Inglehart 2016) have argued that variations in measurements across contexts can be ignored, for example in the presence of theoretically reasonable associations of a construct with external criteria.
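
As an illustration of the fit-index approach mentioned under (2), the following minimal Python sketch compares two nested MGCFA models using changes in CFI, RMSEA and SRMR. The cutoff values are the ones commonly attributed to Chen (2007) for larger samples and are treated here as assumptions to be checked against that source. The names `Fit` and `invariance_rejected` and all fit values are hypothetical and not taken from any study in this session.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    """Global fit indices of one fitted MGCFA model (values are hypothetical)."""
    cfi: float
    rmsea: float
    srmr: float


def invariance_rejected(less_constrained: Fit, more_constrained: Fit, step: str) -> bool:
    """Return True if the added equality constraints worsen fit too much.

    Assumed cutoffs (commonly attributed to Chen 2007, larger samples):
    a CFI drop of .010 or more, accompanied by an RMSEA increase of .015
    or more, or an SRMR increase of .030 (metric step) / .010 (scalar step).
    """
    if step not in ("metric", "scalar"):
        raise ValueError("step must be 'metric' or 'scalar'")
    srmr_cutoff = 0.030 if step == "metric" else 0.010

    delta_cfi = more_constrained.cfi - less_constrained.cfi
    delta_rmsea = more_constrained.rmsea - less_constrained.rmsea
    delta_srmr = more_constrained.srmr - less_constrained.srmr

    cfi_worse = delta_cfi <= -0.010
    supplement_worse = delta_rmsea >= 0.015 or delta_srmr >= srmr_cutoff
    return cfi_worse and supplement_worse


# Hypothetical fit results for three nested models.
configural = Fit(cfi=0.960, rmsea=0.050, srmr=0.040)
metric = Fit(cfi=0.957, rmsea=0.052, srmr=0.045)
scalar = Fit(cfi=0.930, rmsea=0.070, srmr=0.060)

metric_rejected = invariance_rejected(configural, metric, step="metric")
scalar_rejected = invariance_rejected(metric, scalar, step="scalar")
print("metric invariance rejected:", metric_rejected)
print("scalar invariance rejected:", scalar_rejected)
```

With these made-up values the metric model is retained while the scalar model is rejected, which mirrors the lack of scalar MI noted under (1). Such rules of thumb are themselves part of the debate this session addresses.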

This session aims at presenting studies that assess measurement invariance and/or address one of the issues listed above or related ones. We welcome (1) presentations that are applied and make use of empirical survey data, and/or (2) presentations that take a methodological approach to measurement invariance testing and use, for example, Monte Carlo simulations to study the above-mentioned issues.

Keywords: measurement invariance, comparability, cross-cultural research, structural equation modeling

Investigating the cross-cultural equivalence of the General Health Questionnaire (GHQ)

Professor Nick Allum (University of Essex) - Presenting Author
Ms Kirby King (Government Statistical Service)
Dr Paul Stoneman (Goldsmiths College)

Background: The General Health Questionnaire (GHQ) is a widely used instrument for identifying minor psychiatric disorders in the general population. Notwithstanding its widespread use in social and epidemiological research, little is known about its validity as a comparative tool for measuring the mental health of adults from different ethnic groups. Our objective in this paper is to assess the GHQ’s suitability for this task by testing for measurement invariance with respect to five ethnic minority groups in the UK (Indians, Pakistanis, Bangladeshis, Caribbeans and Africans), along with the white British majority. We investigate the extent to which the short-form version of the instrument, the GHQ-12, exhibits configural, metric and scalar invariance across these six ethnic groups using the UK Household Longitudinal Study (N = 35,437).

We evaluate alternative factor structures for the GHQ that have been suggested in previous literature and show that a unidimensional structure with correlated errors for reverse-valenced items provides the best fit in all subgroups. We submit this model to tests for metric and scalar invariance across groups and find substantial equivalence in the measurement properties of the scale across all groups. We complement this with tests of association with criterion variables for both latent and summated scale versions of the instrument and find little difference.

Conclusions: We find that policy makers and scholars should not be overly concerned with the cultural sensitivity of the GHQ-12 and that valid comparisons across different ethnic groups can be made using the instrument in adult populations.


Evaluating the use of the Patient Health Questionnaire (PHQ-9) as a screening tool for depression

Ms Kristín Hulda Kristófersdóttir (University of Iceland/ Methodology Research Center, University of Iceland) - Presenting Author
Mr Hans Haraldsson (University of Iceland/ Methodology Research Center, University of Iceland)
Ms Hilma Rós Ómarsdóttir (University of Iceland/ Methodology Research Center, University of Iceland)
Ms Ragnhildur Lilja Ásgeirsdóttir (University of Iceland/ Methodology Research Center, University of Iceland)
Ms Vaka Vésteinsdóttir (University of Iceland/ Methodology Research Center, University of Iceland)
Ms Hafrún Kristjánsdóttir (Reykjavík University)
Ms Fanney Thorsdottir (University of Iceland/ Methodology Research Center, University of Iceland)

The Patient Health Questionnaire (PHQ-9) is frequently used to screen for depression. Currently, there is a lack of research on the measurement invariance of the PHQ-9 across gender. The purpose of this study is to test the PHQ-9 for differential item functioning (DIF) related to gender. A data set of 621 clinical participants was used, 101 males and 520 females. The data set came from a study in which participants with anxiety and/or depression symptoms were recruited to assess the efficacy of treatment. Participants answered several screening instruments, including the PHQ-9, to evaluate changes in disorder symptoms throughout the treatment. The data used in this study consist of participants' answers to the PHQ-9 before treatment. The PHQ-9 was evaluated for DIF using the graded response model within the IRT approach. The factor structure of the PHQ-9 was confirmed using confirmatory factor analysis. The scale as a whole performed well in terms of IRT information, but the results suggested that some of the items need revision. No clear evidence of DIF was found for any item in the PHQ-9 between males and females. Further research is needed to establish valid use of the PHQ-9 as a screening tool for depression symptoms.