ESRA logo




Wednesday 19th July, 11:00 - 12:30 Room: Q4 ANF2


Measurement Invariance: Testing for it and Explaining Why It Is Absent 2

Chair Dr Katharina Meitinger (GESIS Leibniz Institute for the Social Sciences)
Coordinator 1 Professor Eldad Davidov (University of Cologne and University of Zurich)

Session Details

Measurement invariance tests are a popular approach to assessing the cross-national comparability of data. However, researchers often have difficulty establishing the highest level of measurement invariance, scalar invariance (Davidov et al. 2012).

In recent years, the predominant approach to “fixing” this issue has been to opt for more statistical sophistication and to relax certain requirements when testing for measurement invariance. Approaches such as Bayesian structural equation modelling (BSEM) (Muthén and Asparouhov 2012; van de Schoot 2015) or alignment (Asparouhov and Muthén 2014) fall into this category.

However, these approaches cannot provide reasons as to why measurement invariance cannot be found. An alternative approach in this context is to view the lack of measurement invariance as a source of information on cross-group differences and to try to explain the individual, societal, or historical sources of measurement nonequivalence (Davidov et al. 2014). On the one hand, quantitative approaches, such as the multiple indicators multiple causes (MIMIC) model (Davidov et al. 2014) and multilevel structural equation models (MLSEMs) (Davidov et al. 2012), aim to substantively explain cases of noninvariance. On the other hand, there is an increasing awareness of the potential of mixed methods approaches to explain instances of measurement noninvariance (e.g., Latcheva 2011; Panyusheva & Efremova 2012; Meitinger 2016). These studies mostly use results from cognitive interviewing or online probing to explain why measurement invariance was not found. In contrast to the purely quantitative approaches, the mixed methods approaches often reveal previously unknown and surprising causes for the incomparability of data.

This session aims to present studies that either test for measurement invariance or examine the reasons why tests for measurement invariance failed in certain research situations. We welcome (1) presentations that take a purely quantitative approach to test measurement invariance or explain non-invariance, and (2) presentations that apply a mixed methods approach to explain instances of missing measurement invariance.

Paper Details

1. Addressing Measurement Invariance with the Alignment Method – A Flexible and Powerful Approach to Explore Misfit in Large-scale Cross-National Surveys across Countries and Data collections
Professor Ingrid Munck (Department of Education and Special Education at Gothenburg University)
Professor Carolyn Barber (School of Education at University of Missouri-Kansas)

This study applies the Alignment Method (Muthén and Asparouhov, 2014) to assess survey measurement comparability in the assessment of attitudes toward immigrants’ rights across countries, cohorts and genders. In a previous analysis of attitudes toward immigration among European adults using data from the ESS study, Billiet and Meuleman (2012) assess cross-cohort comparability of their measures by using multiple-group confirmatory factor analysis (MGCFA). In a series of confirmatory factor analyses testing for configural, metric and scalar invariance, they present a stepwise, partly data-driven procedure guided by modification indices, with measurement invariance tested in LISREL (Jöreskog 1971; Jöreskog and Sörbom 1993). In the analysis they adopted a ‘bottom-up’ test strategy, starting with the weakest level of invariance, configural invariance, and imposing invariance constraints one item at a time. As pointed out by Davidov, Meuleman, Cieciuch, Schmidt, and Billiet (2014), this approach is cumbersome and prone to producing a wrong model (due to the reliance on data-driven modification indices), especially when many groups are being compared. The Alignment method offers new directions and solutions for the assessment of measurement equivalence (Davidov et al. 2014). This approach starts with the configural model in an MGCFA with no invariance constraints and attempts to find as much invariance as possible by letting the factor means and variances vary across groups (Muthén and Asparouhov, in press). This makes it possible to build the measurement invariance analysis on the less restrictive configural model rather than struggling for scalar equivalence. The measurement model can then be selected on theoretical grounds, which is preferable to using data-driven stepwise procedures to arrive at a useful measurement model.
The Alignment method will be applied to two data collections of the IEA surveys of civics and citizenship education: the 1999 Civic Education Study and the 2009 International Civics and Citizenship Education Study. The full-scale illustration analyzed the 1999/2009 dataset with responses to five Likert-scale items measuring Support of Immigrants’ Rights, in all approximately 80,000 European native-born 14-year-olds from 28 countries. We examined measurement invariance across a 92-group design (country by cohort by gender); the results show that the scale is statistically well-grounded for unbiased group comparisons despite the presence of non-invariance (scalar model RMSEA = 0.097). The effects of the badness-of-fit of the scalar model are scrutinized by a post-processing analysis of correlations and ranking lists comparing the aligned score with the factor score obtained from the scalar model. In the aligned score, the misfit could be localized to just a few groups, specifically female students from Cyprus in the 1999 study and female students from Latvia in the 2009 study. However, there were only marginal effects when groups with more severe degrees of misfit were kept in the reported results. Additional analysis to be presented will focus on potential explanations for observed misfit. Overall, the alignment methodology makes it feasible to comprehensively assess measurement invariance in large datasets.
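The alignment idea described in this abstract (start from configural estimates, then choose group factor means and variances that make loadings and intercepts as similar as possible) can be illustrated with a minimal Python sketch. The component loss, the pairwise group weights, the value of `eps`, and all numbers below are assumptions chosen for illustration; they follow a commonly described form of the alignment loss and are not taken from the study. No optimizer is included: the sketch only evaluates the loss for two candidate sets of factor means and variances.

```python
import math

def component_loss(x, eps=0.01):
    """Assumed component loss f(x) = sqrt(sqrt(x^2 + eps)); eps is a small constant."""
    return math.sqrt(math.sqrt(x * x + eps))

def aligned_parameters(loadings0, intercepts0, alpha, psi):
    """Rescale one group's configural estimates for a candidate factor mean (alpha)
    and factor variance (psi)."""
    sd = math.sqrt(psi)
    loadings1 = [lam / sd for lam in loadings0]
    intercepts1 = [nu - alpha * lam / sd for lam, nu in zip(loadings0, intercepts0)]
    return loadings1, intercepts1

def alignment_loss(groups, means, variances, sizes):
    """Sum the component loss over all group pairs, items, and both parameter types."""
    aligned = [aligned_parameters(lam, nu, a, p)
               for (lam, nu), a, p in zip(groups, means, variances)]
    total = 0.0
    for g1 in range(len(groups)):
        for g2 in range(g1 + 1, len(groups)):
            weight = math.sqrt(sizes[g1] * sizes[g2])  # assumed pair weight
            # First the loadings of both groups are compared, then the intercepts.
            for params1, params2 in zip(aligned[g1], aligned[g2]):
                for p1, p2 in zip(params1, params2):
                    total += weight * component_loss(p1 - p2)
    return total

# Hypothetical configural estimates (factor mean 0, variance 1 in each group).
# Group 2 was constructed from the same items with factor mean 0.5 and variance 4.
groups = [
    ([1.0, 0.8, 0.6], [2.0, 3.0, 1.0]),
    ([2.0, 1.6, 1.2], [2.5, 3.4, 1.3]),
]
sizes = [500, 500]

loss_naive = alignment_loss(groups, means=[0.0, 0.0], variances=[1.0, 1.0], sizes=sizes)
loss_aligned = alignment_loss(groups, means=[0.0, 0.5], variances=[1.0, 4.0], sizes=sizes)
```

In this constructed example the loss is lower when group 2 is given the factor mean and variance it was built with, because the rescaled loadings and intercepts then coincide with those of group 1.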


2. Unifying and extending methods for measurement invariance using Bayesian regularization
Ms Sara van Erp (Tilburg University)
Dr Joris Mulder (Tilburg University)
Dr Daniel Oberski (Utrecht University)

When comparing multiple groups it is important to establish measurement invariance (MI), meaning that the latent construct under investigation is measured in the same way across groups. Traditionally, MI is tested using multiple group confirmatory factor analysis (MGCFA) with certain restrictions on the model. The goal is often to attain scalar invariance, which sets the loadings and intercepts equal across groups, so that factor means can be meaningfully compared. In practice, however, scalar invariance is often an unattainable ideal. Therefore, several alternative methods have been proposed to test for MI, such as partial MI, Bayesian approximate MI, and the alignment method. Although these techniques relax the restrictions imposed by the scalar invariance model, they do impose specific assumptions about the underlying structure of MI. Both the alignment method and approximate MI assume many small deviations from invariance, while partial MI requires at least two invariant items.
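The invariance levels mentioned above can be written compactly. The notation below is an assumption introduced for illustration (it does not come from the abstract): x is the vector of observed items for respondent i in group g, and the latent factor has group-specific mean and variance.

```latex
\begin{align*}
  x_{ig} &= \nu_g + \Lambda_g \eta_{ig} + \varepsilon_{ig},
  \qquad E[\eta_{ig}] = \alpha_g, \quad \operatorname{Var}(\eta_{ig}) = \Psi_g \\
  \text{configural:} &\quad \text{same pattern of zero and non-zero loadings in every } \Lambda_g \\
  \text{metric:} &\quad \Lambda_g = \Lambda \ \text{ for all } g \\
  \text{scalar:} &\quad \Lambda_g = \Lambda \ \text{ and } \ \nu_g = \nu \ \text{ for all } g \\
  \text{under scalar invariance:} &\quad E[x_{ig}] = \nu + \Lambda \alpha_g
\end{align*}
```

Under the scalar model, differences in observed item means between groups can only arise from differences in the factor means, which is why factor means become comparable.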

In this presentation, the different methods for MI will be unified by considering them as specific regularization approaches. Regularization methods (e.g. lasso, ridge) are popular in sparse regression problems where the number of predictor variables is (much) greater than the number of observations. Traditionally, these approaches minimize a loss function subject to a norm constraint or penalty on some parameters, where different norm constraints lead to different shrinkage behaviors. We will show how the problem of MI resembles the sparse regression problem and how the existing methods for MI relate to regularization approaches.
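As a rough illustration of how different penalties lead to different shrinkage behaviors, the following Python sketch applies the closed-form lasso and ridge solutions for a single coefficient (the orthonormal-design case) to a set of hypothetical deviations of group intercepts from a common intercept. The deviation values and the penalty weight are made up, and the mapping to MI is an assumption made here for illustration, not the authors' model.

```python
def lasso_shrink(b, lam):
    """Minimizer of 0.5*(b - x)^2 + lam*|x|: soft-thresholding."""
    magnitude = max(abs(b) - lam, 0.0)
    return magnitude if b >= 0 else -magnitude

def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - x)^2 + 0.5*lam*x^2: proportional shrinkage."""
    return b / (1.0 + lam)

# Hypothetical unpenalized deviations from a common intercept for four groups.
deviations = [0.05, -0.10, 0.80, 0.02]
lam = 0.2

lasso_result = [lasso_shrink(d, lam) for d in deviations]
ridge_result = [ridge_shrink(d, lam) for d in deviations]
```

The lasso-type penalty sets the three small deviations exactly to zero and keeps one large deviation, which resembles a partial-invariance pattern; the ridge-type penalty keeps all deviations non-zero but pulls each toward zero, which resembles many small deviations.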

We adopt a Bayesian approach, which provides more flexibility. Bayesian analysis combines the likelihood of the data with a prior distribution to obtain a posterior distribution that is used to make inferences. It has been shown that, under certain prior distributions, the mode of the posterior distribution corresponds to popular regularization approaches. Employing this Bayesian regularization framework therefore allows us to 1) unify the existing methods for MI and 2) extend the current toolbox by considering different priors. Specifically, we will consider prior distributions that are less stringent in their assumptions about the structure of MI, thereby allowing additional forms of MI to be modelled. Several penalties and their corresponding prior distributions will be discussed in relation to MI, and their behavior will be investigated through multiple illustrations. Finally, we will provide recommendations on how to choose between the different possible prior distributions.
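The correspondence between posterior modes and penalized estimates can be checked numerically in a toy case. The sketch below assumes a single parameter observed once with unit-variance normal error, and compares a Laplace prior with a zero-mean normal prior; the observation values, the prior scale `lam`, and the grid search are illustrative assumptions and are not the priors studied in the presentation.

```python
def neg_log_posterior_laplace(b, y, lam):
    """Normal likelihood (variance 1) plus Laplace prior, up to an additive constant."""
    return 0.5 * (y - b) ** 2 + lam * abs(b)

def neg_log_posterior_normal(b, y, lam):
    """Normal likelihood (variance 1) plus normal prior with variance 1/lam."""
    return 0.5 * (y - b) ** 2 + 0.5 * lam * b ** 2

def posterior_mode(neg_log_post, y, lam, lo=-2.0, hi=2.0, steps=4000):
    """Approximate the posterior mode by a simple grid search (spacing 0.001)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return min(grid, key=lambda b: neg_log_post(b, y, lam))

lam = 0.2
small, large = 0.1, 0.8  # hypothetical observed deviations

# Laplace prior: the mode matches lasso soft-thresholding.
mode_laplace_small = posterior_mode(neg_log_posterior_laplace, small, lam)
mode_laplace_large = posterior_mode(neg_log_posterior_laplace, large, lam)

# Normal prior: the mode matches ridge shrinkage y / (1 + lam).
mode_normal_small = posterior_mode(neg_log_posterior_normal, small, lam)
mode_normal_large = posterior_mode(neg_log_posterior_normal, large, lam)
```

With the Laplace prior the small deviation has its mode at zero while the large one is reduced by `lam`; with the normal prior both modes are scaled toward zero but neither is exactly zero.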


3. Testing 9- and 12-item versions of the Grasmick et al. (1993) self-control scale for measurement invariance across cultural contexts: A comparison of different approaches
Dr Heinz Leitgöb (University of Eichstätt-Ingolstadt)
Dr Daniel Seddig (University of Cologne)
Dr Dirk Enzmann (University of Hamburg)

Cross-cultural research is steadily gaining relevance in empirical criminology. However, a lack of awareness about the consequences of measurement non-equivalence in key concepts is still prevalent. To address this shortcoming, we aim to test self-control, one of the most prominent explanatory factors in crime causation (see e.g. Gottfredson & Hirschi 1990), for measurement invariance (MI) across different cultural settings. Specifically, we plan to apply three different approaches to test shortened 12- and 9-item versions of the self-control scale proposed by Grasmick et al. (1993) as used in the International Self-Report Delinquency Study (ISRD): (i) confirmatory factor analyses based on classical test theory, (ii) item response theory (IRT) methods (within IRT, the MI issue is better known as differential item functioning, DIF), and (iii) Procrustes rotation toward the factor structure of a reference group and subsequent comparisons of congruence measures (linearity, proportionality, additivity, and identity). The database consists of the second and third waves of the ISRD study, conducted in 29 and 27 European, American, and Asian countries, respectively. The ISRD project surveys 12- to 16-year-old juveniles in school classes, representative of the cities or regions of the participating countries, using self-administered questionnaires. To ensure analytical feasibility, we confine the data to countries grouped into cultural spheres according to language, indices of development and welfare regime, and geographic location. Besides testing for MI, we will turn our attention to the empirical consequences of measurement non-equivalence for the explanation of cross-cultural differences in crime rates if the MI assumptions are violated.
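The four congruence measures named in approach (iii) can be sketched in Python for two vectors of factor loadings. The formulas are the usual textbook definitions as assumed here (identity and additivity coefficients, Tucker's phi for proportionality, Pearson's r for linearity); the abstract does not specify which formulas the authors use. The loadings are hypothetical and the Procrustes rotation step itself is omitted.

```python
import math

def _centered(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def identity_coef(x, y):
    """Equals 1 only if the two loading vectors are identical."""
    return 2 * _dot(x, y) / (_dot(x, x) + _dot(y, y))

def additivity_coef(x, y):
    """Equals 1 if the vectors differ only by an additive constant."""
    cx, cy = _centered(x), _centered(y)
    return 2 * _dot(cx, cy) / (_dot(cx, cx) + _dot(cy, cy))

def proportionality_coef(x, y):
    """Tucker's phi: equals 1 if the vectors differ only by a positive multiplier."""
    return _dot(x, y) / math.sqrt(_dot(x, x) * _dot(y, y))

def linearity_coef(x, y):
    """Pearson correlation: equals 1 for any increasing linear relation."""
    cx, cy = _centered(x), _centered(y)
    return _dot(cx, cy) / math.sqrt(_dot(cx, cx) * _dot(cy, cy))

# Hypothetical loadings: the comparison group is exactly twice the reference group.
reference = [0.6, 0.7, 0.5, 0.8]
comparison = [2 * a for a in reference]

coefficients = {
    "identity": identity_coef(reference, comparison),
    "additivity": additivity_coef(reference, comparison),
    "proportionality": proportionality_coef(reference, comparison),
    "linearity": linearity_coef(reference, comparison),
}
```

In this constructed case proportionality and linearity are perfect while identity and additivity are not, which shows how the four coefficients separate different kinds of agreement between factor structures.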


4. Exact and approximate approaches to test the measurement invariance of gender role attitudes in 58 countries
Dr Vera Lomazzi (GESIS - Leibniz Institute for the Social Sciences)

Several repeated cross-national surveys include measurements of attitudes towards gender roles that aim to investigate individuals’ beliefs regarding the appropriateness of men’s and women’s roles in a certain context. When these measurements are used to compare attitudes across countries, it should be noted that they present critical aspects that could cause a lack of equivalence between different cultural contexts and therefore lead to misleading results.
In addition to the method bias that could occur in cross-national data collections, mainly due to translation mistakes, modes of data collection, and differences in sampling procedures, as well as to social desirability and acquiescence, which can vary by cultural context (Heath et al., 2009; van de Vijver & Tanzer, 2004), the measurement equivalence of gender role attitudes appears particularly sensitive to construct bias. This is because different ways of defining gender roles are established across cultural contexts (Constantin & Voicu, 2014; Lomazzi, 2016). Institutional factors such as welfare regimes, religious traditions, or labor market dynamics have historically contributed to the development of different gender cultures across societies, prescribing gender roles accordingly (André et al., 2013; Pfau-Effinger, 2004; Sjöberg, 2004). This is reflected not only in the shaping of gender beliefs but also in the meaning given to questions investigating these concepts (Braun, 1998, 2009; Braun et al., 1994).
Regardless of these potentially critical aspects, the use of these measurements in comparative studies is quite widespread, and only recent studies have introduced the evaluation of the quality of the measurement instruments in this field: Constantin and Voicu (2014) tested the gender role scale included in ISSP 2002 and WVS 2005, while Weziak-Bialowolska (2015) evaluated a measure of gender equality based on a combination of different items from WVS 1994 that were not originally designed as a scale.
Informed by the most recent developments in the assessment of measurement invariance (Asparouhov & Muthén, 2009, 2014; Cieciuch et al., 2014; Van De Schoot et al., 2013; Davidov et al., 2015), this paper aims to test the measurement equivalence of the gender role attitudes scale included in the sixth wave of the World Values Survey (2010-2014) in 58 countries.
Because the more approximate procedures allow means and variances to be estimated without constraining loadings and intercepts to be equal, as the “exact” approaches do, these new approaches could be helpful in assessing the measurement invariance of gender role attitudes. In particular, after testing country by country whether the model fits the data, I will employ alignment optimization (Asparouhov & Muthén, 2014), possibly combined with the Bayesian approach, in order to assess measurement equivalence. However, considering how recently these approaches have appeared, the results will be compared to the outputs obtained with the traditional “exact” approach (MGCFA) to highlight possible advantages of the novel procedure.