ESRA 2019 Draft Programme at a Glance

Contemporary issues in the assessment of measurement invariance 3

Session Organisers: Dr Daniel Seddig (University of Cologne & University of Zurich)
Professor Eldad Davidov (University of Cologne & University of Zurich)
Professor Peter Schmidt (University of Giessen)
Time: Wednesday 17th July, 16:30 - 17:30
Room D22

The assessment of the comparability of cross-national and longitudinal survey data is a prerequisite for meaningful and valid comparisons of substantive constructs across contexts and time. A powerful tool for testing the equivalence of measurements is multiple-group confirmatory factor analysis (MGCFA). Although measurement invariance (MI) testing procedures are increasingly used by applied researchers, several issues remain under discussion and are not yet resolved. For example:

(1) Can we trust models with small deviations (approximate MI)? Is partial MI sufficient? How should one deal with the lack of scalar MI, as is the case in many large-scale cross-national surveys?
(2) How should one decide whether a model with a high level of MI is to be preferred over a model with a lower level of MI? Which fit indices should be used?
(3) Is MI needed at all, or would it be better to start with a robustness calculation?

Recent approaches have tackled the issues subsumed under (1) and aim at relaxing certain requirements when testing for measurement invariance, either with Bayesian approximate MI (Muthén and Asparouhov 2012; van de Schoot et al. 2013) or with the alignment method (Asparouhov and Muthén 2014). Furthermore, researchers have addressed the issues subsumed under (2) and recommended the use of particular fit statistics (e.g., CFI, RMSEA, SRMR) to decide among competing models (Chen 2007). The question raised under (3) is a more general one and concerns the contemporary uses of the concept of MI. Researchers (Welzel and Inglehart 2016) have argued that variations in measurements across contexts can be ignored, for example in the presence of theoretically reasonable associations of a construct with external criteria.
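The fit-statistic comparison mentioned under (2) amounts to a simple decision rule over nested models. As a minimal illustration (not a substitute for substantive judgment), the sketch below applies change-in-fit cutoffs in the spirit of Chen (2007); the exact cutoff values and the function name are illustrative assumptions, not taken from any SEM package.

```python
def prefer_constrained(cfi_free, cfi_constrained,
                       rmsea_free, rmsea_constrained,
                       srmr_free, srmr_constrained,
                       d_cfi=0.010, d_rmsea=0.015, d_srmr=0.030):
    """Illustrative delta-fit-index check for two nested MI models.

    The 'free' model imposes fewer equality constraints (e.g., configural),
    the 'constrained' model a higher MI level (e.g., metric or scalar).
    Returns True if the constrained model is acceptable, i.e., fit does
    not deteriorate beyond the (assumed) cutoffs.
    """
    delta_cfi = cfi_free - cfi_constrained        # drop in CFI
    delta_rmsea = rmsea_constrained - rmsea_free  # rise in RMSEA
    delta_srmr = srmr_constrained - srmr_free     # rise in SRMR
    # Reject the constrained model only if CFI worsens beyond its cutoff
    # and at least one absolute-misfit index also deteriorates notably.
    reject = delta_cfi > d_cfi and (delta_rmsea > d_rmsea or delta_srmr > d_srmr)
    return not reject

# Hypothetical example: metric model against a configural baseline.
ok = prefer_constrained(cfi_free=0.962, cfi_constrained=0.955,
                        rmsea_free=0.041, rmsea_constrained=0.048,
                        srmr_free=0.038, srmr_constrained=0.044)
```

In practice these indices would come from fitted MGCFA models, and the cutoffs themselves depend on sample size, model size, and the invariance level being tested, which is precisely why their use remains under discussion.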

This session aims to present studies that assess measurement invariance and/or address one of the issues listed above or related ones. We welcome (1) applied presentations that make use of empirical survey data, and/or (2) presentations that take a methodological approach to measurement invariance testing and use, for example, Monte Carlo simulations to study the above-mentioned issues.

Keywords: measurement invariance, comparability, cross-cultural research, structural equation modeling

Measuring sociopolitical orientations: Comparing the equidistance of verbal labels across unipolar and bipolar rating scales

Professor Steffen Kühnel (University of Göttingen) - Presenting Author
Dr Jan Karem Höhne (University of Mannheim)
Mr Stephan Schlosser (University of Göttingen)
Professor Dagmar Krebs (University of Gießen)

Major social surveys, such as the European Social Survey (ESS), regularly measure respondents’ attitudes and opinions on sociopolitical topics, such as the reduction of financial inequality. To measure attitudes about such topics, researchers mostly use survey questions with rating scales that are based on an ordered, closed-ended response format. One important design aspect is the polarity (i.e., unipolar or bipolar) of rating scales. Unipolar scales consist of response categories with increasing or decreasing intensity on a single rating dimension, such as agreement. Bipolar scales, in contrast, contain response categories extending between logical opposites, such as agreement/disagreement, with a so-called “transition point”, such as neither/nor, in the middle of the scale. Although unipolar and bipolar scales are frequently used interchangeably, little is known about their impact on (latent) response distributions. In this study, we therefore compare the equidistance of verbal labels across unipolar and bipolar scales by modeling their latent thresholds. For this purpose, we conduct a survey experiment in the probability-based “German Internet Panel (GIP)” in March 2019 using questions on sociopolitical orientations. We randomize respondents to four conditions: 1) a five-point, fully labeled unipolar scale, 2) a five-point, end-labeled unipolar scale, 3) a five-point, fully labeled bipolar scale, and 4) a five-point, end-labeled bipolar scale. Although data collection is still forthcoming, we expect differential item functioning for the extreme and middle response categories of unipolar and bipolar rating scales. Additionally, within scale polarity, we expect differences between end-labeled and fully labeled response categories.

Re-assessing the radius of generalized trust: measurement invariance, think aloud protocols, and the role of education

Dr Wahideh Achbari (University of Amsterdam) - Presenting Author
Professor Eldad Davidov (University of Cologne & University of Zurich)

Generalized trust (GT) is a prominent indicator in studies on social cohesion. While prior research has debated the negative link between ethnic diversity and trust, only a handful of studies have so far focused on the underlying measurement issues. Contrary to conventional wisdom, a British study using think-aloud protocols demonstrated that the majority of respondents high in GT think ‘most people’ refers to people they know, whereas a high proportion of those low in GT think about strangers (Sturgis & Smith, 2010). A study by Delhey, Newton, and Welzel (2011) employs the World Values Survey (WVS) and concludes that the trust radius is much smaller in Confucian cultures but wider in most Western nations, without conducting any invariance tests. These results contradict the findings of Sturgis and Smith (2010), since within-country differences are considered less important. In this paper, we employ the think-aloud data (Sturgis & Smith, 2010) to explore the role of education in differences in reference frames. This is particularly relevant since GT has been found to correlate consistently and positively with having a university degree. Intuitively, we can expect that more highly educated people are more likely to see ‘most people’ in the GT question as an abstract (thus unknown) category, perhaps even as out-groups. We additionally conduct formal invariance tests across all countries included in the WVS, comparing educational groups. This extends existing invariance studies, which only examine between-country differences (Reeskens & Hooghe, 2008; Van der Veld & Saris, 2011; Meuleman & Billiet, 2012; Freitag & Bauer, 2013). Moreover, we employ the alignment method (Asparouhov and Muthén 2014) to validate our results. We argue that measurement issues of GT across groups cannot be ignored, since the think-aloud results (as an external criterion) suggest there is no theoretically viable argument that ‘most people’ unequivocally refers to out-groups.

Assessing invariance across racial and ethnic groups in the measurement of science literacy

Professor Ian Brunton-Smith (University of Surrey) - Presenting Author
Professor Nick Allum (University of Essex)

Science literacy amongst the general public is increasingly important for securing and sustaining many jobs, for understanding key health concepts that enhance quality of life, and for increasing public engagement in societal decision-making. But whilst much is now known about how science literacy varies across contexts and over time, research has only recently begun to shed light on the disparities that exist between racial and ethnic groups (Allum et al., 2018). In the US, this work has pointed to substantial disparities in science knowledge between black and Hispanic Americans compared to white Americans that are unexplained by other well-established disparities in fundamental axes of disadvantage, including broader foundational literacies, education, and occupational status.

In this paper we consider the measurement of scientific literacy, examining the extent to which observed racial and ethnic differences in science knowledge may be due to differences in the interpretation of particular knowledge items used to measure science literacy, which may produce biased scales. Combining data from six waves of the General Social Survey, we use Bayesian approximate invariance tests (Asparouhov and Muthén, 2013; 2014) to identify science knowledge items for which measurement invariance does not hold. This enables us to pinpoint potential differences in contextual understanding for different racial and ethnic groups, and also allows us to generate more robust assessments of the presence of a knowledge gap under the assumption of approximate measurement invariance. We contrast models for full, partial, and approximate invariance and evaluate the implications for current understandings of disparities in science literacy.