All time references are in CEST
Contemporary issues in measurement invariance research
Dr Daniel Seddig (University of Cologne)
Professor Eldad Davidov (University of Cologne and URPP Social Networks, University of Zurich)
Professor Peter Schmidt (University of Giessen)
Tuesday 18 July, 16:00 - 17:00
The assessment of measurement invariance of survey data is a prerequisite for meaningful and valid comparisons of substantive constructs across countries, cultures, and time. A powerful tool to test measurement invariance is multiple-group confirmatory factor analysis. In addition to testing “exact” (full or partial) measurement invariance with the traditional tools, recent methods have aimed at testing “approximate” measurement invariance using Bayesian structural equation modeling or alignment optimization, assessing clustered measurement invariance, using visualization techniques, and separating response shift bias and true change. In addition, many researchers are concerned with paying more attention to survey methodological aspects in order to advance the development of invariant measurements, such as rating scale and survey mode decisions, cognitive pretesting and web probing approaches, and cross-cultural scale adoption and translation methods. Finally, multilevel analysis and qualitative methods have been used to try to explain noninvariance.
This session aims to present studies that address questions such as “How much can we trust the above and related methods to test for measurement invariance?” or “What is the need to test for measurement invariance in different situations?”. We welcome presentations that (1) are applied and make use of empirical survey data, and/or (2) take a methodological approach to examining measurement invariance testing.
Keywords: Measurement invariance, comparability, cross-cultural research, longitudinal research, confirmatory factor analysis, SEM
Dr Petra Raudenska (The Czech Academy of Sciences, Institute of Sociology) - Presenting Author
The use of subjective well-being measures is common across empirical psychology and the social sciences; more recently, it has also become mainstream in economics. A major disadvantage of many of the subjective well-being studies of the cognitive dimension undertaken to date is that they tend to rely on single-item measures of life satisfaction (“All things considered, how satisfied are you with your life as a whole?”) and/or happiness (“Taking all things together, how happy would you say you are?”), rather than more refined, multi-item measures of well-being. Single-item measures are known to be rather imprecise, to lack high reliability and construct validity, and to offer no way of controlling for measurement error. Single-item measures of general well-being are increasingly analyzed cross-culturally, but without any evidence of the level of comparability achieved. The major goal of this study is to investigate the cross-country measurement invariance of the two most frequently used single items, general life satisfaction and happiness, across a large set of countries, using 45 data samples from the World Values Survey, International Social Survey Program, European Values Study, European Social Survey, European Quality of Life Survey, and Eurobarometer from 1976 to 2018, covering over 1,400,000 participants. Conventional measurement invariance tests require that the construct be measured by multiple indicators (at least three) and therefore cannot be applied to single-item measures. In this study, I used a basic alternative method for assessing the measurement invariance of single items: other subjective well-being measures available in the cross-national questionnaires were added to create the best possible multiple-indicator measure of general well-being, based on high correlations among the available items.
In almost all of the selected large-scale sample surveys, I found two highly correlated single items measuring general life satisfaction and happiness. As the third item, I mostly selected the one most highly correlated with the life satisfaction scales: the single-item general health measure or a life domain satisfaction item (i.e., job, family, or democracy satisfaction). To assess the extent of item-specific non-invariance, I used the most recent Bayesian approximate measurement invariance approach. The results reveal that the factor loadings and intercepts of the happiness item deviate to a lesser extent than those of the life satisfaction item, indicating higher comparability across countries. On the other hand, the differences between these two single items were rather minor compared to other single items, such as the general health item and the life domain satisfaction items. Thus, only partial scalar approximate measurement invariance holds for happiness and general life satisfaction, and their averages can be compared only across a few of the participating countries rather than all of them.
Ms Alisa Remizova (University of Cologne) - Presenting Author
Individual religiosity measures have been widely used to compare individuals and societies across various disciplines. However, the cross-country comparability of these measures has often been questioned. Comparability is the prerequisite for meaningful analyses of religiosity across countries and depends on measurement invariance (MI). MI implies that the measures capture the same construct in the same way in different countries, so that respondents interpret the survey questions similarly. If the measurement is noninvariant, not only religiosity but also other factors may determine the indicator scores, which jeopardises the validity of cross-country comparisons. While previous studies have shown that religiosity measures lack invariance, they have yet to explain why these measures produce non-comparable data. The current research provides a systematic explanation of the noninvariance of religiosity measurement across countries. We use data from the two latest rounds of the World Values Survey (WVS) and employ a multilevel structural equation modeling approach that accounts for cross-country variation in measurement in a theoretically driven way. The results demonstrate that the noninvariance can be explained by differences in countries' religious composition, regulation of religion, modernisation, and cultural and communist backgrounds. We conclude with directions for the future design of religiosity measures and recommendations for applied researchers using the WVS data.
Mr Rune Müller Kristensen (Aarhus University) - Presenting Author
International Large-Scale Assessments (ILSAs) make a great effort to avoid measurement non-invariance in the assessment of student abilities, among other steps by controlling for differential item functioning (DIF) between countries to ensure fairness in comparisons. While avoiding DIF is good for fairness in testing, DIF could also be viewed as informative about test takers when it appears unexpectedly between groups of respondents (e.g. Bundsgaard, 2019).
Students’ socioeconomic status (SES) has long been known to correlate with student achievement. Research has focused on the elements of a student’s background that influence achievement, while less research has examined exactly which abilities are affected by SES.
The presentation will exploit item DIF to assess whether students’ SES affects their ability to respond correctly to an item. It thereby uses differences in the abilities of respondent groups to gain insight into the black box through which SES affects student outcomes.
The presentation aims to address four questions: 1) To what degree does DIF exist between students with high and low SES? 2) Are there similarities across subjects in the item characteristics that cause DIF? 3) Are there differences or similarities in item DIF across countries? 4) Does SES-related item DIF seem to be caused by the way items are constructed, or by item content and hence by differences in abilities within the subject?
To this end, the presentation will use data from the Nordic countries participating in the latest rounds of PIRLS (2021) and TIMSS (2019), thereby exploiting the possibility of comparing countries with similar SES distributions across three different subjects. Initial analyses of TIMSS mathematics items indicate that DIF is present, although to a relatively low degree. They further indicate that item DIF might be related to item construction more than to item content.
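The abstract does not say which DIF statistic is used. As an illustration only, the sketch below shows one widely used option, the Mantel-Haenszel common odds ratio, computed for a single item after matching test takers on total score. The function name, the group labels "ref" and "focal" (standing in, for example, for high- and low-SES students), and the toy records are assumptions made for this sketch and are not taken from the presentation.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistic for one item (illustrative sketch).

    records: iterable of (group, total_score, correct) tuples, where
    group is "ref" or "focal" (hypothetical labels), total_score is the
    matching variable, and correct is 1 or 0 for the studied item.
    Returns (alpha, delta): the common odds ratio and its value on the
    ETS delta scale, delta = -2.35 * ln(alpha).
    """
    # Build one 2x2 table (group x correct/incorrect) per score stratum.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        tables[score][group][0 if correct else 1] += 1

    num = 0.0
    den = 0.0
    for table in tables.values():
        a, b = table["ref"]    # reference group: correct, incorrect
        c, d = table["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        num += a * d / n
        den += b * c / n

    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Toy data, one score stratum: 8 of 10 reference and 5 of 10 focal
# test takers answer the item correctly.
toy = (
    [("ref", 3, 1)] * 8 + [("ref", 3, 0)] * 2
    + [("focal", 3, 1)] * 5 + [("focal", 3, 0)] * 5
)
alpha, delta = mantel_haenszel_dif(toy)
print(alpha, delta)  # alpha is 4.0 for this toy table
```

An odds ratio of 1 (delta of 0) means that equally able test takers in both groups have the same odds of answering correctly. In this sketch, values of alpha above 1 indicate that the item favours the reference group.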