ESRA 2023 Glance Program

All time references are in CEST

The potential of survey and questionnaire design to achieve measurement invariance 2

Session Organisers Dr Katharina Meitinger (Utrecht University)
Professor Natalja Menold (TU Dresden)
Dr Heinz Leitgöb (Leipzig University)
Time Wednesday 19 July, 16:00 - 17:30
Room U6-09

A common finding in measurement invariance testing is that the property of metric or scalar measurement invariance is difficult to achieve in cross-cultural survey data. Whereas approximate approaches to measurement invariance testing have received great interest, the impact of survey methodological decisions on the results of measurement invariance analysis has been relatively underemphasized. However, previous research revealed the serious impact of various survey methodological aspects on measurement invariance, such as differences in question wording, translations, rating scale forms, visual presentation, modes, or devices. At the same time, survey methodology also provides us with a toolkit to improve the measurement invariance of survey questions. Optimal translation procedures (e.g., TRAPD approach) or approaches at the development and pretesting stage (e.g., focus groups, expert reviews, cross-cultural cognitive interviewing, web probing) can potentially improve the comparability of survey items. Some of these approaches could also be implemented during or after the actual data collection (e.g., web probing). Careful conceptualization and operationalization can help to improve the factorial structure of indicators and therefore reveal more promising measurement invariance results. Anchoring vignettes or similar approaches to control for differential item functioning could help to adjust data and to improve their comparability, which should also improve the results of measurement invariance analysis.
This session aims to provide a platform for survey methodological evidence on improving measurement comparability. The aim is to foster a discussion on survey methodological approaches to improve data comparability, as evaluated by measurement invariance analysis, before, during, or after the data have been collected.

Keywords: Measurement Invariance, Survey Methodology, Comparability


Assessing Potential Measurement Error Inequities in US Household Surveys of Health using Item Response Theory and Linear Regression Trees

Dr Morgan Earp (US National Center for Health Statistics) - Presenting Author
Dr Lauren Rossen (US National Center for Health Statistics)
Dr Kristen Cibelli Hibben (US National Center for Health Statistics)

Measurement equity has important implications for survey outcomes, such as health, especially as it pertains to racial and ethnic disparities. It is important to understand when health estimates are biased or may be subject to differential measurement error, as this can distort (either exacerbating or concealing) health inequities. We used Item Response Theory (IRT) to assess differential item functioning of scale items and linear regression trees to compare self-reported versus lab measurements of chronic conditions. IRT is commonly used in psychology and education to evaluate the performance of response scales across subgroups. By modeling the strength of the relationship between item response options and the latent construct (health), IRT compares an item’s ability to distinguish between people at varying levels of latent health and how this may differ across race and ethnicity. Using IRT and data from the National Health Interview Survey (NHIS), we examined the measurement properties (including information and distribution) of health items using response scales. Linear regression trees have been used to compare differential respondent burden and nonresponse propensities, controlling for linear effects of continuous or binary variables. Using linear regression trees, we identified demographic subgroups that tended to exhibit higher measurement error between self-reported and measured chronic conditions, using data from the National Health and Nutrition Examination Survey (NHANES). The combination of IRT and linear regression trees allowed us to assess differential measurement error in order to determine if measurement biases were consistent across race and ethnicity.

Integrating Mixture-Modelling Results with Qualitative Evidence from Cognitive Interviewing: Uncovering Classes in Mental Status Module of the European Health Interview Survey in Spain

Dr Irene Gómez-Gómez (Universidad Loyola Andalucía)
Dr Isabel Benítez (University of Granada)
Dr José-Luis Padilla (University of Granada) - Presenting Author
Dr Andrés González (University of Granada)

Versions of the Patient Health Questionnaire (PHQ) are intended to screen for major depression in primary health care. The 2020 European Health Survey includes the PHQ-8 to estimate the prevalence of depression in country populations. The change in the use of PHQ measures (from screening to estimating prevalence in survey populations) and in the administration context (from health care settings to the homes of survey respondents) raises the challenge of obtaining validity evidence to support inferences about data quality. Mixture modelling can help by identifying uncovered classes of survey participants with different response patterns, taking into account theoretically relevant covariates, identifying invariant items across classes, and analysing the predictive value of these classes and covariates for important criteria. The aim of this study is to integrate Latent Class Analysis (LCA) and qualitative evidence from cognitive interviews to identify and understand uncovered classes of survey respondents to the 2020 Spanish EHS, as well as to identify invariant items. In the quantitative phase, we include “gender”, “age”, and “educational level” as covariates in the LCA model, along with diabetes health condition, and intake of antidepressant drugs as outcome. After removing “Don’t know” and “Non response” answers to the PHQ-8 items, we analysed the responses of 21254 Spanish respondents aged 15 and over in a cross-validation study. The choice of the best models was based on several fit statistics. Classes are used in the qualitative phase to plan the recruitment and to design the protocol for cognitive interviewing. We will present the outline and preliminary results of a mixed-methods study on how to integrate LCA results and qualitative evidence from Cognitive Interviewing (CI) to help decide on the best LCA models, identify invariant items, and understand response pattern processes.

Language choice and measurement effects in multilingual surveys

Dr Julian Aichholzer (Institute for Empirical Social Studies (IFES)) - Presenting Author
Dr Eva Zeglovits (Institute for Empirical Social Studies (IFES))
Dr Reinhard Raml (Institute for Empirical Social Studies (IFES))

Many surveys deliberately take into account that members of the target population differ in their command of the country’s dominant language. Hence, when designing a survey, one might offer alternative languages for completing it. Obviously, this possibility is vital whenever migrant groups are regarded as a key population of interest. Yet, it is not always clear how many respondents and which specific groups will actually prefer another language, or how this choice affects measurement quality overall.

In this study, we first look at language choice in the context of Austria, a country with “older” migrant communities (such as those from Turkey and the former Yugoslavia) and “newer” ones (e.g., from Syria or Afghanistan). Second, we investigate how the language chosen affects indicators of measurement quality, such as the proportion of don’t knows, reliability of scales, measurement invariance and, eventually, the study’s substantive results. For this purpose, we make use of multi-topic surveys that employ unique migrant samples conducted in Austria, looking at various language groups.

On the one hand, the study’s results contribute to our understanding of how survey language impacts substantive results. On the other hand, they may inform cost-benefit analyses that guide survey planning. From this point of view, researchers and survey sponsors alike have to decide whether or not translation of the questionnaire is desirable or feasible, as well as how this choice might change the outcome.

Examining the Performance of Self-Rated Health Functioning and Measurement Equity in the United States Using Item Response Theory

Dr Kristen Cibelli Hibben (US National Center for Health Statistics) - Presenting Author

Self-rated health (SRH) is a widely used measure of respondents’ subjective evaluation of their health status. SRH is defined as an individual’s perceived overall health; extensive research has demonstrated its utility and led to its widespread use in major surveys and in medical, social, and behavioral science research using survey data. SRH is frequently used comparatively and to examine disparities across key respondent characteristics and population subgroups (e.g., sex, education, race/ethnicity). However, previous qualitative research has found evidence of sociodemographic subgroup differences in the types of health factors considered and variation in the use of response options. Much existing research on the measurement properties and utility of SRH has been methodologically limited, focusing on English-speaking and/or culturally homogeneous groups or small sample sizes. Drawing on data from the 2021 National Health Interview Survey (NHIS) and building on our prior work using Item Response Theory (IRT) to examine SRH performance, in the current study we compare SRH measurement performance among the US population across key demographic subgroups including education, sex, and race/ethnicity. By modeling the strength of the relationship between responses and the latent construct (for example, health), IRT makes it possible to compare an item’s ability to distinguish between people at varying levels on the underlying construct and how this might differ across key subgroups. This study also examines differences by language of the interview and acculturation, which have been shown to affect SRH responses. We examine whether SRH provides more evenly distributed or more skewed measures of health across subgroups, the likelihood that measurement error is more or less consistent across demographic subgroups, and the implications for subject-area research using SRH measures to examine inequities.