ESRA 2023 Glance Program

All time references are in CEST

Measurement Error: Factors Influencing the Quality of Survey Data and Possible Corrections 3

Session Organisers Dr Lydia Repke (GESIS - Leibniz Institute for the Social Sciences)
Ms Fabienne Krämer (GESIS - Leibniz Institute for the Social Sciences)
Dr Cornelia Neuert (GESIS - Leibniz Institute for the Social Sciences)
Time Thursday 20 July, 09:00 - 10:30
Room U6-22

High-quality survey data are the basis for meaningful data analysis and interpretation. The choice of specific survey characteristics (e.g., mode of data collection) and instrument characteristics (e.g., number of points in a response scale) affects data quality, meaning that there is always some measurement error. There are several methods and tools for estimating these errors (e.g., the Survey Quality Predictor) and approaches for correcting them in data analysis. This session will discuss factors that influence data quality, methods or tools for estimating their effects, and approaches for correcting measurement errors in survey data.

We invite papers that
(1) identify and discuss specific survey characteristics and their influence on data quality;
(2) identify and discuss specific instrument characteristics and their impact on data quality;
(3) discuss methods of estimating measurement errors and predicting data quality;
(4) present or compare tools for the estimation or correction of measurement errors;
(5) show how one can account for and correct measurement errors in data analysis.

Keywords: measurement error, data quality, correction, survey characteristics, item characteristics


Implementations of vignettes about neighbourhood composition

Mr Daniel Schubert (Ruhr-University Bochum) - Presenting Author

The German General Social Survey (ALLBUS) 2016 contains two questions on preferences regarding foreigners in the neighbourhood, measured with a vignette-like design of 13 residential areas with continuously increasing proportions of foreigners (about 8 percentage points per step). Respondents were asked to indicate all neighbourhoods they would like to live in and all neighbourhoods they would not live in at all. However, an unexpectedly high proportion of respondents named only two non-adjacent (in fact widely separated) residential areas in response to each of the two sub-questions. This indicates a kind of measurement error that needs to be corrected prior to data analysis.
In this paper, I present a theory-based method for correcting these measurement errors. The theoretical arguments for the correction rest, on the one hand, on the assumption of Schelling's segregation model (2006) that minority positions are rejected, or that higher proportions of foreign groups in the residential environment are rejected more strongly; this assumption will be tested. On the other hand, given the general acceptance of diversity (Drouhot et al. 2021, Petermann & Schönwälder 2014, Schönwälder et al. 2016), it can also be expected that strongly homogeneous residential environments will be rejected. These arguments can be used to estimate which neighbourhoods are likely to be preferred, even if only one neighbourhood has been named.
For this purpose, the ALLBUS 2016 items on preferred and rejected residential compositions are analysed (GESIS 2017). It must be emphasised that these elaborately surveyed residential preferences have not yet been evaluated and published. Only Friedrichs and Triemer (2009: 72) offer a rough descriptive statement.
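
The response pattern described above (exactly two widely separated areas named) can be flagged with a simple contiguity check. The following is a minimal sketch, not the author's correction method: the function names, the 1 to 13 coding of the areas, and the `min_gap` threshold are assumptions made for illustration only.

```python
from typing import Iterable, List

N_AREAS = 13  # number of vignette residential areas, as stated in the abstract


def contiguous_runs(selected: Iterable[int]) -> List[List[int]]:
    """Group selected area indices (assumed coded 1..N_AREAS) into runs of adjacent areas."""
    runs: List[List[int]] = []
    for idx in sorted(set(selected)):
        if not 1 <= idx <= N_AREAS:
            raise ValueError(f"area index {idx} outside 1..{N_AREAS}")
        if runs and idx == runs[-1][-1] + 1:
            runs[-1].append(idx)  # extends the current run of neighbours
        else:
            runs.append([idx])    # starts a new run
    return runs


def is_suspicious(selected: Iterable[int], min_gap: int = 2) -> bool:
    """Flag answers naming exactly two areas with at least `min_gap`
    unselected areas between them (hypothetical threshold)."""
    chosen = sorted(set(selected))
    if len(chosen) != 2:
        return False
    return chosen[1] - chosen[0] - 1 >= min_gap


# Example: a respondent names areas 2 and 10 only.
print(contiguous_runs([2, 10]), is_suspicious([2, 10]))
```

Whether a flagged answer should then be treated as a range, as two separate choices, or as missing is a substantive decision that the abstract bases on theory; the sketch only identifies the cases.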

Question characteristics and their effect on measurement quality. A meta-analysis using seven rounds of the European Social Survey.

Dr Wiebke Weber (LMU Munich) - Presenting Author
Dr Barbara Felderer (GESIS)

The main challenge for questionnaire designers is to create survey measurement instruments that capture the true responses from the population while minimizing measurement error. Although there is expert knowledge and there are standard procedures, empirical evidence is still lacking on the influence of the many choices questionnaire designers face when creating a survey item.

In this study, we use the data from seven rounds of Multitrait-Multimethod (MTMM) experiments run in the European Social Survey to shed light on the effect of response scale characteristics (number of response categories; presentation on visual aid, horizontal or vertical presentation; labels, fixed reference point; order of response categories) and characteristics of the request for an answer (formulation, type, use of gradation, and presentation in a battery) on both the reliability and validity of survey questions.

Our data set includes almost 5,000 survey items, their question characteristics as well as their reliability and validity estimates from 22 MTMM experiments that have been fielded in 28 languages. Our analysis accounts for the hierarchical structure of the data where survey items are nested in experiments and languages. Applying regression methods to the experimental data set allows us to estimate causal effects of each question characteristic while controlling for the others.

Appraising the sampling and assessment designs in ILSAs: A design effect standpoint

Dr Diego Cortes (IEA) - Presenting Author
Dr Sabine Meinck (IEA)
Dr Dirk Hastedt (IEA)

International large-scale assessments in education (ILSA) are built upon two design features, both of which are based on the principles of randomization. First, members of the study population are included in the survey following a complex sampling design, typically in the form of a two-stage stratified random sample (Meinck et al., 2021). This random data-generating mechanism allows analysts to make inferences about the entire study population from its sampled members. Second, the set of students participating in ILSA is assessed through a matrix sampling of items, in which test booklets are randomly rotated across students (von Davier et al., 2020). This is because the framework used to assess a subject domain is typically extensive; therefore, the pool of items needed to measure it is typically large. Hence, to prevent overburdening, students participating in these surveys are exposed to only a fraction of the overall item pool. For example, TIMSS 2019 contains about 10.5 hours of testing time for grade eight; however, the actual testing time for each student was limited to 90 minutes (Mullis & Martin, 2017). This random mechanism generating student item responses allows analysts to make inferences about the distribution of proficiency in a subject domain within a group of students.
In this paper, we examine the problem of statistical inference arising from the uncertainty in the estimation generated by these two design features using the design effect framework pioneered by Kish (1965). This framework allows us to highlight that, in the context of ILSA, design effects are not only subject to the sampling and assessment plan, but also to the estimator used. Our results are relevant in that they showcase the caution analysts should take when generalizing design effects in ILSA across populations and across estimators.
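
For readers unfamiliar with the design effect, the sketch below illustrates the general idea: the ratio of an estimator's variance under the actual design to its variance under simple random sampling of the same size. It also shows Kish's well-known approximation for the mean under cluster sampling with roughly equal cluster sizes. This is not the authors' analysis; the function names and all input values are hypothetical.

```python
def design_effect(var_design: float, var_srs: float) -> float:
    """Ratio of the estimator's variance under the actual design to its
    variance under simple random sampling with the same sample size."""
    if var_srs <= 0:
        raise ValueError("var_srs must be positive")
    return var_design / var_srs


def kish_deff_cluster(avg_cluster_size: float, icc: float) -> float:
    """Kish's approximation for a mean under cluster sampling:
    deff = 1 + (m - 1) * rho, with m the average cluster size and
    rho the intraclass correlation."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n: int, deff: float) -> float:
    """Size of a simple random sample that would give the same precision."""
    return n / deff


# Hypothetical values: 4,000 students in classes of 25, intraclass correlation 0.2.
deff = kish_deff_cluster(avg_cluster_size=25, icc=0.2)
print(round(deff, 2), round(effective_sample_size(4000, deff)))
```

The cluster approximation applies to means; other estimators (for example regression coefficients or percentiles) can have quite different design effects under the same sample, which is in line with the caution the abstract raises about generalizing design effects across estimators.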