All time references are in CEST
Measurement Error: Factors Influencing the Quality of Survey Data and Possible Corrections 3
|Session Organisers||Dr Lydia Repke (GESIS - Leibniz Institute for the Social Sciences), Ms Fabienne Krämer (GESIS - Leibniz Institute for the Social Sciences), Dr Cornelia Neuert (GESIS - Leibniz Institute for the Social Sciences)|
|Time||Thursday 20 July, 09:00 - 10:30|
High-quality survey data are the basis for meaningful data analysis and interpretation. The choice of specific survey characteristics (e.g., mode of data collection) and instrument characteristics (e.g., number of points in a response scale) affects data quality, meaning that there is always some measurement error. There are several methods and tools for estimating these errors (e.g., the Survey Quality Predictor) and approaches for correcting them in data analysis. This session will discuss factors that influence data quality, methods or tools for estimating their effects, and approaches for correcting measurement errors in survey data.
We invite papers that
(1) identify and discuss specific survey characteristics and their influence on data quality;
(2) identify and discuss specific instrument characteristics and their impact on data quality;
(3) discuss methods of estimating measurement errors and predicting data quality;
(4) present or compare tools for the estimation or correction of measurement errors;
(5) show how one can account for and correct measurement errors in data analysis.
Keywords: measurement error, data quality, correction, survey characteristics, item characteristics
Mr Daniel Schubert (Ruhr-University Bochum) - Presenting Author
The German General Social Survey (ALLBUS) 2016 contains two questions on the preference for foreigners in the neighbourhood, measured with a vignette-like design of 13 residential areas with continuously increasing proportions of foreigners (by about 8 percentage points per step). Respondents were asked to indicate all neighbourhoods they would like to live in and all neighbourhoods they would not live in at all. However, an unexpectedly high proportion of respondents named only two non-adjacent (in fact widely separated) residential areas, one in response to each of the two sub-questions. This indicates a kind of measurement error that needs to be fixed prior to data analysis.
In this paper, I present a theory-based method for correcting these measurement errors. The theoretical arguments for the correction rest, on the one hand, on the assumption of Schelling's segregation model (2006) that minority positions are rejected and that higher proportions of foreign groups in the residential environment are rejected more strongly. On the other hand, given the general acceptance of diversity (Drouhot et al. 2021; Petermann & Schönwälder 2014; Schönwälder et al. 2016), it can also be expected that strongly homogeneous residential environments will be rejected. These arguments can be used to estimate which neighbourhoods are likely to be preferred, even if only one neighbourhood has been named.
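The two assumptions can be operationalised as bounds on a contiguous preference interval over the 13-step scale. The following sketch is purely illustrative of that logic; the function name, the indexing, and the exact rule are my own assumptions, not the author's actual correction procedure:

```python
def infer_preferred_areas(named_preferred, n_areas=13):
    """Illustrative correction based on the two theoretical assumptions:
    (1) Schelling: areas with a higher foreigner share than the highest
        named preferred area are rejected even more strongly;
    (2) diversity acceptance: the fully homogeneous area (index 0,
        lowest foreigner share) is rejected.
    Areas are indexed 0..n_areas-1 by increasing foreigner share.
    """
    if not named_preferred:
        return []
    hi = max(named_preferred)
    # Everything between the (rejected) homogeneous extreme and the
    # highest accepted foreigner share is assumed to be preferred.
    return list(range(1, hi + 1))

# A respondent who named only one preferred neighbourhood (index 6):
print(infer_preferred_areas([6]))  # -> [1, 2, 3, 4, 5, 6]
```

Under these assumptions, a single named neighbourhood suffices to impute a full preference set, which is exactly what the correction requires.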
For this purpose, the corresponding data on preferred and rejected residential environment compositions from the ALLBUS 2016 are analysed (GESIS 2017). It should be emphasised that the elaborately surveyed residential preferences of the ALLBUS 2016 have not yet been analysed and published; only Friedrichs and Triemer (2009: 72) offer a rough descriptive account.
Dr Isabelle Schmidt (GESIS – Leibniz Institute for the Social Sciences) - Presenting Author
Dr Clemens M. Lechner (GESIS – Leibniz Institute for the Social Sciences)
Lengthy online surveys in the social sciences quickly lead to high respondent burden and impaired data quality. In recent years, therefore, survey designs with planned missing values (multiform designs; MD) have attracted growing attention in survey research because they shorten the questionnaire for each respondent without dropping any of the required items from the survey.
Previous research using online surveys with non-experimental designs suggests that the use of an MD (e.g., a three-form design) may have positive effects on data quality indicators. However, these studies do not allow for causal conclusions. To help close this gap, we conducted an online survey experiment to test how using an MD (compared to a traditional design) affects data quality. To assess data quality, we used model-based indicators (discriminatory power of items, reliability of a scale) and respondent-based indicators (item nonresponse, straightlining, Mahalanobis distance).
The online survey (CASI) was conducted in Germany in 2021 (N=1,008; quota sample according to the 2017 German Census conducted by Bilendi/RespondiAG). The survey consists of scales to capture psychological characteristics. Participants were randomly assigned to one of two conditions: (1) a traditional complete design in which all respondents answered all items and (2) a three-form design.
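In a standard three-form design of the kind mentioned above, the item pool is split into a common block X asked of everyone and three blocks A, B, C, each of which is omitted from exactly one form. A minimal sketch of that assignment logic (block names and the random assignment are illustrative, not the actual survey's materials):

```python
import random

# Three-form planned missing design: block X is asked of everyone,
# and each form omits exactly one of the blocks A, B, C.
FORMS = {
    1: ["X", "A", "B"],  # C is planned-missing
    2: ["X", "A", "C"],  # B is planned-missing
    3: ["X", "B", "C"],  # A is planned-missing
}

def assign_form(rng):
    """Randomly assign a respondent to one of the three forms."""
    form = rng.choice(list(FORMS))
    return form, FORMS[form]

rng = random.Random(42)
form, blocks = assign_form(rng)
# Every pair of blocks appears together on some form, so all pairwise
# item covariances remain estimable despite the planned missingness.
```

The key design property is the last comment: because each pair of blocks co-occurs on at least one form, the planned missingness is missing completely at random by design and does not block covariance-based analyses.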
We found slightly better discriminatory power of items and scale reliability in the MD than in the traditional design. Item nonresponse, Mahalanobis distance, and straightlining likewise indicated worse data quality in the traditional design than in the MD.
To conclude, the MD tends to have data quality benefits. The results suggest that the MD has advantages for respondent-based indicators, but no clear advantage for model-based indicators. Further research should clarify the benefits of planned missing designs by investigating additional data quality indicators.
Dr Wiebke Weber (LMU Munich) - Presenting Author
Dr Barbara Felderer (GESIS)
The main challenge for questionnaire designers is to create survey measurement instruments that capture respondents' true values while minimizing measurement error. While there is expert knowledge and there are standard procedures, there is still too little empirical evidence on the influence of the many design choices questionnaire designers face when creating a survey item.
In this study we use data from seven rounds of Multitrait-Multimethod (MTMM) experiments run in the European Social Survey to shed light on the effect of response scale characteristics (number of response categories; presentation on a visual aid; horizontal or vertical presentation; labels; fixed reference point; order of response categories) and characteristics of the request for an answer (formulation, type, use of gradation, and presentation in a battery) on both the reliability and validity of survey questions.
Our data set includes almost 5,000 survey items, their question characteristics as well as their reliability and validity estimates from 22 MTMM experiments that have been fielded in 28 languages. Our analysis accounts for the hierarchical structure of the data where survey items are nested in experiments and languages. Applying regression methods to the experimental data set allows us to estimate causal effects of each question characteristic while controlling for the others.
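The regression idea described above can be sketched on synthetic data: regress a quality estimate on a question characteristic while absorbing experiment-level differences, here via within-experiment demeaning (a fixed-effects simplification; the variables, the single predictor, and the effect size are all invented for illustration and are not the authors' specification):

```python
import random
from statistics import mean

# Synthetic data: 22 experiments, 20 items each, with an
# experiment-level intercept and a made-up "fully labelled" effect.
rng = random.Random(1)
rows = []
for exp in range(22):
    exp_effect = rng.gauss(0, 0.05)
    for _ in range(20):
        labelled = rng.randint(0, 1)       # fully labelled scale? (invented)
        rel = 0.70 + exp_effect + 0.03 * labelled + rng.gauss(0, 0.04)
        rows.append((exp, labelled, rel))

def within_slope(rows):
    """OLS slope after demeaning x and y within each experiment,
    which removes experiment-level intercepts from the estimate."""
    by_exp = {}
    for exp, x, y in rows:
        by_exp.setdefault(exp, []).append((x, y))
    dx, dy = [], []
    for pairs in by_exp.values():
        mx = mean(p[0] for p in pairs)
        my = mean(p[1] for p in pairs)
        dx += [p[0] - mx for p in pairs]
        dy += [p[1] - my for p in pairs]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    return sxy / sxx

print(round(within_slope(rows), 3))  # estimated effect; simulated truth is 0.03
```

The actual analysis is richer (multiple characteristics, nesting in both experiments and languages), but the demeaning step illustrates why accounting for the hierarchical structure matters: without it, experiment-level differences would contaminate the characteristic's estimated effect.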
Dr Diego Cortes (IEA) - Presenting Author
Dr Sabine Meinck (IEA)
Dr Dirk Hastedt (IEA)
International large-scale assessments in education (ILSA) are built upon two design features, both of which are based on the principles of randomization. First, members of the study population are included in the survey following a complex sampling design, typically in the form of a two-stage stratified random sample (Meinck et al., 2021). This random data-generating mechanism allows analysts to make inferences about the entire study population from its sampled members. Second, the set of students participating in ILSA is assessed through a matrix sampling of items, in which test booklets are randomly rotated across students (von Davier et al., 2020). This is because the framework used to assess a subject domain is typically extensive, so the pool of items needed to measure it is large. Hence, to prevent overburdening, students participating in these surveys are exposed to only a fraction of the overall item pool. For example, TIMSS 2019 comprises about 10.5 hours of testing time for grade eight, whereas the actual testing time for each student was limited to 90 minutes (Mullis & Martin, 2017). This random mechanism generating student item responses allows analysts to make inferences about the distribution of proficiency in a subject domain within a group of students.
In this paper, we examine the problem of statistical inference arising from the uncertainty in the estimation generated by these two design features using the design effect framework pioneered by Kish (1965). This framework allows us to highlight that, in the context of ILSA, design effects are not only subject to the sampling and assessment plan, but also to the estimator used. Our results are relevant in that they showcase the caution analysts should take when generalizing design effects in ILSA across populations and across estimators.
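Within Kish's framework, one well-known closed-form component is the design effect due to unequal weighting, deff_w = n * Σw² / (Σw)²; a minimal sketch of that formula (this captures only the weighting component, not the clustering or estimator-specific parts discussed above):

```python
def kish_deff(weights):
    """Design effect from unequal weighting (Kish 1965):
    deff_w = n * sum(w_i^2) / (sum(w_i))^2.
    Equals 1 for equal weights and grows with weight variability."""
    n = len(weights)
    s1 = sum(weights)
    s2 = sum(w * w for w in weights)
    return n * s2 / (s1 * s1)

print(kish_deff([1.0, 1.0, 1.0, 1.0]))          # -> 1.0 (equal weights)
print(round(kish_deff([1.0, 2.0, 3.0, 4.0]), 3))  # 4 * 30 / 100 = 1.2
```

A deff_w of 1.2 means the unequal weights alone inflate the sampling variance as if the effective sample were 1/1.2 of its nominal size, before any clustering or estimator effects are considered.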