ESRA 2023 Program

All time references are in CEST

Opportunities and Challenges in Dealing with Selection Bias in Cross-sectional and Longitudinal Surveys 1

Session Organisers: Professor Sabine Zinn (Socio-Economic Panel at DIW)
Dr Jason M. Fields (U.S. Census Bureau)
Dr Hans Walter Steinhauer (Socio-Economic Panel at DIW)
Time: Thursday 20 July, 09:00 - 10:30
Room: U6-02

Analysing survey data usually also means coping with selection bias. There are proven and well-established strategies for doing so, such as survey weighting or selection modelling. However, many data users still struggle to understand how to apply these strategies, especially when confronted with the diversity of the information given by survey providers. Beyond that, researchers increasingly use machine learning and Bayesian statistics in survey data analysis, and also in conducting and controlling surveys. Specifically, adaptive contact or motivational strategies are designed for upcoming survey studies or waves based on response processes observed in previous surveys or survey waves. The estimation of population statistics is improved by including information about the entire selection process in the statistical model; both developing these methods and communicating their use are critical.
In this session, we welcome research on novel approaches and strategies to ease data users' understanding of how to handle selection bias in their statistical analysis. This research might cover:
- Methods for easing, and communicating, the appropriate use of weights or other methods for addressing selection bias in published microdata files. These may include, but are not limited to, longitudinal weights, calendar year weights, replicate weights, multiple implicates, and other tools to improve the population representativeness and communication of uncertainty in public data products.
- Novel methods to assess and adjust for sources of bias in cross-sectional and longitudinal surveys, including, but not limited to, machine learning interventions, adaptive designs, post-hoc weighting calibrations, informed sampling, etc. How are these communicated to data users? How are they adapted as response rates and biases change?
- Papers that investigate selection processes, papers that leverage novel modelling strategies for coping with selection bias in statistical analysis, and papers that include examples of modelling non-ignorable selection bias in substantive analyses are also encouraged.

Keywords: selection bias, weighting, adaptive designs, non-ignorable selection

Papers

Longitudinal Nonresponse Prediction and Bias Mitigation with Machine Learning

Mr John Collins (University of Mannheim) - Presenting Author
Dr Christoph Kern (University of Munich)

We explore the application of predictive modeling to the amelioration of non-response bias in longitudinal surveys. While panel surveys are an irreplaceable source for social science researchers, non-response can lead to significant loss of data quality. To prevent bias and attrition, researchers have turned to predictive modeling to identify at-risk participants and support early interventions. In particular, machine learning (ML) approaches have shown promising results for predicting participant non-response. However, minimizing non-response does not necessarily minimize non-response bias. In this project, we compare prediction methods both with respect to performance and their potential to reduce bias based on simulated interventions. First, we generate sets of ML-based nonresponse predictions. Next, we study the downstream bias implications of different treatment regimes that draw on these predictions. Our experiments demonstrate how prediction-based interventions may affect the composition of respondents and thereby may minimize non-response bias, instead of only non-response rates. We conduct our simulations using data from the GESIS Panel, a large-scale probability-based German panel study. From this investigation, we provide survey practitioners with insights on how ML-based adaptive survey designs may mitigate non-response bias.
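To make the two-step logic concrete, here is a minimal Python sketch of the same general idea, using synthetic data; the feature names, the intervention assumption, and the gradient boosting learner are illustrative choices, not the authors' implementation or the GESIS Panel data.

```python
# Minimal sketch (synthetic data, hypothetical variable names): (1) predict panel
# nonresponse from prior-wave features, (2) simulate a targeted intervention and
# compare the composition of the resulting respondent pool with the full sample.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical prior-wave features and a nonresponse indicator for the next wave.
panel = pd.DataFrame({
    "age": rng.integers(18, 85, n),
    "prior_item_missing": rng.poisson(2, n),
    "prior_contact_attempts": rng.integers(1, 8, n),
    "low_income": rng.binomial(1, 0.3, n),
})
logit = -1.5 + 0.4 * panel["prior_item_missing"] + 0.8 * panel["low_income"] - 0.01 * panel["age"]
panel["nonresponse"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X_train, X_test, y_train, y_test = train_test_split(
    panel.drop(columns="nonresponse"), panel["nonresponse"], test_size=0.5, random_state=0
)

# Step 1: ML-based nonresponse prediction.
model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
risk = model.predict_proba(X_test)[:, 1]

# Step 2: simulated intervention -- assume (hypothetically) that targeting the
# riskiest 20% halves their nonresponse probability, then compare the respondent
# composition against the full-sample benchmark.
target = risk >= np.quantile(risk, 0.8)
p_respond = 1 - np.where(target, 0.5 * risk, risk)
responds = rng.binomial(1, p_respond).astype(bool)

full_mean = X_test["low_income"].mean()                 # benchmark: full sample
resp_mean = X_test.loc[responds, "low_income"].mean()   # respondents after intervention
print(f"low_income share: full sample {full_mean:.3f}, respondents {resp_mean:.3f}")
```

The point of the comparison in the last lines is the one made in the abstract: the evaluation target is the composition of the respondent pool relative to the full sample, not merely the response rate.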


Case Prioritization in a Panel Survey Based on Predicting Hard to Survey Households by Machine Learning Algorithms: An Experimental Study

Dr Jonas Beste (Institute for Employment Research) - Presenting Author
Dr Corinna Frodermann (Institute for Employment Research)
Professor Mark Trappmann (Institute for Employment Research)
Dr Stefanie Unger (Institute for Employment Research)

Panel surveys provide particularly rich data for implementing adaptive or responsive survey designs. Not only are data from the current wave fieldwork available, but paradata and survey data as well as interviewer observations from all previous waves can be utilized to predict fieldwork outcomes in an ongoing wave.
In the German panel survey “Labour Market and Social Security”, a sequential mixed-mode survey of the general population that oversamples welfare benefit recipients, an adaptive survey design has until now primarily been implemented for refreshment samples.
As panel attrition is increasing, panel cases at greater risk of attrition were also targeted in the 14th wave in 2020 and prioritized in the fieldwork. Prioritization included increased respondent incentives and an increased interviewer premium.
In order to select panel households to be prioritized, we first used data (survey data, paradata, interviewer observations) from waves 4 to 12 of the panel to train different machine learning algorithms. In the next step, we used the parameters from this training to predict wave 13 response. The quality of this prediction was assessed by comparison to wave 13 fieldwork outcomes, and the best-performing algorithm was used to finally predict wave 14 response based on data from waves 4 to 13. The adaptive design was implemented experimentally on roughly half of the panel cases with estimated response propensities in the lower half of the distribution.
In the presentation, we show which algorithm worked best in our setting to predict response propensities and how well these propensities predicted actual wave 14 outcomes. Furthermore, we demonstrate that panel attrition for high-risk groups can be reduced by case prioritization. The reduction in bias on substantive variables, however, is very moderate at best.
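As an illustration only, the sketch below mirrors the train/validate/deploy sequence described in the abstract using synthetic data; the features, candidate learners, and AUC criterion are assumptions for the example, not the IAB pipeline.

```python
# Minimal sketch: fit candidate algorithms on earlier waves, pick the one that best
# predicts observed wave-13 response, refit it on all available data, and flag the
# cases in the lower half of the predicted wave-14 propensity distribution.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 4_000

# Hypothetical panel features and an observed wave-13 response outcome.
features = pd.DataFrame({
    "n_prior_refusals": rng.poisson(0.5, n),
    "moved_since_last_wave": rng.binomial(1, 0.1, n),
    "interviewer_rating": rng.integers(1, 6, n),
})
true_logit = -0.5 - 0.9 * features["n_prior_refusals"] - 1.2 * features["moved_since_last_wave"]
response_w13 = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

candidates = {
    "logit": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=1),
}

# Validate: score each candidate's predictions against held-out wave-13 outcomes.
train_idx = rng.random(n) < 0.5
scores = {}
for name, model in candidates.items():
    model.fit(features[train_idx], response_w13[train_idx])
    pred = model.predict_proba(features[~train_idx])[:, 1]
    scores[name] = roc_auc_score(response_w13[~train_idx], pred)
best = max(scores, key=scores.get)

# Deploy: refit the best algorithm on all cases and predict wave-14 response propensities.
propensity_w14 = candidates[best].fit(features, response_w13).predict_proba(features)[:, 1]

# Flag the cases in the lower half of the propensity distribution for prioritization.
prioritize = propensity_w14 < np.median(propensity_w14)
print(best, scores, f"{prioritize.mean():.2%} of cases flagged for prioritization")
```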


Optimizing adaptive survey design accounting for mode-specific measurement bias: A case study on the Dutch Health Survey

Mr Hamza Ahmadan (Researcher at Statistics Netherlands)
Dr Kees van Berkel (Researcher at Statistics Netherlands)
Dr Nino Mushkudiani (Researcher at Statistics Netherlands) - Presenting Author
Dr Barry Schouten (Researcher at Statistics Netherlands)

Adaptive survey designs usually focus on balanced representation. This is a natural choice, because survey costs are directly related to the effort spent on obtaining survey response. In doing so, however, measurement differences between candidate design features are ignored. When the survey mode is one of the design features, the adaptive survey design optimization strategy can no longer ignore measurement: measurement equivalence and comparability may be sacrificed for the sake of representation.

In order to account simultaneously for representation and measurement differences, their impacts must be isolated and estimated. Disentangling the two confounded errors is inherently hard, so advanced experimental designs and estimation strategies are needed. These are costly and conflict with the objective of adaptive survey designs to be efficient.

Statistics Netherlands performs multiple surveys in which health topics play an important role. Given that large mode effects are observed for health statistics, it was decided to conduct a re-interview experiment. This paper describes the experimental design and the estimated mode-specific selection and measurement biases, and elaborates on possible optimization strategies accounting for both errors.
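To illustrate the decomposition a re-interview design is meant to deliver, here is a small numerical sketch with invented numbers (not the Statistics Netherlands estimates): the observed between-mode difference in a health score is split into a measurement effect, estimated from the within-person contrast among re-interviewed respondents, and a selection effect taken as the remainder.

```python
# Hypothetical numerical sketch: the same respondents answer in a web mode and in a
# face-to-face (F2F) re-interview, so the within-person difference estimates the
# measurement effect; the rest of the observed between-mode difference is attributed
# to mode-specific selection.
import numpy as np

rng = np.random.default_rng(2)
n = 2_000

# Hypothetical latent health score and mode-specific selection into the web sample.
health = rng.normal(50, 10, n)
in_web_sample = rng.binomial(1, 1 / (1 + np.exp(-(health - 50) / 10))).astype(bool)

# Hypothetical measurement: web answers are shifted upward relative to F2F.
web_report = health + 2.0 + rng.normal(0, 2, n)
f2f_report = health + rng.normal(0, 2, n)

observed_diff = web_report[in_web_sample].mean() - f2f_report[~in_web_sample].mean()

# Re-interview: web respondents also give an F2F answer, so the within-person
# contrast isolates the measurement effect for that group.
measurement_effect = (web_report[in_web_sample] - f2f_report[in_web_sample]).mean()
selection_effect = observed_diff - measurement_effect

print(f"observed between-mode difference: {observed_diff:.2f}")
print(f"  measurement effect (within re-interviewed persons): {measurement_effect:.2f}")
print(f"  selection effect (remainder): {selection_effect:.2f}")
```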