ESRA 2017 Programme

Wednesday 19th July, 11:00 - 12:30 Room: N AUD4

Adaptive and Responsive Designs in Complex Surveys: Recent Advancements and Challenges 2

Chair Mr Jason Fields (U.S. Census Bureau)
Coordinator 1 Dr Asaph Young Chun (U.S. Census Bureau)
Coordinator 2 Professor James Wagner (University of Michigan)
Coordinator 3 Dr Barry Schouten (Statistics Netherlands)
Coordinator 4 Ms Nicole Watson (University of Melbourne)

Session Details

Adaptive and responsive survey designs (Groves and Heeringa, 2006; Wagner, 2008) have attempted to respond to a changing survey environment that has become increasingly multimode, multilingual and driven by multiple data sources. The Journal of Official Statistics will publish a Special Issue on Adaptive Design in complex surveys and censuses in 2017 (edited by Chun, Schouten and Wagner, forthcoming). In our efforts to address multiple challenges affecting the survey community and the fundamental interest of the community of survey methodologists in producing quality data, we propose a session of papers that discuss the latest methodological solutions and challenges in adaptive and responsive designs for complex surveys. We encourage submission of papers on the following topics of adaptive or responsive design:

1. Applied and theoretical contributions, and comparisons of variants of adaptive design that leverage strengths of administrative records, big data, census data, and paradata. For instance, what cost-quality tradeoff paradigm can be operationalized to guide development of cost and quality metrics and their use around the survey life cycle? Under what conditions can administrative records or big data be adaptively used to supplement survey data collection and improve data quality?

2. Papers addressing the following triple drivers of adaptive/responsive design: cost, respondent burden, and data quality. For instance, what indicators of data quality can be integrated to monitor the course of the data collection process? What stopping rules for data collection can be used across the phases of a multi-mode survey?

3. Papers involving longitudinal survey designs where data collection systems need to fulfill their panel focus and provide data for the same units over time, and leverage adaptive processes to reduce cost, reduce burden, and/or increase quality. For instance, how can survey managers best engage the complexity of issues around implementing adaptive and responsive designs, especially for panel surveys that are in principle focused on measuring change over time? How are overrepresented or low priority cases handled in a longitudinal context?

4. Papers involving experimental designs or simulations of adaptive survey design. For instance, experimental implementations of an adaptive design, especially those involving multiple data sources, a mixed mode of data collection or a cross-national design.

5. Papers that apply Bayesian methods to build adaptive designs. For example, adaptive designs where the design parameters are given priors and then updated as additional data are collected.
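
As an illustration of topic 5, the following is a minimal sketch of a conjugate Beta-Binomial update for a single design parameter, here the response propensity of one stratum. It is not drawn from any paper in this session. The function names, the prior, and the counts are all hypothetical, and a real adaptive design would typically involve several parameters and a richer model.

```python
def update_beta(alpha, beta, responses, attempts):
    """Return posterior Beta parameters after observing new contact outcomes.

    Assumes a Beta(alpha, beta) prior on the response propensity and a
    binomial likelihood for `responses` successes out of `attempts` cases.
    """
    if not 0 <= responses <= attempts:
        raise ValueError("responses must be between 0 and attempts")
    return alpha + responses, beta + (attempts - responses)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: propensity centred on 0.2, worth about 10 prior cases.
prior = (2.0, 8.0)
# Hypothetical interim data: 30 responses from 100 attempted cases.
posterior = update_beta(*prior, responses=30, attempts=100)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

With these made-up numbers the expected propensity moves from 2/10 to 32/110 (about 0.29), and the same update can be repeated each time a new batch of outcomes arrives.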

Paper Details

1. Statistics Canada’s Experiences in Using Paradata to Manage Responsive Collection Design CATI household surveys
Mr Francois Laflamme (Statistics Canada)

Over the past decade, paradata research has focused on identifying strategic data collection improvement opportunities that could be operationally viable and lead to improvements in quality or cost efficiency. To that end, Statistics Canada has developed and implemented a Responsive Collection Design (RCD) strategy for Computer-Assisted Telephone Interview (CATI) household surveys to maximize quality and potentially reduce costs. RCD is an adaptive approach to survey data collection that uses information available prior to and during data collection to adjust the collection strategy for the remaining in-progress cases. In practice, the survey managers monitor and analyze collection progress against a pre-determined set of indicators for two purposes: to identify critical data collection milestones that require significant changes to the collection approach and to adjust collection strategies to make the most efficient use of remaining available resources. In the RCD context, numerous considerations come into play when determining which aspects of data collection to adjust and how to adjust them. Paradata sources play a key role in the planning, development and implementation of active management for RCD. Since 2009, Statistics Canada has conducted several RCD surveys. This paper describes Statistics Canada’s experiences in implementing, and especially in monitoring, these surveys. In particular, this paper presents the plans, tools and strategies used to actively manage the RCD surveys and how these strategies evolved and improved over time.

2. Interactive Adaptive Total Design Reports for Near Real-Time Survey Monitoring
Mr Joe Murphy (RTI International)
Dr Paul Biemer (RTI International)
Mr Michael Duprey (RTI International)
Mr Rob Chew (RTI International)

Adaptive Total Design (ATD) is a survey management strategy that aims to minimize total error simultaneously across multiple sources during the course of data collection within budget constraints. To implement ATD, a range of stakeholders including survey managers, statisticians, data collection leaders, field management, and clients need to access critical-to-quality (CTQ) indicators on a real-time basis during data collection. These CTQs, based on survey data and paradata, in turn inform decisions which may involve switching to a more promising protocol, halting the pursuit of some or all cases, initiating additional contact attempts, or beginning other interventions. Even for studies with limited interventions, monitoring a range of actual data collection outcomes compared to expectations is critical to successful survey management.

This presentation summarizes our work developing a system for collecting and monitoring ATD data from multiple disparate systems to track in near-real time an experimental, multi-mode design. The system produces daily graphs and metrics that are made available to the internal project and client teams. Using industry-accepted best practices of visual design, the most important CTQs are highlighted and clearly displayed while superfluous information is minimized. These resources are instrumental in interpreting and communicating the results and timely decision-making during the course of the data collection. The charts allow the team to stay “on the same page” and provide a roadmap for management.

The previous version of this system produced static charts that were generated by SAS and Excel routines. By employing new software options, we developed an improved system with interactive charts that can be generated and automatically published to a project website. Using the Shiny web application framework for R, the ATD reports put a powerful toolset in the hands of project staff by allowing user interactivity and customization. By standardizing the approach to and production of high-quality usable reports, this tool can benefit many projects and avoids having to “reinvent the wheel” with data collection monitoring. A range of graphical options is available to meet the unique needs of each project. These options allow project staff to begin monitoring costs and quality from the beginning of data collection (a crucial period in the management of any project) with minimal investment of time.

In the presentation, we will describe the process of designing a suite of ATD visualizations and their use on an ongoing longitudinal survey – The Longitudinal Survey of Adolescent Health (Add Health) – that employs web, mail, phone, and field modes.

3. When to stop calling? Dataset representativeness during data collection: An assessment using linked 2011 UK Census data
Professor Gabriele Durrant (University of Southampton)
Dr Jamie Moore (University of Southampton)
Professor Peter Smith (University of Southampton)

To improve survey data collection, survey researchers need to monitor and assess the quality of the incoming data during data collection; for face-to-face household surveys this is during calls or visits to a household. We consider the use of representativeness indicators to monitor risks of non-response bias during survey data collection. A particular emphasis will be on the specification of phase capacity or stopping points that inform survey researchers when to stop calling or when to change a data collection strategy. A range of stopping rules is discussed, including one based on the significance of regression estimates. To assess dataset representativeness, a number of methods are used, including R and CV indicators.
The analysis benefits from use of a unique dataset linking call record paradata and survey target variables from three UK social surveys to data from the UK 2011 Census (The Census nonresponse link study). The linked data allow us to make comparisons between survey and fully-observed census variables. This includes, in addition to household-level variables, which were discussed in a previous paper, a range of individual-level survey target variables that are available in both the survey(s) and the census. This type of data allows us to make informed decisions about when to stop calling in the presence of fully-observed (census) variables but also when such information is not available, as is the case for most survey settings. Given our findings, we offer guidance to survey practitioners on the use of such methods and implications for optimising data collection and efficiency savings.
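
For readers unfamiliar with the indicators named in this abstract, here is a minimal sketch of how R and CV indicators are commonly computed from estimated response propensities: R = 1 - 2·S(ρ) and CV = S(ρ) / mean(ρ), where S is the standard deviation of the propensities ρ. The sketch assumes the propensities have already been estimated (for example from a response model on auxiliary variables), ignores design weights, and uses the population standard deviation for simplicity. The values and names are made up, and this is not the authors' implementation.

```python
from statistics import mean, pstdev


def r_indicator(propensities):
    """R = 1 - 2 * S(rho); 1 means all propensities are equal."""
    return 1.0 - 2.0 * pstdev(propensities)


def coefficient_of_variation(propensities):
    """CV = S(rho) / mean(rho); lower values suggest less risk of bias."""
    return pstdev(propensities) / mean(propensities)


# Hypothetical estimated propensities for four sample subgroups,
# evaluated after an early call and after a later call.
after_call_1 = [0.10, 0.30, 0.15, 0.45]
after_call_5 = [0.50, 0.60, 0.55, 0.65]

for label, rho in [("call 1", after_call_1), ("call 5", after_call_5)]:
    print(label, round(r_indicator(rho), 3), round(coefficient_of_variation(rho), 3))
```

Tracking how these two quantities change from call to call is one simple way to look for a point beyond which further calls add little representativeness. Any threshold used for that purpose would be a design choice, not something specified in the abstract.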

4. To Push or Not to Push: Tailoring Response Mode to Individual Respondents
Ms Cameron McPhee (American Institutes for Research)
Mr Michael Jackson (American Institutes for Research)

As mixed mode survey designs become increasingly important for survey research, due to their potential to reach a diverse respondent pool while controlling cost, researchers have begun experimenting with a variety of strategies to determine which mix of modes is the most efficient method to gain response. Experimentation with adaptive and responsive designs has begun to explore how different modes can be leveraged for different respondents to improve response rates and representativeness. However, limited research has been done thus far to validate methods of predicting, in advance of the survey administration, the mode by which individual cases are most likely to respond. It is known, for example, that some respondents are more likely or able than others to respond to a web survey, while others would be unlikely to respond to a web survey but more likely to respond if sent a paper survey, while others are unlikely to respond regardless of the mode. Understanding individuals’ propensity to respond by a particular survey mode in advance could potentially reduce survey administration costs by allowing the mode of response to be tailored to the individual respondent as early in the administration period as possible, freeing up resources to leverage other modes or incentives on the more difficult-to-reach sample cases.

This paper examines the potential to predict mode-specific response propensity for the National Household Education Survey (NHES). The 2016 administration of the NHES included a mixed-mode experiment in which a randomly assigned set of cases (n=35,00) were asked first to complete the survey by Web. Cases that did not respond by web after several contact attempts were sent a paper questionnaire. This research aims to determine whether household-level data available on the address-based sampling frame, and/or geographic data available from the Census Bureau, can be used to accurately predict the mode by which individual cases are most likely to respond. The ability to predict the mode of response could allow the efficiency of future administrations to be improved by sending some cases a paper questionnaire with the first mailing, rather than waiting until the third mailing to offer the paper option.

Parametric models (e.g. multinomial logistic regression) and non-parametric algorithms (e.g. classification and regression trees) will be compared with respect to their ability to predict response status and the mode of response using the available auxiliary data. Cross-validation procedures will be used to evaluate each method’s robustness when used to predict the mode of response in out-of-sample data. The paper will describe the auxiliary variables available for use in modeling, the variable selection procedures used to determine the optimal specification for the multinomial logistic regression model, and each method’s predictive accuracy when applied to the NHES dataset. The results will provide initial insight into whether it is possible to improve the efficiency of sequential mixed-mode designs by tailoring the mode of initial contact based on information known about sampled cases prior to data collection.