ESRA 2019 Programme at a Glance

Data Quality in Opt-In Panels

Session Organiser Dr Glenn Israel (University of Florida)
Time Tuesday 16th July, 11:00 - 12:30
Room D26

This session includes papers that assess the quality of data collected in non-probability based, 'opt-in' panel surveys, particularly in comparison with probability-based surveys.

Keywords: non-probability panels, opt-in, comparison with probability-based surveys

Comparisons of Probability and Nonprobability Samples

Dr Mansour Fahimi (Ipsos) - Presenting Author
Dr Frances Barlas (Ipsos)

There have been several studies comparing the quality of survey estimates from various forms of probability and nonprobability samples. In particular, Yeager et al. (2011) conducted parallel surveys with an identical instrument using different sample types. Their key findings included the fact that the “organic” representation inherent to probability-based samples may not be manufacturable via nonprobability samples, particularly those generated from online opt-in panels. Equally important to the quality of survey data, vis-à-vis the external validity of point estimates, is the greater volatility of the results from nonprobability samples.

Relying on two probability-based samples, one DFRDD and one from KnowledgePanel®, as well as a third sample secured from a mix of opt-in panels, we have administered an identical questionnaire that includes a long list of benchmark-able measures. The resulting data have provided a rich analytical framework for reassessing a number of critical hypotheses regarding the fundamental role that sample representation plays in the quality of survey estimates, both in terms of bias and variance.

During this presentation we will discuss the need for reassessment of the Neyman (1934) paradigm, upon which inferential sample surveys are built, particularly in light of the growing coverage issues and falling response rates for probability samples. Yet, nonprobability samples have a long way to go to be recognized as viable options as some of the traditional methods begin to lose credibility. As such, we will discuss the concepts of regimented sampling and dynamic calibration adjustments as refinements that can improve the inferential integrity of nonprobability sampling alternatives.

Lessons Learned from Conducting Concurrent Surveys Using an Online Opt-in Quota Sample and a Mail/Mixed-Mode Address-Based Sample

Dr Glenn Israel (University of Florida) - Presenting Author

The challenge for researchers and policy-makers is to collect accurate and representative information in a cost-effective manner. Online surveys using opt-in panels are a fast, cost-effective way to collect data but questions remain about the amount of error in survey estimates and best practices for adjusting these estimates. There is growing evidence that opt-in panels used for nonprobability samples in online surveys are less accurate than probability-based sample surveys (Baker et al., 2010; Dutwin & Buskirk, 2017; MacInnis et al., 2018; Mercer et al., 2018; Yeager et al., 2011). This study was conducted using an address-based probability sample as a benchmark for a nonprobability quota sample for concurrent surveys on the topic of climate change. Address-based samples currently represent the gold standard for coverage of the U.S. population. The benchmark data was collected using a mail and web/mail mixed-mode survey (Dillman et al., 2014). The quota sample was obtained by contracting with a survey services vendor. The instrument was constructed using a unified-mode design to provide a consistent stimulus to the two samples and was administered concurrently. The study compares 1) survey yield and data collection duration, 2) item response rates, 3) response distributions after post-stratification weighting, and 4) relationships between selected variables. As reported by many researchers, the online opt-in quota sample yielded the contracted set of responses in just a week while the ABS mail/mixed-mode survey took five months. Cooperation and response rates, respectively, were low. Analysis of data quality indicators raised “red flag” issues for both methods and there were substantial differences in response distributions before and after weighting. As this study demonstrates, researchers will need to carefully weigh the strengths and weaknesses to arrive at the best “fit for purpose” methodology. The findings provide further evidence regarding the accuracy of nonprobability samples.
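
The post-stratification weighting mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the actual weighting cells, population targets, or software used, so the cell labels, population shares, and function names below are hypothetical. Each respondent in a cell receives the weight population share divided by sample share.

```python
from collections import Counter

def poststrat_weights(sample_cells, population_shares):
    """Return one weight per cell: population share / sample share.

    sample_cells: list with the cell label of each respondent.
    population_shares: dict mapping cell label -> known population share.
    """
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return {cell: population_shares[cell] / (counts[cell] / n) for cell in counts}

def weighted_mean(values, cells, weights):
    """Weighted mean of a survey variable using cell-level weights."""
    w = [weights[c] for c in cells]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

# Hypothetical data: 6 younger and 4 older respondents, while the
# (assumed) population is 40% younger and 60% older.
cells = ["young"] * 6 + ["old"] * 4
concerned = [1, 1, 1, 0, 0, 0] + [1, 0, 0, 0]  # 1 = reports concern
weights = poststrat_weights(cells, {"young": 0.4, "old": 0.6})

unweighted = sum(concerned) / len(concerned)
weighted = weighted_mean(concerned, cells, weights)
print(round(unweighted, 3), round(weighted, 3))
```

Because the over-represented cell is weighted down, the estimate moves toward the value implied by the population shares. The weighting only corrects for the variables that define the cells.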

Using Paradata and Rich Sampling Frame Data to Assess Nonresponse in Non-Probability Online Panels. The Italian Case.

Miss Chiara Respi (University of Milano-Bicocca) - Presenting Author

The increasing use of non-probability online panels (NPOPs) in survey research poses a number of methodological issues, mainly due to self-selection. Research on the nature and magnitude of nonresponse at the specific study stage in NPOPs is scant.

The overall aim of this paper is to explore the impact of nonresponse (at the specific study stage) on the quality of data from the Italian NPOP. In particular, i) I describe the nonresponse process in the panel survey, documenting the quality of the response process, and ii) I address nonresponse bias in the panel survey estimates, investigating the magnitude of nonresponse bias and the impact of the socio-demographic characteristics on response propensity.

I use paradata (i.e. final disposition codes), rich sampling frame data (i.e. information on all registered panellists), and data from a survey conducted on the Italian panel, which is a NPOP established in 2011. I compute a number of response metrics, including cooperation rate, break-off rate, and refusal rate. Moreover, I compare the respondents’ socio-demographic characteristics with those of nonrespondents, calculating a number of nonresponse measures (e.g. the percentage point differences, the nonresponse bias, and the mean percentage absolute relative bias), and running a logistic regression model to estimate the probability of responding to the panel survey.
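
As an illustration of the nonresponse measures named above, the following sketch compares the distribution of one characteristic among respondents with its distribution among all invited panellists. The abstract does not give formulas, so the definitions used here are assumptions: percentage point difference as respondent share minus frame share, and relative bias as that difference divided by the frame share. The counts and function names are hypothetical.

```python
def proportions(counts):
    """Convert category counts to shares that sum to 1."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def nonresponse_measures(frame_counts, respondent_counts):
    """Compare respondents with the full invited sample (the frame).

    Returns percentage point differences per category, relative bias
    per category (in percent), and the mean absolute relative bias.
    """
    p_frame = proportions(frame_counts)
    p_resp = proportions(respondent_counts)
    pp_diff = {k: 100 * (p_resp.get(k, 0.0) - p_frame[k]) for k in p_frame}
    rel_bias = {k: 100 * (p_resp.get(k, 0.0) - p_frame[k]) / p_frame[k]
                for k in p_frame}
    mean_abs_rel_bias = sum(abs(v) for v in rel_bias.values()) / len(rel_bias)
    return pp_diff, rel_bias, mean_abs_rel_bias

# Hypothetical counts by geographic area of residence.
frame = {"North": 500, "Centre": 200, "South": 300}        # all invited
respondents = {"North": 330, "Centre": 120, "South": 150}  # completed survey

pp_diff, rel_bias, marb = nonresponse_measures(frame, respondents)
print({k: round(v, 1) for k, v in pp_diff.items()}, round(marb, 2))
```

This kind of comparison is possible only because the frame data cover respondents and nonrespondents alike.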

Key findings are that: i) response to the panel survey is high, ii) the socio-demographic characteristics of the responding sample are not different from those of the nonrespondents, and iii) the risk of significant distortions introduced by the response behaviour is limited to the geographic area of residence.

Undercoverage and Nonresponse as Sources of Representativeness Bias in Non-Probability Online Panels. The Italian Case.

Miss Chiara Respi (University of Milano-Bicocca) - Presenting Author
Professor Emanuela Maria Sala (University of Milano-Bicocca)

In the last decade, non-probability online panels (NPOPs) have become popular data collection methods. However, NPOPs have a number of limitations, mainly due to the lack of representativeness of the responding sample. Research in this field has mainly focussed on nonresponse, documenting the differences in the demographic and socio-economic characteristics that arise when comparing the responding sample with the general population. There is currently very little research on the impact of other sources of error, i.e., undercoverage, on sample representativeness.

The overall aim of this paper is to investigate the impact of undercoverage and nonresponse on the representativeness of the Italian NPOP, focussing, in particular, on nonresponse occurring at the following stages of the life of the panel: the recruitment stage, the joining and profiling stage and the study specific stage. This work also aims to investigate the effectiveness of propensity score adjustment techniques in reducing the representativeness bias.

We use data from the Italian NPOP and from the Multipurpose survey. The former is a NPOP, established in 2011. The latter (considered as a “gold standard”) is a probability-based survey that collects a wealth of information on the socio-economic characteristics and opinions of the Italian population (N=45,204; AAPOR RR1: 79%). We also use a unique set of data that includes information on all registered panellists. We compute a number of data quality metrics, including the average absolute error and the percentage point error.
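
A minimal sketch of the two metrics named above, assuming the percentage point error is the panel estimate minus the benchmark estimate for each variable and the average absolute error is the mean of the absolute values of those errors. The abstract does not state the formulas, and the variable names and figures below are hypothetical.

```python
def percentage_point_errors(panel, benchmark):
    """Signed error per variable: panel estimate minus benchmark (in points)."""
    return {k: panel[k] - benchmark[k] for k in benchmark}

def average_absolute_error(panel, benchmark):
    """Mean of absolute percentage point errors across all variables."""
    errors = percentage_point_errors(panel, benchmark)
    return sum(abs(e) for e in errors.values()) / len(errors)

# Hypothetical estimates in percent; the benchmark stands in for the
# probability-based "gold standard" survey.
benchmark = {"female": 51.0, "aged_65_plus": 22.0, "degree": 15.0}
panel = {"female": 48.0, "aged_65_plus": 10.0, "degree": 30.0}

errors = percentage_point_errors(panel, benchmark)
aae = average_absolute_error(panel, benchmark)
print(errors, aae)
```

The same functions can be applied at each stage (Internet population, panel members, responding sample) to see where representativeness is lost.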

Key findings are that: i) the Internet population is not representative of the general population, ii) members are not representative of the Internet and the general population and iii) the responding sample is not representative of the Internet and the general population. However, after weighting, iv) the panel-survey sample is more representative of the general population than of the Internet population.