ESRA 2017 Programme

Thursday 20th July, 16:00 - 17:30 Room: Q2 AUD1 CGD

Data quality in non-probabilistic online surveys

Chair Miss Chiara Respi (University of Milano-Bicocca)
Coordinator 1 Dr Emanuela Sala (University of Milano-Bicocca)

Session Details

Non-probabilistic online surveys are increasingly used in research and policy analysis. Their recent success is mainly due to the relatively low costs of set-up, maintenance, and data collection, together with the large sample sizes available for analysis. However, non-probabilistic online surveys raise a number of issues, mainly concerning the generalization of findings derived from the survey data. The session fosters discussion of how to assess the representativeness of non-probabilistic samples and of methods to correct for nonresponse.

Paper Details

1. Effects of sampling procedure on data quality in a web survey
Professor Ivan Rimac (University of Zagreb, Faculty of Law, Department of Social Work)
Dr Jelena Ogresta (University of Zagreb, Faculty of Law, Department of Social Work)

This paper examines how different sampling procedures affect a variety of data quality indicators in web surveys. Data were collected from 5,104 students who participated in the EUROSTUDENT VI web survey in Croatia. Three sampling procedures were applied: 1) a random sample, 2) a convenience sample, and 3) an opt-in sample, in which students could access the web questionnaire from a pop-up notice shown while they were registering for final exams. Although the efficiency of the three sampling approaches is not comparable, owing to unfavorable timing for the last approach, descriptors of data quality are available for each recruited sample: completion rate, frequency of use of the “Don’t know” answer to skip questions, item nonresponse, and accuracy of answers, which can be assessed against descriptors present in the data frame. The results will also be discussed with regard to student and study characteristics.
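The indicators listed above can be illustrated with a minimal sketch. The record layout, the "DK" code, and the toy data are assumptions made for illustration only; they are not taken from the EUROSTUDENT VI data or from the authors' analysis.

```python
DONT_KNOW = "DK"  # hypothetical code for a "Don't know" answer


def quality_indicators(records):
    """Compute simple data quality indicators for one recruited sample.

    Each record is a dict with a 'completed' flag and a list of 'answers',
    where None marks a skipped item (item nonresponse).
    """
    completers = [r for r in records if r["completed"]]
    completion_rate = len(completers) / len(records)

    # Item-level rates are computed among completers only (an assumption).
    answers = [a for r in completers for a in r["answers"]]
    item_nonresponse_rate = sum(a is None for a in answers) / len(answers)
    dont_know_rate = sum(a == DONT_KNOW for a in answers) / len(answers)

    return {
        "completion_rate": completion_rate,
        "item_nonresponse_rate": item_nonresponse_rate,
        "dont_know_rate": dont_know_rate,
    }


# Hypothetical toy sample of three respondents and four questions.
sample = [
    {"completed": True, "answers": ["A", "DK", None, "B"]},
    {"completed": True, "answers": ["A", "B", "C", "DK"]},
    {"completed": False, "answers": ["A", None, None, None]},
]
indicators = quality_indicators(sample)
```

Running the same function on each of the three samples would give directly comparable indicators per sampling procedure.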

2. Comparing opt-in panels with web surveys based on probability-samples: a consideration of costs and errors
Mr Nicolas Pekari (FORS, Swiss Centre of Expertise in the Social Sciences)
Dr Oliver Lipps (FORS, Swiss Centre of Expertise in the Social Sciences)
Professor Caroline Roberts (University of Lausanne)
Professor Georg Lutz (FORS, Swiss Centre of Expertise in the Social Sciences)

Election studies frequently incorporate panel designs to permit analyses of changing political attitudes and gather data on turnout and candidate choice. The cost of mounting these multi-wave surveys with interviewer-administered modes is becoming increasingly prohibitive and has led to a growing interest in the use of web-based data collection, which offers considerable time and cost saving potential. In this context, several national election studies have either already made a wholesale switch to web surveys, or experimented with different web-based or mixed mode solutions. Despite its overall lower per unit costs, however, using the Internet to survey general population, probability-based samples entails various challenges, which not only offset some of the financial savings, but also carry risks of increased survey error. These relate, in particular, to sampling and coverage, contact and recruitment procedures, self-administration of long and complex questionnaires, and to panel retention and attrition.

A relatively cheap and convenient solution to many of these challenges is to outsource data collection and/or sampling to providers of existing web-based access panels based on non-probability – or so-called ‘opt-in’ – samples. In most European countries today there is a range of panel providers, varying in the types of services they offer and in the methods they use to recruit to and maintain their databases, but each offering a substantial sampling base for selecting potential respondents who have declared themselves willing to participate in online surveys. The main drawbacks are that, despite the use of sophisticated non-probability selection methods and model-based weighting strategies to achieve seemingly representative samples of the population, the respondents who volunteer to participate in such panels may represent their counterparts in the general population only poorly, and their regular participation in panel surveys may lead them to respond differently from less experienced respondents.

In order to fully appreciate the cost-saving advantages of different web-based survey designs, it is important to know how they compare in terms of the quality of the resulting data. In this study, we compare estimates from the 2015 Swiss Electoral Studies (Selects), which involved a mixed mode web/telephone post-election survey and a four-wave panel survey, both based on a random probability sample of Swiss voters, with estimates based on samples drawn from three different opt-in panels. We use auxiliary data from the register-based sampling frame used for Selects as a benchmark to assess the representativeness of the responding samples across the surveys. We then compare point estimates on key variables, and investigate the extent to which any differences between studies affect the overall relation between variables in widely-replicated regression analyses of voting behaviour, to draw conclusions about the extent to which sampling design matters for typical analysts of electoral study data. We supplement these analyses with a calculation of the unit costs associated with each survey design to inform future decisions about the cost-error trade-offs involved in designing probability-based web surveys of general population samples.

3. Comparing the findings from probability surveys with non-probability online panels in an Australian research context
Mr Darren Pennay (Social Research Centre, Australian National University)
Dr Dina Neiger (Social Research Centre, Australian National University)
Dr Paul Lavrakas (Social Research Centre, Australian National University)

In Australia in 2014-15, 86 per cent of households had an internet connection (ABS Cat. 8146.0). Accompanying this growth in internet connectivity, there has been an increase in the volume of survey research undertaken via the World Wide Web. Since 2010, online research has been the dominant mode of data collection in the Australian market and social research industry, supplanting Computer Assisted Telephone Interviewing (CATI).

In Australia in 2015, online research accounted for 41 per cent of the revenue generated by the market and social research industry, up from 31 per cent two years earlier (Research Industry Council of Australia, 2016). Worldwide, the increase in internet penetration has seen a plethora of non-probability internet panels established. These panels provide researchers with access to panel members prepared to undertake surveys for ‘rewards’. In the United States and parts of Europe, the increased use of the web for data collection has also resulted in the establishment of probability-based online research panels to enable the scientific sampling of the population. The same is not true in Australia where, until now, there have been no commercially available national probability-based online panels.

The authors of this paper are concerned that the rapid increase in the use of non-probability online panels in Australia has not been accompanied by an informed debate regarding the advantages and disadvantages of probability and non-probability surveys.

The 2015-2016 Australian Online Panels Benchmarking Study was undertaken to inform this debate. This paper reports on the findings from a single national survey administered across three different probability samples and five different non-probability online panels. Subsequently, the survey was also administered on Australia’s first probability-based online panel – the Life in Australia panel.

This study enables us to determine whether, in an Australian context, surveys using probability-sampling methods produce more accurate results, relative to independent population benchmarks, than surveys relying upon non-probability sampling methods. In doing so we hope to build on similar international research in this area (e.g. Yeager et al. 2011, Chang & Krosnick 2009, Walker, Pettit & Rubinson, 2009).
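One common way to express accuracy relative to population benchmarks is the average absolute error across benchmark items. The sketch below assumes that measure and uses invented item names and percentages; it is not the authors' procedure or data.

```python
def average_absolute_error(estimates, benchmarks):
    """Mean absolute difference, in percentage points, between a survey's
    estimates and the independent population benchmarks."""
    return sum(abs(estimates[k] - benchmarks[k]) for k in benchmarks) / len(benchmarks)


# Hypothetical benchmark values and survey estimates (per cent).
benchmarks = {"smoker": 15.0, "home_owner": 67.0}
surveys = {
    "probability_sample": {"smoker": 16.0, "home_owner": 65.0},
    "opt_in_panel": {"smoker": 20.0, "home_owner": 60.0},
}

# A lower value means the survey sits closer to the benchmarks on average.
errors = {name: average_absolute_error(est, benchmarks) for name, est in surveys.items()}
```

A single summary figure per survey makes it easy to rank probability samples and non-probability panels on the same scale, at the cost of hiding which items drive the error.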

4. Assessing the Representativeness of Nonprobability Online Panels. The Italian Case.
Miss Chiara Respi (University of Milano-Bicocca)
Professor Emanuela Sala (University of Milano-Bicocca)
Mr Angelo Tomaselli (Demetra s.r.l.)

Online panels are increasingly used in social research. The advantages of online panels are undisputed (i.e., fast data collection, lower costs). However, online panels – and in particular the nonprobability ones – have a number of limitations, due to coverage error and self-selection into the panel. The key issue is that online panels may not be representative of the population that they intend to represent, and this may have important repercussions on the quality of the estimates produced (AAPOR, 2010; Callegaro et al., 2014). Despite the relevance of these issues, there are few studies in this field (see below); these studies have found strong evidence of bias and have shown that adjustment strategies may not be effective. The overall aim of the work is to assess the representativeness of an Italian nonprobability online panel and investigate the effectiveness of different post-survey adjustment strategies (i.e., post-stratification, propensity score adjustments) in reducing the bias.

We compare the demographic and socio-economic composition of the panel and estimates from a set of selected survey items (e.g., consumption) to those of the general population, computing percentage differences and appropriate statistical tests. When appropriate, we use regression analysis. We then compare estimates from the online panel, obtained by implementing different post-survey adjustment strategies, with the “gold standard” data. We use data from the Italian online panel and from the Multiscopo survey. The former is a non-probability online panel, established in 2011. The latter (considered as a “gold standard”) is a probability-based survey that collects a wealth of information on the socio-economic characteristics and opinions of the Italian population (N = 44,974; RR: 78.9%).
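As an illustration of one of the adjustment strategies named above, the following is a minimal sketch of post-stratification on a single variable. The education categories, population shares, and survey item are hypothetical and are not drawn from the Italian panel or the Multiscopo survey.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per respondent: the population share of the
    respondent's cell divided by that cell's share in the sample."""
    n = len(sample_cells)
    counts = Counter(sample_cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in sample_cells]


def weighted_mean(values, weights):
    """Weighted mean of a numeric survey item."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical panellists: education level and a binary survey item.
education = ["high", "high", "high", "low"]
item = [1, 1, 0, 0]

# Hypothetical population shares, e.g. taken from a benchmark survey.
population = {"high": 0.5, "low": 0.5}

weights = poststratification_weights(education, population)
unweighted = sum(item) / len(item)
adjusted = weighted_mean(item, weights)
```

In this toy case the over-represented highly educated panellists are weighted down, so the adjusted estimate moves away from the unweighted one. Whether such an adjustment actually reduces bias depends on how strongly the weighting variable is related to both panel membership and the item of interest.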

Preliminary analysis shows evidence of sample selection bias: compared to the general population, the panellists are more educated and are more likely to be young adults and to be employed. We found no major differences in area of residence or sex.