
ESRA 2023 Glance Program

All time references are in CEST

Approximating Probability Samples in the Absence of Sampling Frames 2

Session Organisers: Dr Carina Cornesse (German Institute for Economic Research)
Dr Mariel McKone Leonard (DeZIM Institute)
Time: Wednesday 19 July, 16:00 - 17:30
Room: U6-07

Research shows that survey samples should be constructed using probability sampling approaches to allow valid inference to the intended target population. However, for many populations of interest high-quality probability sampling frames do not exist. This is particularly true for marginalized and hidden populations, including ethnic, religious, and sexual minorities. In the absence of sampling frames, researchers are faced with the choice to discard their research questions or to try to draw inferences from nonprobability and other less conventional samples.

For the latter, both model-based and design-based solutions have been proposed in recent years. This session focuses on data collection techniques designed to result in samples that approximate probability samples. We also invite proposals on techniques for approximating probability samples using already collected nonprobability sample data as well as by combining probability and nonprobability sample data for drawing inferences. The session scope covers but is not limited to research on hard-to-reach and hard-to-survey populations. We are particularly interested in methodological research on techniques such as

- Respondent-driven sampling (RDS) & other network sampling techniques
- Quasi-experimental research designs
- Weighting approaches for nonprobability data (especially those that make use of probability sample reference survey data)
- Techniques for combining probability and nonprobability samples (e.g. blended calibration)

Keywords: nonprobability sample, respondent-driven sampling, blended calibration, weighting, data integration


Bias reduction and bias correction for a non-probability panel during the pandemic 

Ms Eszter Sandor (Eurofound) - Presenting Author
Ms Daphne Ahrendt (Eurofound)
Mr Massimiliano Mascherini (Eurofound)
Mr Michele Consolini (Eurofound)

The Living, Working and COVID-19 (LWC) online survey series was launched in spring 2020 in 27 countries, using non-probability sampling methods, primarily social media advertising. The survey included a panel element: in four of the five survey rounds, email addresses were collected from respondents willing to participate in future rounds. One of the five rounds was a panel-only round. In each round, the social media recruitment was refined, while the external circumstances of the pandemic changed, affecting sample performance. This paper reviews the panel element of the survey, including how panel respondents differed from other survey participants and from the reference population, and how the sample composition changed throughout the five survey rounds. The paper subsequently discusses the two weighting approaches (calibration weighting and propensity score weighting) that were explored to improve the reliability of the panel-based estimates.
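For readers unfamiliar with calibration weighting, the following is a minimal sketch of raking (iterative proportional fitting), one common form of calibration. It is not the authors' procedure: the function names, the toy respondents, and the target shares are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def rake(rows, targets, max_iter=1000, tol=1e-12):
    """Raking sketch: rescale weights until weighted margins match the targets.

    rows    -- list of dicts holding each respondent's categories (hypothetical data)
    targets -- {variable: {category: population share}}, shares summing to 1
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for variable, shares in targets.items():
            total = sum(weights)
            weighted_counts = defaultdict(float)
            for row, weight in zip(rows, weights):
                weighted_counts[row[variable]] += weight
            # Scale every respondent so this variable's margin hits its target.
            for i, row in enumerate(rows):
                category = row[variable]
                factor = shares[category] * total / weighted_counts[category]
                weights[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:
            break  # all margins already match, so further passes change nothing
    return weights


def weighted_share(rows, weights, variable, category):
    """Weighted share of respondents falling in one category."""
    matching = sum(w for row, w in zip(rows, weights) if row[variable] == category)
    return matching / sum(weights)


# Toy sample that over-represents women and younger respondents (made-up values).
respondents = [
    {"sex": "female", "age": "young"},
    {"sex": "female", "age": "young"},
    {"sex": "female", "age": "old"},
    {"sex": "male", "age": "old"},
    {"sex": "male", "age": "young"},
]
targets = {
    "sex": {"female": 0.5, "male": 0.5},
    "age": {"young": 0.4, "old": 0.6},
}
weights = rake(respondents, targets)
```

After raking, the weighted shares for sex and age match the assumed population margins. Raking corrects only for the variables included in the targets, so any selection bias unrelated to those variables remains.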

Fielding the same questionnaire simultaneously in probability and non-probability online surveys: exploring opportunities for bias correction

Dr Gijs van Houten (Eurofound) - Presenting Author
Ms Eszter Sandor (Eurofound)
Ms Daphne Ahrendt (Eurofound)

In spring 2020, Eurofound launched the Living, Working and COVID-19 (LWC) e-survey, a non-probability online survey of the general population in the European Union, to gauge the impact of the coronavirus pandemic on the living and working conditions of Europeans. Five rounds were fielded between spring 2020 and spring 2022 in the 27 EU Member States, with more than 150,000 full observations recorded during this time. The LWC e-survey uses convenience sampling through advertisements on social media. Eurofound is exploring how best to add the LWC e-survey to its tool kit. To assess the impact of the sampling approach on the survey results, we will simultaneously run the same questionnaire on our e-survey and on probability-based online panels in at least four countries in April 2023. This will not only show to what extent estimates differ but will also allow us to develop an approach for anchoring the non-probability survey estimates on the results from Eurofound’s probability surveys in the future. This paper discusses the differences in substantive results between the two sampling approaches, as well as differences from reference data from statistical sources and other high-quality surveys. It aims to identify substantive areas where the sampling approach is particularly (un)problematic: potential niches for which the LWC e-survey is better suited, as well as substantive domains in which it might need to be avoided. It also aims to identify variables that are particularly effective for developing weighting algorithms that correct for biases arising from the sampling approach, with a view to identifying a key set of variables that could be collected across our probability and non-probability surveys, potentially broadening the applicability of the latter.

Using Hybrids to Analyze Subgroups: Can Blended Calibration Be Equally Effective for All?

Mr Michael Jackson (SSRS) - Presenting Author
Ms Cameron McPhee (SSRS)

In recent years, hybrid samples that blend data from probability and nonprobability sources have become increasingly popular. These designs use the internal probability sample to enable weighting on non-demographic measures that are correlated with selection into nonprobability samples but for which external weighting targets do not exist. Hybrids thereby attempt to leverage the low cost of nonprobability samples while controlling the selection biases that are known to be present in such samples.

A growing body of research has explored methods of weighting hybrid samples, including propensity models, calibration weighting, and combinations of these. Much of this research has focused on topline estimates—that is, whether the weighting procedure reduces selection bias in estimates over the full population of interest.

However, a common use case for hybrids is to facilitate analyses of smaller (and sometimes harder-to-reach) subgroups, for which obtaining sufficient sample sizes from probability sources alone might be cost-prohibitive. The need for valid subgroup estimates increases the challenge of choosing a single optimal weighting model for a hybrid sample, since patterns of selection bias in nonprobability samples can vary across subgroups. This is particularly true when estimates are needed for overlapping subgroups.

Therefore, this presentation will explore methods by which standard hybrid calibration procedures can be enhanced to improve the robustness of estimates across multiple overlapping subgroups. In particular, we will assess conditions under which assigning “pseudo base weights” to the nonprobability sample, via a random forest propensity adjustment, can yield more consistent bias reduction across subgroups than calibration-only adjustments. We will also test whether post-processing of the propensity scores using the recently proposed “universal adaptability” methodology (Kim et al. 2022) yields further benefits for equalizing bias reduction across subgroups. Tradeoffs in terms of design effects and level of effort will also be addressed.
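The idea of "pseudo base weights" can be illustrated with a minimal sketch. Note the substitution: instead of the random forest propensity model named in the abstract, this sketch estimates propensities within simple adjustment cells, and it does not implement the "universal adaptability" post-processing. All names and data below are hypothetical.

```python
from collections import defaultdict


def pseudo_base_weights(prob_sample, nonprob_sample, cell_of):
    """Cell-based pseudo base weights for a nonprobability sample.

    prob_sample    -- units carrying a design weight under the key "weight"
    nonprob_sample -- units without design weights
    cell_of        -- function mapping a unit to its adjustment cell

    Assumption: within a cell, nonprobability units behave like a simple
    random sample, so the estimated inclusion propensity is
    (nonprobability count) / (population size estimated from the probability sample).
    """
    estimated_population = defaultdict(float)
    for unit in prob_sample:
        estimated_population[cell_of(unit)] += unit["weight"]

    nonprob_count = defaultdict(int)
    for unit in nonprob_sample:
        nonprob_count[cell_of(unit)] += 1

    weights = []
    for unit in nonprob_sample:
        cell = cell_of(unit)
        if cell not in estimated_population:
            raise ValueError(f"no probability-sample units in cell {cell!r}")
        # Inverse of the estimated propensity for this cell.
        weights.append(estimated_population[cell] / nonprob_count[cell])
    return weights


# Toy data: the probability sample represents 600 urban and 400 rural people.
prob_sample = [
    {"area": "urban", "weight": 300.0},
    {"area": "urban", "weight": 300.0},
    {"area": "rural", "weight": 400.0},
]
# The nonprobability sample over-represents urban respondents.
nonprob_sample = [
    {"area": "urban"},
    {"area": "urban"},
    {"area": "urban"},
    {"area": "rural"},
]
pseudo_weights = pseudo_base_weights(
    prob_sample, nonprob_sample, cell_of=lambda unit: unit["area"]
)
```

In a hybrid design, weights like these would typically serve as a starting point for the nonprobability cases before the combined sample is calibrated. A model-based propensity estimate, such as the random forest mentioned above, plays the same role as the cell-level ratio here but can use many more covariates.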

Weighting survey data containing probability-based and nonprobability-based samples: An exploratory study

Dr Zhigang Wang (Department of National Defence) - Presenting Author

Although probability sampling is more time-consuming and costly than nonprobability sampling, it is usually preferred for sample surveys because it produces reliable population estimates. However, probability sampling may not be optimal for a new, time-sensitive sample survey when the population has recently been heavily surveyed (or is being surveyed) and the population size is limited. In this situation, nonprobability sampling can be used as a supplementary sampling strategy. When survey data include a nonprobability-based sample, weighting becomes challenging. This study explores weighting strategies when nonprobability samples are used in survey projects.

A pulse survey on food accommodation was sent to a stratified random sample of Canadian Armed Forces (CAF) members, but the response rate was very low. Because the CAF population had been heavily surveyed, it was decided not to draw additional probability samples. Instead, the survey was opened to all other CAF members, yielding an additional nonprobability-based sample. An exploratory study was conducted to determine which of the weighting strategies identified in the literature works best for survey data comprising both probability-based and nonprobability-based samples. This paper discusses the limitations of the selected weighting strategy for the combined survey data and provides recommendations on the sampling design and weighting methods for similar future studies.