ESRA 2023 Glance Program

All time references are in CEST

Quantitative and qualitative methods to survey hard-to-reach populations 3

Session Organisers Dr Alessandra Gaia (University of Milano-Bicocca)
Dr Daniele Zaccaria (University of Applied Sciences and Arts of Southern Switzerland (SUPSI))
Time Thursday 20 July, 14:00 - 15:30
Room U6-03

Survey researchers often face the challenge of collecting data on so-called “hard-to-reach” or “hard-to-survey” populations. These terms refer to population subgroups that are rare, marginal, hidden, elusive or excluded from mainstream society and thus hard to locate, sample, contact or interview. Examples include sex workers, illegal immigrants, victims of trafficking, drug users, displaced populations, homeless people, institutionalised people, but also groups that – while not being excluded or marginalised – are rare and elusive (e.g. elites), hard-to-persuade to take part in surveys or hard-to-interview (for example due to lower cognitive abilities, like the oldest-old).

Exclusion of hard-to-reach population subgroups from data collection may lead to biased estimates on topics of relevance for social science research (e.g. poverty, inequalities, physical and mental health, social care, housing, migration, wellbeing, and social exclusion); ultimately, lack of information on hard-to-reach population subgroups leads to policy agendas which may not take into account the needs of the most vulnerable in society, and may exacerbate social inequality and conflict.

However, meeting the need for high quality data is complex. To overcome this challenge, a number of quantitative and qualitative research methods have been developed, including: techniques to estimate the size of hard-to-reach populations (e.g. capture-recapture), sampling strategies (e.g. Respondent Driven Sampling), and data collection methods to ask questions about sensitive topics, including indirect questioning techniques (e.g. the Item Count Technique), adoption of proxy respondents, passive data collection through new technologies, participatory mapping, visual methods, etc.

We welcome submissions on empirical or theoretical comparisons of different research techniques, theoretical discussions of challenges faced by social researchers in surveying hard-to-reach populations, and elaborations on the ethical principles guiding research on these population subgroups.

Keywords: Hard-to-reach populations, hard-to-survey populations, indirect questioning techniques, Respondent Driven Sampling, Passive data collection


How to Survey Sensitive Topics in Hard-To-Reach Populations – An RDS-based factorial survey with rejected asylum seekers

Dr Laura Peitz (BAMF Research Centre) - Presenting Author
Mr Randy Stache (BAMF Research Centre)
Dr Lisa Johnson (BAMF Research Centre)

Throughout the past years, the number of rejected asylum seekers has been continuously high across Europe, as many of them stay for lengthy periods despite their legal obligation to leave. For several reasons, little is known about this refugee group, which hampers informed policy-making. Rejected asylum seekers, a spatially mobile and dispersed group, are particularly hard-to-reach. Given their high linguistic and sociocultural diversity as well as limited experience of survey participation, they are hard-to-interview. The group is also hard-to-persuade to cooperate with strangers or to disclose personal information, especially on sensitive topics like im-/mobility decision-making between staying, returning and onward migration. Given our institutional affiliation with the Research Centre of the German Federal Office for Migration and Refugees, field access to rejected asylum seekers is especially challenging and requires particular ethical reflection.

To overcome these difficulties, we propose an innovative mixed-methods research design, including qualitative fieldwork, an App-based respondent-driven sampling (RDS), and a factorial survey, to study individual im-/mobility aspirations of West African rejected asylum seekers. Using ethnographic methods, we identify, recruit, and build trust with initial respondents. These seeds then distribute a research App into their peer networks. Consecutive recommendations and double incentivization lead to further respondents. The App includes a self-administered survey featuring a factorial survey, to analyse the determinants of im-/mobility aspirations based on respondents’ assessments of hypothetical individuals’ living conditions.

This paper discusses the challenges of collecting high-quality data on rejected asylum seekers, and outlines potential solutions in the form of a multi-method App-based RDS featuring a factorial survey. By underpinning this with lessons from the field and the ethical considerations guiding our research, we provide practical details on a promising research approach for reaching hard-to-reach populations.

singleRcapture: an R package for estimation of hidden populations using single-source capture recapture models

Mr Piotr Chlebicki (Adam Mickiewicz University, Poznań) - Presenting Author
Dr Maciej Beręsewicz (Poznań University of Economics and Business / Statistical Office in Poznań)

Population size estimation is an important issue in official statistics, social sciences and natural sciences. One way to tackle this problem is by applying capture-recapture methods, which can be classified by the number of sources used, i.e. one source, or two or more sources.

In our presentation we focus on the first group of methods, i.e. single-source capture-recapture (SSCR). SSCR models assume that observed counts follow truncated count distributions (e.g. zero-truncated Poisson, one-inflated zero-truncated geometric), and this assumption is used to estimate the missing (hidden) zero counts. The literature includes applications of SSCR methods for estimating the number of irregular migrants, domestic violence cases or homeless people.
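The idea behind the simplest of these models can be illustrated with a minimal sketch in plain Python. It assumes a zero-truncated Poisson model without covariates and is not the singleRcapture package or its API; the function names and the example counts are hypothetical. The rate is fitted by solving the maximum-likelihood condition (sample mean equals λ / (1 − e^(−λ))), and the population size is then obtained by dividing the number of observed units by the estimated probability of being observed at least once.

```python
import math


def fit_zero_truncated_poisson(counts, tol=1e-12, max_iter=500):
    """Maximum-likelihood rate of a zero-truncated Poisson (no covariates).

    `counts` holds the number of times each *observed* unit was captured,
    so every value is >= 1. The MLE solves mean = lam / (1 - exp(-lam)).
    """
    mean = sum(counts) / len(counts)
    if mean <= 1.0:
        raise ValueError("the sample mean must exceed 1 for a finite estimate")
    # lam / (1 - exp(-lam)) is increasing in lam and always larger than lam,
    # so the root lies strictly between 0 and the sample mean.
    lo, hi = 1e-12, mean
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if mid / -math.expm1(-mid) < mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2


def estimate_population_size(counts):
    """Horvitz-Thompson-type estimate: observed units / P(captured at least once)."""
    lam = fit_zero_truncated_poisson(counts)
    p_observed = -math.expm1(-lam)  # 1 - exp(-lam)
    return len(counts) / p_observed


# Hypothetical data: 100 observed units, captured 1 to 4 times each.
example_counts = [1] * 60 + [2] * 25 + [3] * 10 + [4] * 5
n_hat = estimate_population_size(example_counts)
print(f"observed: {len(example_counts)}, estimated total: {n_hat:.1f}")
```

The difference between the estimated total and the number of observed units is the estimated number of hidden (zero-count) units. One-inflated or geometric variants would replace the distribution, but the same truncation logic applies.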

In the presentation we introduce the singleRcapture R package for estimating SSCR models. The package implements state-of-the-art models as well as some new models proposed by the authors (e.g. extensions of zero-truncated one-inflated and one-inflated zero-truncated models). The software is prepared for users interested in estimating the size of populations, particularly those that are hard-to-reach or for which information is only available from one source and dual/multiple system estimation cannot be utilized.

At the time of writing the abstract, the package is only available to install via a GitHub repository. However, it will be uploaded to CRAN by the time of the conference.

Using a Records-Survey Linkage to Identify Households At Risk of Under-Reporting Young Children

Ms Joanne Pascale (US Census Bureau) - Presenting Author
Mr Matthew Virgile (US Census Bureau)
Mr Kevin Shaw (US Census Bureau)
Mr John Boies (US Census Bureau)

In 2019, researchers conducted a pilot study exploring the long-standing issue of the undercount of young children in the U.S. Census and surveys. The goal was to identify households at risk of omitting young children from the household roster in order to conduct qualitative interviews with these households about features of data collection that contributed to the erroneous omissions, such as question wording, instructions and layout. The pilot used conventional methods to identify and recruit at-risk households, with disappointing results: given the level of effort, we simply did not recruit the quantity of cases that would be needed for main stage data collection. We then turned to a more promising method to identify the target sample. An amalgamation of administrative records (adrecs) was matched to American Community Survey (ACS) responses. Households where adrecs indicated at least one young child not listed in the ACS were flagged as “at risk.” While the pandemic delayed the field work with this sample, the hiatus provided an opportunity to carefully examine this novel approach to selecting an elusive sample. In this paper we document the sampling process – the specifications for defining the profile of at-risk households, given the variables available in the ACS and adrecs. Next, we characterize the results of the matched sample by quantifying the proportion (1) dropped due to matching or data anomalies; (2) categorized into the various subgroups of at-risk households (e.g., grandparents with young children and no parent; 'nuclear families' with subfamilies living in the household; young single parents); (3) found in adrecs but not in ACS, by various age categories; and (4) reported in the ACS but not found in the adrecs. Finally, we revisit the sample selection and matching specifications to identify any refinements and to consider applications for other rare populations.
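The flagging rule described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' specification: the record layout, the field names (`household_id`, `person_id`, `age`) and the age cut-off for a "young child" are all assumptions made for the example.

```python
# Assumed cut-off for "young child"; the abstract does not state one.
YOUNG_CHILD_MAX_AGE = 4


def flag_at_risk_households(acs_rosters, adrecs, max_age=YOUNG_CHILD_MAX_AGE):
    """Return {household_id: [person_id, ...]} for households where the
    administrative records list a young child missing from the survey roster.

    acs_rosters: dict mapping household_id -> set of person_ids on the roster
    adrecs:      iterable of dicts with hypothetical keys
                 'household_id', 'person_id' and 'age'
    """
    at_risk = {}
    for record in adrecs:
        household = record["household_id"]
        if household not in acs_rosters:
            continue  # no survey response to compare against; unmatched case
        is_young_child = record["age"] <= max_age
        if is_young_child and record["person_id"] not in acs_rosters[household]:
            at_risk.setdefault(household, []).append(record["person_id"])
    return at_risk


# Hypothetical example: household H1 omits child P3 from its roster.
acs_rosters = {"H1": {"P1", "P2"}, "H2": {"P4", "P5"}}
adrecs = [
    {"household_id": "H1", "person_id": "P1", "age": 34},
    {"household_id": "H1", "person_id": "P3", "age": 2},
    {"household_id": "H2", "person_id": "P5", "age": 3},
    {"household_id": "H9", "person_id": "P8", "age": 1},
]
flagged = flag_at_risk_households(acs_rosters, adrecs)
print(flagged)
```

A real linkage would also need to handle the matching and data anomalies the abstract mentions, which this sketch ignores by assuming clean, shared identifiers.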

What is important to RDS data collection productivity?

Dr Ai Rene Ong (American Institutes for Research) - Presenting Author
Professor Michael Elliott (University of Michigan)
Professor Sunghee Lee (University of Michigan)

Although RDS is a popular method of sampling hidden populations, slow or stopped recruitment has been reported by researchers using it. This has resulted in ad-hoc changes to the data collection protocols, such as adding seeds or increasing the number of recruits. The lack of clear reporting of RDS methodology in published papers creates a challenge in understanding what helps or hinders data collection efficiency. This study used data from a survey of RDS researchers to overcome this challenge. One hundred and twenty-one RDS researchers responded to this survey, which asked questions about their RDS research (e.g., the target population, use of formative research, incentive amount, sample size, number of seeds, mode of administration). These study characteristics were examined for their associations with productivity in data collection, defined as overall productivity (the ratio of the achieved sample size to the target sample size) and overall seed productivity (the ratio of the achieved sample size to the final seed size). Of the examined RDS study characteristics, the target population of the RDS research, formative research and the mode of administration (web vs. non-web) are associated with overall productivity. The target population of the research and formative research were also associated with overall seed productivity. The study location was also associated with seed productivity, with studies done in the U.S. being less productive than those done outside of the U.S.
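The two productivity measures defined in this abstract are simple ratios. A minimal Python sketch follows; the function names and the example figures are hypothetical and are not taken from the study.

```python
def overall_productivity(achieved_sample_size, target_sample_size):
    """Ratio of the achieved sample size to the target sample size."""
    return achieved_sample_size / target_sample_size


def overall_seed_productivity(achieved_sample_size, final_seed_size):
    """Ratio of the achieved sample size to the final seed size."""
    return achieved_sample_size / final_seed_size


# Hypothetical study: target of 500 respondents, 450 achieved, 15 seeds in total.
productivity = overall_productivity(450, 500)
seed_productivity = overall_seed_productivity(450, 15)
print(f"overall productivity: {productivity:.2f}")
print(f"overall seed productivity: {seed_productivity:.1f}")
```

Under these definitions, an overall productivity below 1 means the study fell short of its target, and seed productivity gives the average number of respondents obtained per seed.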