ESRA 2019 Programme at a Glance

Sampling Hard-to-Reach Populations 2

Session Organisers Professor Ulrich Kohler (University of Potsdam)
Professor Lena Hipp (WZB Berlin Social Science Center/University of Potsdam)
Mr Dimitri Prandner (Johannes Kepler University of Linz, Austria)
Professor Martin Weichbold (University of Salzburg, Austria)
Time Friday 19th July, 11:00 - 12:30
Room D09

Public interest in learning more about demographic groups that are either small, hidden, mobile, or engaged in illicit behaviors has grown in recent years. Prominent examples of these “hard-to-reach” populations are drug addicts, homeless people, or prostitutes as well as very rich people and migrants, particularly those who are not documented or who travel a lot. None of these demographic groups can be adequately sampled with probability surveys, either because of the absence of sampling frames, the small size of these groups compared to the total population, their unstable residency, or their reluctance to participate.

In order to learn more about these hard-to-reach populations, researchers have employed a few different sampling methods, including network sampling, link-tracing designs (aka snowball sampling), and respondent-driven sampling. Although the use of such nonprobability sampling methods to survey hard-to-reach groups has rapidly expanded and has been employed in many different contexts (though particularly in developing countries), numerous questions regarding both the implementation of these surveys and the analyses of the collected data have not yet been (fully) resolved.

- How can the target population be adequately delineated and identified in the sampling process?
- How should researchers choose incentives and interview locations when surveying hard-to-reach populations? What are best practices in seed selection?
- What do we know about mode differences when surveying hard-to-reach populations and asking individuals about illicit behaviors?
- What challenges occur when employing nonprobability sampling in comparative studies, for example with regard to the number of initial seeds and assumptions regarding the referral
- What are the best estimators when analyzing data collected from non-probability samples?
- How can we best calculate the variability of the estimates from non-probability samples?
- What are the ethical issues when surveying hard-to-reach populations and how can they be resolved in an acceptable way for researchers, respondents, and funding agencies?
- How can nonprobability surveys be combined with other methodological approaches to assess the accuracy of their findings?

Keywords: hard-to-reach/hidden populations, illicit/stigmatized behaviors, nonprobability sampling

Estimating Regression Models Using Respondent Driven Sampling Data

Dr Ismael Sánchez-Borrego (Department of Statistics and O.R. University of Granada) - Presenting Author
Dr María del Mar Rueda (Department of Statistics and O.R. University of Granada)
Dr Sunghee Lee (Institute for Social Research. University of Michigan)

Respondent driven sampling (RDS) is a chain-referral sampling method for surveys of hard-to-reach, stigmatized and/or elusive human populations, such as injection drug users, commercial sex workers, migrants and LGBTI communities. RDS has proven practical in many challenging settings and has been widely adopted by a large number of public health organizations around the world. The main advantage of RDS is its operational feasibility: it does not require an ordinary sampling frame; it eliminates screening processes, as respondents recruit other eligible persons; and it therefore reduces resource burdens substantially compared to traditional sampling.

Reflecting the widespread use and growing popularity of RDS, the literature on statistical inference for univariate statistics is growing rapidly. However, little work has been done on inference for multivariate associations. In particular, the chain-referral nature of RDS does not guarantee that RDS observations are independent and identically distributed random variables, which makes inference on associations challenging. We propose a method for estimating regression coefficients using RDS data. The method involves the estimation of variances and covariances of continuous data. Simulation experiments assess the practical performance of the proposed method under different scenarios. We will also include applications to real RDS data for rare populations for whom external benchmark data are available.

Utility of Paradata in Respondent Driven Sampling

Dr Sunghee Lee (University of Michigan) - Presenting Author
Ms Ai Rene Ong (University of Michigan)

Respondent driven sampling (RDS) has been proposed as an alternative sampling approach for hard-to-sample groups. It attempts to exploit people's tendency to form ties with similar others (i.e., social networks) for participant recruitment. In RDS, sampling is controlled by respondents themselves, and little to no information is available to ascertain the sampling mechanism. Further, successful RDS implementation rests on peer referral: participants must recruit their peers, and the recruited peers must participate. If participants and their peers do not cooperate with the recruitment (or participation) request, the sampling process becomes even harder to understand. More importantly, with recruitment noncooperation, RDS recruitment can end abruptly, leading to unstable and unpredictable sample sizes that cause operational difficulties, and the Markovian recruitment assumption, a critical element of RDS inference, is violated.

This study examines the utility of paradata for understanding and predicting recruitment noncooperation in RDS in order to improve chances of successfully implementing RDS. Specifically, we examine recruitment noncooperation in 1) an in-person RDS survey of illicit substance users with interviewer observation data; and 2) a Web survey of immigrants with response latency, item nonresponse and interview device.

Our Health Counts Toronto: Lessons Learned from Implementing a Multi-Site Respondent-Driven Sampling Framework in an Urban Centre

Ms Kristen O'Brien (St. Michael's Hospital) - Presenting Author
Ms Chloe Xavier (St. Michael's Hospital)
Dr Raglan Maddox (St. Michael's Hospital)
Dr Michael Rotondi (York University)
Ms Cheryllee Bourgeois (Seventh Generation Midwives Toronto)
Ms Sara Wolfe (Seventh Generation Midwives Toronto)
Dr Janet Smylie (St. Michael's Hospital)

Indigenous peoples comprise 5% of the Canadian population. As with most Canadians, the majority of Indigenous peoples now live in cities. Evidence suggests significant inequities in health determinants and health status for Indigenous peoples in urban areas. Predominantly due to colonization, discrimination, and historical mistrust of government, many Indigenous people have been deterred from participating in national-level surveys such as the Canadian census. The lack of accurate and sufficient data to address these inequities undermines the planning, development and implementation of programs and policies. The Our Health Counts (OHC) Toronto project was developed to improve health and wellbeing data for Indigenous people living in urban centres.

We present lessons learned from the data collection and cleaning of a multi-site Respondent-Driven Sampling (RDS) study in an urban centre.

The OHC survey was administered in Toronto between April 2015 and March 2016. The survey contained questions on topics such as health status and access to healthcare services. Participants were recruited through their social networks using RDS, building on Indigenous kinship lines. Participants who successfully completed the survey were given 3-5 coupons to distribute to recruit new participants from their social networks. To ensure that participants could access the survey, three survey locations and home visits were implemented.

Strong Indigenous community leadership and involvement generated a final cohort of 917 Indigenous adult surveys and 234 Indigenous child surveys. Increasing the number of locations helped participation rates, but also created data collection and data quality challenges. For example, participants who complete the survey more than once substantially affect recruitment trees. The following questions will be examined: “What impact does removing duplicate observations from RDS chains have on population estimates?” and “How can data quality processes be implemented to reduce survey collection error caused by a multi-site RDS framework?”
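
To illustrate why repeat completions matter for recruitment trees, the following is a minimal Python sketch. It is not the OHC Toronto cleaning procedure: the record layout (`survey_id`, `person`, `recruiter`), the rule of keeping each person's first survey, and the toy data are all assumptions made for illustration. It shows that dropping a duplicate survey can leave later recruits without a recruiter in the cleaned data.

```python
def drop_duplicates(records):
    """Keep each person's first survey and report orphaned recruits.

    `records` is a list of dicts in completion order with hypothetical keys:
    'survey_id', 'person', and 'recruiter' (the recruiter's survey_id, or
    None for a seed). Returns (kept_records, orphaned_survey_ids).
    """
    seen_people = set()
    kept = []
    dropped_ids = set()
    for record in records:
        if record["person"] in seen_people:
            # Repeat completion: this survey is removed from the data.
            dropped_ids.add(record["survey_id"])
        else:
            seen_people.add(record["person"])
            kept.append(record)
    # A kept survey is orphaned if its recruiting survey was removed.
    orphaned = [r["survey_id"] for r in kept if r["recruiter"] in dropped_ids]
    return kept, orphaned


# Toy data (hypothetical): person A completes the survey twice, and the
# second completion (survey 3) is what recruited person C.
records = [
    {"survey_id": 1, "person": "A", "recruiter": None},
    {"survey_id": 2, "person": "B", "recruiter": 1},
    {"survey_id": 3, "person": "A", "recruiter": 2},
    {"survey_id": 4, "person": "C", "recruiter": 3},
]
kept, orphaned = drop_duplicates(records)
print([r["survey_id"] for r in kept], orphaned)
```

In this toy case, survey 4 survives cleaning but its recruitment link points to a removed record, so an analyst would have to decide whether to relink it to person A's first survey, treat it as a new seed, or drop the subtree. Each choice changes the chain structure that RDS estimators rely on.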

Relationship between Recruitment Homophily and Variance Estimation for Respondent Driven Sampling Data

Ms Ai Rene Ong (University of Michigan) - Presenting Author
Dr Sunghee Lee (University of Michigan)
Professor Michael Elliott (University of Michigan)
Mr Chen Chen (University of Michigan)

Respondent Driven Sampling (RDS) has been increasingly used as a sampling method for hard-to-reach or hidden populations such as sex workers and injection drug users. In RDS, initial respondents are asked to recruit a limited number of people in their network, and their recruits continue this process until the target sample size is achieved. However, because recruitment is in the hands of the respondents, the selection mechanism is largely unknown and sampling variability can be very high. Recruitment homophily, i.e., the tendency for recruiters to recruit respondents who are similar to themselves, may result in a biased point estimate and has implications for sampling variance. The commonly used variance estimator, i.e., the bootstrap method by Salganik, may not have sufficient coverage if recruitment homophily is high, as it assumes a first-order Markov process on the inference variable. In this study, we propose a new variance estimation method that accounts for homophily by bootstrapping recruitment chains. We conduct a simulation study in which we can manipulate recruitment homophily, since accurate and complete information on networks and recruitment behaviour in RDS is usually unavailable. This will enable us to compare the performance of the current and proposed RDS inference methods under various degrees of recruitment homophily.
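
The general idea of resampling whole recruitment chains can be sketched as follows. This is a toy Python illustration under stated assumptions, not the estimator proposed in the abstract: the simulation model (a fixed number of coupons, a binary trait, and a single `homophily` probability that a recruit copies the recruiter's trait), the unweighted sample mean, and all function and parameter names are hypothetical.

```python
import random
from statistics import mean, stdev


def simulate_rds(n_seeds, coupons, waves, homophily, p_trait=0.3, rng=None):
    """Simulate toy recruitment chains of 0/1 trait values, one per seed.

    With probability `homophily` a recruit copies the recruiter's trait;
    otherwise the trait is drawn from the population rate `p_trait`.
    """
    rng = rng or random.Random(0)
    chains = []
    for _ in range(n_seeds):
        seed_trait = int(rng.random() < p_trait)
        chain = [seed_trait]
        current_wave = [seed_trait]
        for _ in range(waves):
            next_wave = []
            for recruiter_trait in current_wave:
                for _ in range(coupons):
                    if rng.random() < homophily:
                        trait = recruiter_trait
                    else:
                        trait = int(rng.random() < p_trait)
                    next_wave.append(trait)
            chain.extend(next_wave)
            current_wave = next_wave
        chains.append(chain)
    return chains


def chain_bootstrap_se(chains, n_boot=500, rng=None):
    """Bootstrap standard error of the sample mean, resampling whole chains."""
    rng = rng or random.Random(1)
    estimates = []
    for _ in range(n_boot):
        # Draw chains with replacement so within-chain dependence is kept.
        resampled = [rng.choice(chains) for _ in chains]
        estimates.append(mean(v for chain in resampled for v in chain))
    return stdev(estimates)


for h in (0.0, 0.5, 0.9):
    chains = simulate_rds(n_seeds=10, coupons=2, waves=3, homophily=h)
    print(f"homophily={h}: chain-bootstrap SE = {chain_bootstrap_se(chains):.3f}")
```

Because each bootstrap draw keeps a seed's entire chain together, similarity between recruiters and recruits carries over into the replicate samples, which is the dependence a respondent-level resample would break.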