ESRA 2019 Programme at a Glance

Representation Error: Linking Sampling Design and Fieldwork Practices 2

Session Organisers: Dr Kathrin Thomas (Princeton University)
Dr Salima Douhou (City, University of London)
Time: Friday 19th July, 11:00 - 12:30
Room: D20

This session focuses on the representation side of the Total Survey Error framework, linking fieldwork practices to probability sampling designs. Previous research has investigated the impact of sampling error on the “representativeness” of a survey’s target population. While confidence in good coverage of the target population is crucial, correct random sampling is also highly dependent on survey interviewers and their supervisors, as they select the dwellings, households, and respondents in the field. Depending on the cultural context, the availability of solid sampling frames, field organisations’ general guidelines, and other aspects, fieldwork practice may strongly influence a survey’s representativeness.
We invite papers studying the extent to which fieldwork practices and sampling designs affect representation error using existing survey data. We are also interested in papers discussing coverage issues of (special) populations and approaches to mitigating these during fieldwork, including applied solutions to monitor, reduce, and tackle fieldwork influences on sampling designs. In addition, we welcome applied and novel approaches to estimating the impact of fieldwork practices on random sampling, instruments that may circumvent this problem, such as further automated respondent selection or random walks with geo-fencing, and other techniques to control random selection at all stages. Finally, we aim to attract papers addressing how fieldwork practices affect random sampling more generally.

Potential topics include, but are not limited to, the following:

• How can we best deal with the lack of sampling frames?
• Under what circumstances should special populations or subgroups of a population (e.g. internally displaced persons) be excluded? When do we consider under-coverage?
• What can we do when the best available sampling frame is not good enough? Is it appropriate to apply additional methods to ensure good coverage? What would these look like?
• What fieldwork practices are harmful to the quality of the survey sample? How can we monitor these?
• Are there ways to implement a random walk so that sampling error is minimised?
• How can we improve methods at the doorstep to reduce sampling error, e.g. by tackling interviewer or respondent (self-)selection?
• How is the data quality affected by different fieldwork practices in comparative or longitudinal studies? How much could/should we harmonise?
• How do house changes affect fieldwork?

We welcome contributions from various cultural contexts, survey practitioners, secondary survey data users, and academic researchers.

Keywords: sampling, fieldwork practice, representativeness, special populations, random walk

Geo-Sampling: The Way Forwards to Generate High Quality Probabilistic Samples in Countries with Limited Population Data?

Ms Alexandra Cronberg (Kantar Public) - Presenting Author
Mr Jamie Burnett (Kantar Public)

Selecting random probability samples of households can be challenging in countries with neither sample frames nor recent census data. To achieve high-quality probabilistic samples despite such constraints, Kantar Public has developed an innovative approach to constructing sample frames, validating addresses, and selecting households, using GIS sampling, WorldPop data, and Bauer’s True Random Walk. We recently tested this sampling approach in four states in northern Nigeria.

Specifically, Kantar Public’s GIS mapping team used freely available modelled population data from WorldPop, which draws on census data, satellite imagery of settlements, road networks, and night-time lights, among other variables, to build an areal sample frame of 1 km² grid cells with attached population estimates. Higher administrative units were added to the sample frame to allow stratification. This approach has several benefits:

1. The estimated population of each grid cell can be used when selecting the sample
2. The Kantar Public in-house GIS sampling platform can be used to select a random starting address within each grid cell, using a reverse geocoding API
3. The selected grid cells and starting addresses can be visually validated through the GIS sampling platform
4. To support fieldwork, interviewers can be issued with detailed maps showing the starting address and the boundary of the 1 km² grid cell, as well as larger maps to help them locate the grid cell
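The frame-and-selection steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Kantar Public's platform: it assumes systematic probability-proportional-to-size (PPS) selection of cells within each stratum (the abstract only says cell population "can be used"), 1 km cells stored by their lower-left corner in a metre-based projected coordinate system, and hypothetical names (`GridCell`, `pps_systematic`, `select_stratified`, `random_start_point`). Within a stratum, a cell's inclusion probability is n times its population divided by the stratum total. The reverse-geocoding call that turns a random point into an address is not shown.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

CELL_SIZE_M = 1000.0  # assumption: 1 km x 1 km cells in a projected, metre-based CRS


@dataclass
class GridCell:
    cell_id: str
    stratum: str        # higher administrative unit used for stratification
    population: float   # modelled population estimate attached to the cell
    x_min: float        # lower-left corner of the cell, in metres
    y_min: float


def pps_systematic(cells, n, rng):
    """Systematic PPS: select n cells with probability proportional to population.

    A cell whose population exceeds the sampling interval can be hit more than
    once; in practice such cells would be handled as certainty selections.
    """
    total = sum(c.population for c in cells)
    step = total / n
    start = rng.random() * step            # random start in [0, step)
    targets = [start + i * step for i in range(n)]
    selected, cumulative, k = [], 0.0, 0
    for cell in cells:
        cumulative += cell.population
        while k < n and targets[k] < cumulative:
            selected.append(cell)
            k += 1
    return selected


def select_stratified(cells, n_per_stratum, rng):
    """Run the PPS selection independently within each stratum."""
    by_stratum = defaultdict(list)
    for cell in cells:
        by_stratum[cell.stratum].append(cell)
    sample = []
    for stratum, n in n_per_stratum.items():
        sample.extend(pps_systematic(by_stratum[stratum], n, rng))
    return sample


def random_start_point(cell, rng):
    """Uniform random coordinate inside the cell; a reverse-geocoding service
    (not shown, hypothetical here) would map it to the nearest address."""
    return (cell.x_min + rng.random() * CELL_SIZE_M,
            cell.y_min + rng.random() * CELL_SIZE_M)
```

Seeding the random generator (for example `random.Random(2019)`) makes the draw reproducible, which helps when selected cells and starting points are later checked visually on the platform.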

Bauer’s True Random Walk is an alternative to the conventional random walk practices that most survey agencies use for household selection, all of which lead to unequal household selection probabilities. The alternative tested in Nigeria draws on Bauer’s paper ‘New Sample Designs: An Improvement and Alternative to Random Route Samples’ (2017), in which a directional grid is used with randomly generated instructions for each junction.
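The idea of randomly generated junction instructions can be illustrated with a small sketch. It is not Bauer’s published procedure: it only shows how pre-generating one random turn per junction removes interviewer discretion at junctions. The three equally likely turns, the compass-heading representation of the directional grid, and the names `make_instruction_sheet` and `headings_along_route` are all assumptions. On its own, this does not guarantee equal household selection probabilities.

```python
import random

HEADINGS = ("N", "E", "S", "W")                  # clockwise compass order
TURNS = {"left": -1, "straight": 0, "right": 1}  # assumption: three equally likely turns


def make_instruction_sheet(route_id, n_junctions, master_seed):
    """Pre-generate one random turn per junction for a route.

    Seeding with (master_seed, route_id) makes each sheet reproducible, so a
    supervisor can regenerate it and compare it with the recorded path.
    """
    rng = random.Random(f"{master_seed}-{route_id}")
    return [rng.choice(tuple(TURNS)) for _ in range(n_junctions)]


def headings_along_route(start_heading, instructions):
    """Translate relative turns into the compass heading after each junction."""
    idx = HEADINGS.index(start_heading)
    path = []
    for turn in instructions:
        idx = (idx + TURNS[turn]) % len(HEADINGS)
        path.append(HEADINGS[idx])
    return path
```

For example, starting northwards and following `["right", "right", "left", "straight"]` gives the headings E, S, E, E. Printing absolute headings on the interviewer's map, rather than relative turns, is one way such instructions could be tied to a directional grid.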

This approach worked well in the four Nigerian states covered by the study and promises a high-quality alternative to other sampling approaches.

Geo-Sampling Versus Random Walk: Who is Represented?

Dr Safaa Amer (RTI International) - Presenting Author
Ms Jennifer Unangst (RTI International)
Dr Karol Krotki (RTI International)

Due to the lack of a sampling frame, limited time, and the aim of reducing bias, geo-sampling was chosen as the sampling strategy for collecting data from women with children aged 0 to 23 months in two states of Nigeria under the Alive and Thrive study. However, faced with a lower than expected prevalence of the target population and the need to supplement the sample, we integrated an experiment aimed at achieving the target sample size by conducting random walks in the targeted states where geo-sampling had not taken place.
This paper presents a side-by-side comparison of geo-sampling and random walk strategies for a frameless target population. The aim is to assess the pros and cons of the two methodologies in the same environment (target population, spatial distribution, survey, and parallel data collection). Results highlight the differences in population demographics and characteristics under the two methodologies, the bias observed, and the challenges faced, and conclude with recommendations.

Surveying European Cities: The Use of Landline RDD Sample and a List-Assisted Mobile Sample, Enriched with Geographic Information

Ms Sara Gysen (Ipsos Belgium) - Presenting Author
Mr Carsten Broich (Sample Solutions)

In North America, mobile phone numbers are linked to specific area codes, which allows a mobile phone to be roughly located by area. The existence of area codes for mobile phone ranges thus allows small-area sampling by means of mobile and landline RDD samples (even though number portability imposes some limitations).

In Europe, however, mobile phone numbers are not linked to areas, which makes it difficult to sample smaller areas. Hence, in most cases only an RDD landline sample can be provided. Using a single landline frame, however, leads to a sample that over-represents specific socio-demographic groups, such as female respondents in older age groups. Income and education levels also vary in comparison to a cell phone RDD sample.

For this 84-city research project in Europe, a dual-frame approach is used that targets cities with a landline RDD sample and a list-assisted mobile sample enriched with geographic information generated from various publicly available sources.
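As a hedged illustration of what "list-assisted" commonly means in RDD practice, the sketch below draws random numbers only within 100-banks (all digits except the last two) that contain at least one listed number, so that unlisted numbers in those banks have the same chance of selection as listed ones. The geographic enrichment is modelled, as an assumption, as a mapping from listed number to city. The abstract does not state that the study used this exact scheme, and `banks_for_city` and `list_assisted_sample` are hypothetical names.

```python
import random


def banks_for_city(listed_numbers, city):
    """Hypothetical geographic enrichment: keep the 100-banks (all digits except
    the last two) containing at least one listed number tagged with the city."""
    return sorted({number[:-2] for number, place in listed_numbers.items() if place == city})


def list_assisted_sample(listed_numbers, city, n, rng):
    """Draw n distinct numbers uniformly from the city's eligible banks.

    Each bank holds exactly 100 numbers, so picking a bank uniformly and then a
    two-digit suffix uniformly gives every number in those banks equal probability.
    """
    banks = banks_for_city(listed_numbers, city)
    if n > 100 * len(banks):
        raise ValueError("not enough numbers in the eligible banks")
    sample = set()
    while len(sample) < n:  # rejection of duplicates gives sampling without replacement
        bank = rng.choice(banks)
        sample.add(f"{bank}{rng.randrange(100):02d}")
    return sorted(sample)
```

Generated numbers still need screening for working, in-scope lines, and because the banks are defined by where listed numbers are located, coverage of a city depends on how well the public listings reflect its residents.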

Upon fieldwork completion, the demographics of the two frames are compared. In addition, the listed mobile sample is compared with studies that use a country-wide RDD sample in which specific cities have an adequately large sample size.

Finally, conclusions are drawn about the advantages and disadvantages of this method and its accuracy, and a follow-up agenda for future studies is proposed.

Controlling Unit-Nonresponse Bias During Within-Household Selection with Optimal Allocation and New Specification of the Kish Grid

Ms Blanka Szeitl (Department of Stochastics, Bolyai Institute, Faculty of Science, University of Szeged) - Presenting Author
Dr Tamás Rudas (Department of Stochastics, Bolyai Institute, Faculty of Science, University of Szeged)

Several techniques exist to measure and adjust for non-response bias, such as propensity models or post-stratification.
All of them can be applied only after data collection, and they require reliable data on unit non-response patterns for the entire population; currently, however, only estimates are available.
In this paper, we demonstrate a new procedure that controls unit non-response at the sampling stage, before the actual data collection, by combining classical techniques such as Neyman's optimal allocation and the Kish grid.
The main finding is that the new sampling algorithm leads to lower standard errors (SE) than the commonly applied post-stratification.
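The paper's new specification is not described in this abstract, so the sketch below only shows the two classical building blocks it names. Neyman allocation distributes a total sample of n across strata as n_h = n · N_h S_h / Σ_k N_k S_k, where N_h is the stratum size and S_h the stratum standard deviation. Here the result is rounded with a largest-remainder rule (an assumption) so the allocations sum to n. For within-household selection, members are listed in Kish's order (males by decreasing age, then females by decreasing age). As a simplified stand-in for the printed Kish selection tables, one member is then picked with a random draw fixed per address before fieldwork. All function names are hypothetical.

```python
import random


def neyman_allocation(n, pop_sizes, std_devs):
    """Allocate n across strata proportionally to N_h * S_h (Neyman allocation),
    rounding with the largest-remainder rule so the total is exactly n."""
    weights = {h: pop_sizes[h] * std_devs[h] for h in pop_sizes}
    total = sum(weights.values())
    raw = {h: n * w / total for h, w in weights.items()}
    alloc = {h: int(r) for h, r in raw.items()}
    shortfall = n - sum(alloc.values())
    for h in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:shortfall]:
        alloc[h] += 1
    return alloc


def kish_order(members):
    """Kish listing order: males by decreasing age, then females by decreasing age."""
    males = sorted((m for m in members if m["sex"] == "M"), key=lambda m: -m["age"])
    females = sorted((m for m in members if m["sex"] == "F"), key=lambda m: -m["age"])
    return males + females


def select_respondent(members, address_seed):
    """Simplified stand-in for the Kish tables: the draw depends only on a seed
    assigned to the address in advance, not on the interviewer's choice."""
    roster = kish_order(members)
    rng = random.Random(address_seed)
    return roster[rng.randrange(len(roster))]
```

For example, with two strata of 1,000 units each and standard deviations 1.0 and 3.0, a sample of 100 is split 25 and 75. Fixing the within-household draw before fieldwork is the property that lets selection be controlled at the sampling stage, rather than being left to interviewer discretion at the doorstep.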

Detecting Interviewer Incurred Representation Error

Dr Kathrin Thomas (Princeton University) - Presenting Author
Dr Michael Robbins (Princeton University; University of Michigan)

Researchers typically associate representation error, as outlined in the Total Survey Error framework, with sampling error occurring in the selection process prior to fieldwork. Especially in contexts where survey methodology has no long tradition and hard sampling frames are generally lacking, survey interviewers may play a crucial role in the selection process during fieldwork, when they randomly select households and respondents within the household at the doorstep. Whether deliberately or not, error can occur if the interviewer does not follow the general rules of, for example, random walks, household skip patterns, and respondent selection. Using data from the Arab Barometer Wave 5, we explore how to empirically detect potential divergences from the sampling script in the field by looking at administrative variables collected by the CAPI software, such as time markers, geolocations, and contact data. The rich cross-national data set, covering 12 countries in the MENA region, allows us to track variation in these markers within and across countries. Given that these paradata allow us to effectively detect divergences from the sampling protocol, our work may open up new pathways to correct for representation error incurred by interviewers during fieldwork.
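A minimal sketch of how CAPI paradata might be screened for divergences from the sampling protocol follows. It is not the authors' method, which the abstract does not specify. It assumes each interview record carries a GPS fix, the coordinates of its assigned sampling point, and an interview duration. The 2 km and 15-minute thresholds and all field and function names are hypothetical. Distances use the haversine great-circle formula.

```python
import math
from collections import Counter
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Interview:            # hypothetical paradata record
    interview_id: str
    interviewer_id: str
    duration_min: float
    lat: float              # GPS fix recorded by the CAPI device
    lon: float
    assigned_lat: float     # assigned sampling point
    assigned_lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = phi2 - phi1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_interviews(interviews, max_distance_km=2.0, min_duration_min=15.0):
    """Return {interview_id: [reasons]} for interviews breaching either threshold."""
    flags = {}
    for iv in interviews:
        reasons = []
        if haversine_km(iv.lat, iv.lon, iv.assigned_lat, iv.assigned_lon) > max_distance_km:
            reasons.append("outside sampled area")
        if iv.duration_min < min_duration_min:
            reasons.append("unusually short")
        if reasons:
            flags[iv.interview_id] = reasons
    return flags


def flag_share_by_interviewer(interviews, flags):
    """Share of each interviewer's interviews that were flagged."""
    totals, flagged = Counter(), Counter()
    for iv in interviews:
        totals[iv.interviewer_id] += 1
        if iv.interview_id in flags:
            flagged[iv.interviewer_id] += 1
    return {k: flagged[k] / totals[k] for k in totals}
```

Aggregating flags per interviewer, and comparing those shares within and across countries, is one way the within- and cross-country variation mentioned above could be made visible. Sensible thresholds would have to be calibrated to each country's fieldwork conditions.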