ESRA 2017 Programme

Tuesday 18th July      Wednesday 19th July      Thursday 20th July      Friday 21st July

Wednesday 19th July, 11:00 - 12:30 Room: F2 102

Administrative Records for Survey Methodology 2

Chair: Dr Asaph Young Chun (US Census Bureau)
Coordinator 1: Professor Mike Larsen (George Washington University)
Coordinator 2: Dr Ingegerd Jansson (Statistics Sweden)
Coordinator 3: Dr Manfred Antoni (Institute for Employment Research)
Coordinator 4: Dr Daniel Fuss (Leibniz Institute for Educational Trajectories)
Coordinator 5: Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories)

Session Details

Incorporating administrative records has long been regarded as a way of improving the quality and interpretability of surveys and censuses and of controlling the rising cost of surveys (Chun and Scheuren, 2011). The growing number of linked datasets, such as the Health and Retirement Study in the U.S., the National Educational Panel Study in Germany, and Understanding Society in the UK, is accompanied by growing empirical evidence about the selectivity of linked observations. The extent and pace of using administrative data vary from continent to continent and from country to country. This is partly due to differing concerns about privacy, confidentiality, and legal constraints, and partly to variability in the acceptance and implementation of advances in statistical techniques that address these concerns.

The primary goal is to control data quality and reduce total survey error. This session will feature papers that implement "total administrative records error" and "total linked data error" methods and that provide case studies and best practices for using administrative data throughout the survey life cycle (Chun and Larsen, forthcoming Wiley book). The session invites papers that discuss fundamental challenges and recent advances in the collection and analysis of administrative records and in their integration with surveys, censuses, and auxiliary data. We also encourage papers discussing institutional collaboration on linked data, sustainable data access, and the provision of auxiliary tools and user support. Topics for this session include, but are not limited to, the following:


1. Innovative use of administrative data in household surveys and censuses to improve the survey frame, reduce nonresponse follow-up, and assess coverage error.

2. Quality evaluations of administrative data and quality metrics for linked data.

3. Recent advances in processing and linking administrative data with survey data (one-to-one) and with multiple sources of data (one-to-many).

4. Recent methods of disclosure limitation and confidentiality protection in linked data, including linkages with geographical information.

5. Bayesian approaches to using administrative data in surveys, censuses, small area estimation, and nonresponse control.

6. Implementation of new tools that facilitate the use of linked data by simplifying complex data structures or handling inconsistent information in life-course data.

7. Strategies for developing and maintaining a user-friendly infrastructure for the analysis and dissemination of linked data, and solutions for collaboration.

8. Applications that transform administrative data into information that is useful and relevant to policymaking in public health, economics, science, and education.

Paper Details

1. Measuring and Controlling for Non-Consent Bias in Linked Survey and Administrative Data
Dr Joseph Sakshaug (University of Manchester)

Numerous large-scale surveys conducted around the world supplement their primary data collections with linkages to a variety of administrative sources (e.g. social security records). However, due to the highly sensitive and confidential nature of administrative records, accessing and linking such records to surveys requires agreement from multiple parties, including the administrative data owners, key stakeholders, and in many cases, the survey respondents themselves. In fact, obtaining informed consent from respondents prior to linking their survey and administrative records is often mandated by research ethics boards and/or legal regulations. Not all respondents consent to linkage, and some evidence suggests that the proportion of linkage non-consenters is growing over time. Linkage non-consent is problematic because it reduces statistical power and may introduce bias into linked-data estimates. Several studies have shown that estimates for both survey and administrative variables are affected by linkage consent bias. Different methods have been used to measure linkage consent bias, and different strategies have been proposed for minimizing this source of bias either at the survey design stage or after data collection. In this presentation, I review these methods and strategies for measuring and controlling consent bias in linked data sources. In doing so, I note the strengths and limitations of these approaches and conclude by providing practical guidance to researchers interested in addressing this source of error in their own studies.
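One common way to measure non-consent bias, sketched here under simple assumptions, is to compare an estimate computed from consenters only with the same estimate from the full respondent sample, using a survey variable that is observed for everyone. The function name `consent_bias` and the example data are hypothetical, and the sketch ignores survey design weights.

```python
from statistics import mean


def consent_bias(values, consented):
    """Compare a consenter-only mean with the full-sample mean.

    values:    a survey variable observed for every respondent
    consented: parallel list of booleans, True if linkage consent was given
    Returns (full_mean, consenter_mean, bias, relative_bias).
    Unweighted: survey design weights are ignored in this sketch.
    """
    if len(values) != len(consented):
        raise ValueError("values and consented must have the same length")
    consenter_values = [v for v, c in zip(values, consented) if c]
    if not consenter_values:
        raise ValueError("no consenters: bias cannot be estimated")
    full_mean = mean(values)
    consenter_mean = mean(consenter_values)
    bias = consenter_mean - full_mean
    relative_bias = bias / full_mean if full_mean != 0 else float("nan")
    return full_mean, consenter_mean, bias, relative_bias


# Hypothetical example: consenters report lower values than non-consenters.
print(consent_bias([10, 20, 30, 40], [True, True, False, False]))
```

A negative bias here means that analyses restricted to linked (consenting) cases would underestimate the full-sample mean of this variable.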


2. Save as many as possible: How to reduce selectivity in the record linkage process
Ms Christin Czaplicki (German Pension Insurance)
Ms Dina Frommert (German Pension Insurance)
Mrs Anne Langelüddeke (German Pension Insurance)
Ms Dagmar Zanker (German Pension Insurance)

Linking survey data with administrative data poses several challenges. In Germany, one of the main obstacles in the data linkage process is strict regulation of data confidentiality. The regulations require the explicit consent of the respondent, which is most often gathered as written consent including a signature. Since respondents are cautious about the use of their data records, a substantial part of the sample usually does not consent. Compared to survey data without linkage, linked data are affected by more sources of selectivity, which can influence data quality.
The proposed paper describes different sources of selectivity and suggests ways of minimizing them. Apart from well-researched sources of bias, which are inherent to the survey process, we suggest considering three further sources of selectivity, which are more technical and are introduced by the record linkage process.

1. As mentioned above, not every respondent will consent to the use of the administrative records held in their name.
2. Once the respondents have given their consent, they have to be identified in the administrative data.
3. Once they are identified, their records have to be extracted from the administrative database, and finally
4. the survey data has to be linked to the extracted administrative records of the same person.

Each of these steps involves potential difficulties that lead to dropouts and increase the selectivity of the sample. Measures to reduce selectivity have so far concentrated on the consent step. They include the placement and wording of the consent form and potential interviewer effects. The topics covered in the survey and trust in the survey agency and administrative institutions have also been identified as playing a role.
However, even after the respondent has given consent, there is no guarantee that their details can be verified. They might not provide crucial information needed to identify their administrative records, or their handwriting might be illegible. In addition, even once the consent data have been cleaned, the administrative records found may still not correspond to the survey respondent.
Finally, extracting the records might not be possible for administrative reasons.
Because these more technical causes of dropout and selectivity have so far been overlooked, more attention should be paid to them if selectivity is to be reduced.
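The four steps listed above form a cascade in which each step can only lose cases. A minimal sketch, assuming each respondent is coded by how many consecutive steps they passed (0 to 4) and that a covariate such as age is known for the whole survey sample, tracks both retention and how the covariate mean drifts as cases drop out. The stage labels, the function `retention_cascade`, and the data are hypothetical illustrations, not the authors' procedure.

```python
from statistics import mean

# Hypothetical labels for the four linkage steps described above.
STAGES = ("consent", "identified", "extracted", "linked")


def retention_cascade(stages_passed, covariate):
    """Track sample size and a covariate mean through the linkage steps.

    stages_passed: per respondent, how many consecutive STAGES were passed (0-4)
    covariate:     parallel list, e.g. age, known for the full survey sample
    Returns rows of (stage, n_remaining, step_rate, cumulative_rate, covariate_mean).
    """
    if not stages_passed or len(stages_passed) != len(covariate):
        raise ValueError("need equal-length, non-empty inputs")
    n_total = len(stages_passed)
    rows = [("sample", n_total, 1.0, 1.0, mean(covariate))]
    previous = n_total
    for i, stage in enumerate(STAGES, start=1):
        # A respondent survives stage i only if they passed at least i steps.
        kept = [x for s, x in zip(stages_passed, covariate) if s >= i]
        n = len(kept)
        step_rate = n / previous if previous else 0.0
        covariate_mean = mean(kept) if kept else float("nan")
        rows.append((stage, n, step_rate, n / n_total, covariate_mean))
        previous = n
    return rows


# Hypothetical sample of ten respondents with their ages.
passed = [4, 4, 4, 3, 2, 1, 0, 0, 4, 4]
ages = [30, 40, 50, 60, 70, 80, 90, 100, 20, 10]
for row in retention_cascade(passed, ages):
    print(row)
```

In this made-up sample, older respondents drop out at every step, so the mean age falls from stage to stage. A shifting covariate mean of this kind signals that attrition after consent is selective rather than random.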


3. Innovative Applications for Linking Health Survey Data to Vital and Administrative Data
Ms Lisa Mirel (CDC/NCHS/OAE/SPB)
Mr Cordell Golden (CDC/NCHS/OAE/SPB)
Mr Marc Roemer (AHRQ/CFACT)

Linked survey and administrative data can facilitate richer analyses by augmenting the information collected in the surveys with vital or administrative data. However, linked data are only as good as the algorithm used to produce them. Linkage methodologies must be rigorous and transparent so that analyses are valid and replicable. The National Center for Health Statistics (NCHS), the principal health statistics agency in the U.S., has a data linkage program that is designed to expand the analytic utility of the Center's population-based surveys. The NCHS Data Linkage Program links its health survey data with vital statistics and administrative data sources. However, survey participants have become increasingly reluctant to provide personally identifiable information (PII) to interviewers. Therefore, in recent years, changes to survey design have been implemented to reduce the amount of PII collected. This, in turn, has limited the information available for data linkages based on strictly deterministic matching algorithms. To address this issue, the Data Linkage Program at NCHS has altered some of its linkage methodologies to incorporate more probabilistic approaches.

This talk will describe some of the new approaches being used for linking when limited PII is available, such as new match weights to be used in the scoring algorithms. Results will compare new and old methodologies, using actual examples from the NCHS Data Linkage Program. The results will be discussed in terms of implications for analyses and future directions of the Data Linkage Program.
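The abstract does not specify the new match weights. For orientation, the sketch below shows the classic Fellegi-Sunter scheme on which many probabilistic linkage scores are built: each compared field contributes log2(m/u) when it agrees and log2((1-m)/(1-u)) when it disagrees, where m and u are the probabilities of agreement among true matches and among non-matches. The field names, m/u values, and thresholds are hypothetical, and this is not the NCHS algorithm. Fields that were not collected contribute nothing, which illustrates why reduced PII leaves more record pairs in the uncertain middle band.

```python
import math

# Hypothetical m- and u-probabilities for each comparison field:
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "sex": (0.99, 0.50),
    "zip_code": (0.90, 0.02),
}


def field_weight(m, u, agrees):
    """Fellegi-Sunter agreement or disagreement weight in bits."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_score(comparison, probs=FIELD_PROBS):
    """Sum field weights for one record pair.

    comparison maps field -> True (agree), False (disagree), or None/absent
    (not available in one of the records, e.g. PII not collected).
    Unavailable fields contribute 0 to the score.
    """
    total = 0.0
    for field, (m, u) in probs.items():
        status = comparison.get(field)
        if status is None:
            continue
        total += field_weight(m, u, status)
    return total


def classify(score, upper=8.0, lower=0.0):
    """Hypothetical thresholds: link, non-link, or send to clerical review."""
    if score >= upper:
        return "link"
    if score <= lower:
        return "non-link"
    return "clerical review"


full_pii = {"last_name": True, "birth_year": True, "sex": True, "zip_code": True}
limited_pii = {"sex": True}
for pair in (full_pii, limited_pii):
    score = match_score(pair)
    print(round(score, 1), classify(score))
```

With these illustrative values, a pair agreeing on all four fields scores about 17.3 and is linked, while a pair for which only sex is available scores about 1.0 and falls into clerical review. Adjusting the weights or thresholds is one lever a linkage program can use when less PII is available.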