ESRA logo




Thursday 20th July, 14:00 - 15:30 Room: F2 102


Administrative Records for Survey Methodology 5

Chair Dr Asaph Young Chun (US Census Bureau)
Coordinator 1 Professor Mike Larsen (George Washington University)
Coordinator 2 Dr Ingegerd Jansson (Statistics Sweden)
Coordinator 3 Dr Manfred Antoni (Institute for Employment Research)
Coordinator 4 Dr Daniel Fuss (Leibniz Institute for Educational Trajectories)
Coordinator 5 Dr Corinna Kleinert (Leibniz Institute for Educational Trajectories)

Session Details

Incorporation of administrative records has long been regarded as a way of improving the quality and interpretability of surveys and censuses and of controlling the rising cost of surveys (Chun and Scheuren, 2011). The increasing number of linked datasets, such as the Health and Retirement Study in the U.S., the National Educational Panel Study in Germany, and Understanding Society in the UK, is accompanied by growing empirical evidence about the selectivity of linked observations. The extent and pace of using administrative data vary from continent to continent and from country to country. This is partly due to differing concerns about privacy, confidentiality, and legal constraints, as well as variability in the acceptance and implementation of advances in statistical techniques to address such concerns.

The primary goal is to control data quality and reduce total survey error. This session will feature papers that implement "total administrative records error" and "total linked data error" methods and provide case studies and best practices of using administrative data tied to the survey life cycle (Chun and Larsen, a forthcoming Wiley book). The session invites papers that discuss fundamental challenges and recent advancements in the collection and analysis of administrative records and in their integration with surveys, censuses, and auxiliary data. We also encourage submission of papers discussing institutional collaboration on linked data, sustainable data access, and the provision of auxiliary tools and user support. Topics for papers in this session include, but are not limited to, the following:


1. Innovative use of administrative data in household surveys and censuses to improve the survey frame, reduce nonresponse follow-up, and assess coverage error.

2. Quality evaluations of administrative data and quality metrics for linked data.

3. Recent advancements in processing and linking administrative data with survey data (one-to-one) and with multiple sources of data (one-to-many).

4. Recent methods of disclosure limitation and confidentiality protection in linked data, including linkages with geographical information.

5. Bayesian approaches to using administrative data in surveys, censuses, small area estimation, and nonresponse control.

6. Implementation of new tools that facilitate the use of linked data by simplifying complex data structures or handling inconsistent information in life-course data.

7. Strategies for developing and maintaining a user-friendly infrastructure for the analysis and dissemination of linked data and solutions for collaboration.

8. Applications that transform administrative data into information that is useful and relevant to policymaking in public health, economics, science and education.

Paper Details

1. Combining Administrative and Survey Data to Improve Both Surveys and Policy
Professor Pablo Celhay (Pontificia Universidad Católica de Chile)
Professor Bruce D. Meyer (University of Chicago, NBER)
Mr Nikolas Mittag (CERGE-EI)

Recent projects have linked administrative microdata to surveys that provide key information to policy makers, such as the Current Population Survey and the American Community Survey. This chapter reviews how combining administrative and survey data can improve the information on which policy makers base their decisions, both indirectly through improvements in survey accuracy and directly by using linked data to examine policy-relevant questions. We first provide an overview of how linked survey and administrative datasets can be used to improve surveys by reducing survey error. Linked data can help to assess and potentially correct errors and biases arising from coverage error, unit and item nonresponse, imputation and measurement error. We review the evidence on each error source in turn, drawing examples from our work measuring program receipt and the income distribution. We discuss weighting, imputation, and direct substitution as possible solutions to the identified problems. The second part of this chapter discusses how linked data can help to examine policy-relevant questions directly. We review the role linked data can play in studying the effects of social insurance and government transfer programs. We focus on the effects of program features on program participation, the income distribution and labor supply.


2. Enhancement of health surveys with data linkage
Mr Cordell Golden (National Center for Health Statistics)
Mrs Lisa Mirel (National Center for Health Statistics)

As the principal health statistics agency for the U.S., the National Center for Health Statistics (NCHS) is responsible for collecting accurate, relevant, and timely data related to health. The mission of NCHS is to provide statistical information that can be used to guide actions and policies to improve the health of the American people. In addition to collecting and disseminating the Nation’s official vital statistics, NCHS conducts several population-based surveys, including the National Health Interview Survey and the National Health and Nutrition Examination Survey, and establishment surveys of health-care facilities, including the National Hospital Care Survey. The data collected through these surveys allow NCHS to publish widely-used, reliable statistics regarding the health status of the U.S. population and selected subgroups.
The data also provide the opportunity to identify disparities in health status and use of health care services by demographic, socioeconomic status, and other population characteristics; describe experiences with the health care system; monitor trends in health status and health care delivery; and evaluate the impact of health policies and programs.
There are many questions that health surveys cannot answer on their own, in part because they often represent only a snapshot in time. In addition, most population-based health surveys rely on respondent reports and, thus, are limited by respondent recall. However, when these data are linked with vital statistics or administrative data, analysts can gain insight into outcomes such as mortality or health care utilization, and into methodological issues such as the accuracy of respondent reporting. Thus, data linkages enhance the analytic capabilities and scientific value of health surveys. Over the years, NCHS has developed a data linkage program to link its health survey data with vital statistics data sources, including the National Death Index, and administrative data sources, including federal and state benefit programs. Although administrative data are not created for research purposes (they are created primarily for program administration), the NCHS Data Linkage Program has worked extensively with partner agencies to develop data files that can be used for research.
This talk will describe the NCHS Data Linkage Program and how the linked data have helped to inform policy research.


3. Lessons from linked data: Quality of data about income and education from SHARE-RV
Ms Tatjana Mika (German Pension Insurance)
Ms Imke Herold (Munich Institute for the Economics of Aging)

Administrative data are linked to the Survey of Health, Ageing and Retirement in Europe (SHARE) in Germany in order to enrich the survey with selected information from pension insurance records. This opportunity is increasingly used. However, the SHARE survey data also enable researchers to evaluate the quality of the data from the records. These evaluations show that both sources have particular strengths and weaknesses. Data on gross income are far more reliable when they come from pension insurance registers: missing values are less common in the registers and the quality of the data is superior. While respondents tend to round up their income in surveys, the records give the figure exactly as calculated and used in the official process. Survey data, by contrast, are superior in quality concerning the level of school and professional education. Respondents are evidently better informed about their education level than their employers, from whom the record information stems. The paper presents results from the project SHARE-RV, which links data from three waves of the SHARE survey with anonymised data from German Pension Insurance records.


4. An Investigation of Record Linkage Refusal and Its Implications for Empirical Research
Mr Arne Jonas Warnke (ZEW Mannheim)

This study seeks to shed light on the possible reasons for inconsistent findings on predictors of linkage consent, as documented in the literature. To this end, we compare two very similarly structured datasets from the same country. In the two datasets, both of which were collected in surveys conducted by the same polling institute, workers in different establishments were asked questions about work-related aspects relevant for social science research. We first use the same set of controls for both datasets, thereby confirming that the two studies are broadly comparable. Secondly, we add further variables to the datasets to see whether varying the set of controls gives rise to inconsistent results. These additional variables are not necessarily available in both datasets and include psychological attributes as well as job and firm characteristics. In a subsequent step, we make use of the matched employer-employee structure of the available data.
Here, we want to answer the question of whether one would have obtained similar results if the analysis had not been restricted to the sample of respondents who provided linkage consent. Suppose our target population consists of all survey participants regardless of their linkage consent decision. Our sample consists of respondents from whom linkage consent was obtained. Applied researchers are mostly concerned with whether one can use the sample at hand to derive consistent estimators for statistics of the target population without strong assumptions. This is the case if the association between the linkage consent decision and the outcome of interest depends only on observable characteristics. In this case, it is possible to derive consistent population statistics from the sample by adding these observable characteristics to the regression or by weighting. In contrast, stronger assumptions are necessary if unobserved heterogeneity is correlated with linkage consent (and the outcome).
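The weighting argument above can be illustrated with a minimal sketch. It is not the authors' procedure: the data are simulated, consent is assumed to depend only on a single observed binary characteristic (a hypothetical `educ` indicator), and the function names `simulate`, `naive_mean`, `full_mean` and `ipw_mean` are invented for this illustration. Under those assumptions, weighting consenters by the inverse of their estimated consent propensity should recover the mean of the full survey sample, whereas the unweighted consenter mean should not.

```python
import random


def simulate(n=20000, seed=0):
    """Simulated survey sample (an assumption, not real data).

    Each row is (educ, wage, consent). Consent depends only on the
    observed characteristic `educ`, i.e. selection on observables.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        educ = 1 if rng.random() < 0.5 else 0
        wage = 10.0 + 5.0 * educ + rng.gauss(0.0, 1.0)
        p_consent = 0.8 if educ else 0.4  # hypothetical consent rates
        consent = rng.random() < p_consent
        rows.append((educ, wage, consent))
    return rows


def full_mean(rows):
    """Mean outcome in the target population (all survey participants)."""
    return sum(wage for _, wage, _ in rows) / len(rows)


def naive_mean(rows):
    """Unweighted mean outcome among consenters only."""
    wages = [wage for _, wage, consent in rows if consent]
    return sum(wages) / len(wages)


def ipw_mean(rows):
    """Inverse-probability-weighted mean among consenters.

    The consent propensity is estimated as the consent rate within
    each stratum of the observed characteristic.
    """
    totals = {0: 0, 1: 0}
    consents = {0: 0, 1: 0}
    for educ, _, consent in rows:
        totals[educ] += 1
        consents[educ] += int(consent)
    propensity = {g: consents[g] / totals[g] for g in totals}

    num = 0.0
    den = 0.0
    for educ, wage, consent in rows:
        if consent:
            weight = 1.0 / propensity[educ]
            num += weight * wage
            den += weight
    return num / den


if __name__ == "__main__":
    data = simulate()
    print("full sample mean:   ", round(full_mean(data), 3))
    print("consenters, naive:  ", round(naive_mean(data), 3))
    print("consenters, weighted:", round(ipw_mean(data), 3))
```

If consent also depended on something unobserved that affects the outcome, no reweighting on `educ` would close the gap, which is the case the abstract describes as requiring stronger assumptions.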
We investigate this question by testing whether results for two economic models would have been different if information on individuals who refused to provide linkage consent had in fact been available. In the first model, we estimate an augmented Mincer regression, a "cornerstone of empirical economics", to see whether different samples give different findings on wage returns to human capital investments. The second model is a replication of our own earlier research on participation in job-related training. In the original study, we excluded information on survey participants who did not provide linkage consent, because we made use of linked data only. In this study we replicate our original results using the survey data only. We investigate whether we would have come to different conclusions if we had included information relating to the sample of survey participants who did not provide linkage consent.
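For reference, the basic textbook form of the Mincer earnings equation mentioned above is sketched below. The abstract does not state which additional regressors make up the "augmented" version, so only the standard terms are shown; the symbols are conventional notation rather than the author's: $w_i$ is the wage, $S_i$ years of schooling, $X_i$ years of labor market experience, and $\varepsilon_i$ an error term.

```latex
\ln w_i = \beta_0 + \beta_1 S_i + \beta_2 X_i + \beta_3 X_i^2 + \varepsilon_i
```

In this form, $\beta_1$ is read as the approximate wage return to an additional year of schooling, which is the kind of coefficient one would compare between the consenting sample and the full survey sample.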
In general, our results on the role of (denied) linkage consent for applied research are rather promising. Non-consent does not seem to translate into a large bias in economic models in our two applications. Including the non-consent sample yields virtually the same results.