ESRA 2019 Draft Programme at a Glance

Microdata from Official Statistics

Session Organisers Mr Simon Börlin (GESIS - Leibniz Institute for the Social Sciences)
Dr Klaus Pforr (GESIS - Leibniz Institute for the Social Sciences)
Time Friday 19th July, 13:00 - 14:00
Room D23

Microdata from Official Statistics (e.g. the national and EU Labour Force Surveys, the Current Population Survey (CPS), national censuses) are the basis for a wide range of economic and social research in Europe and beyond. Official data are very diverse and can be used in many ways. They exist at the local, regional, national and international level and differ, for example, in design (e.g. cross-sectional or (rotational) panel designs) and in their main areas of content. Additionally, they are collected in different modes; some sample private households, some use proxy interviewing, and in some official surveys participation is required by law. Unlike research surveys, official data are not primarily collected for scientific research. Against this background, official data provide a valuable basis for addressing both methodological questions and substantive issues.

For this session, we welcome contributions on research topics including, but not limited to, the following:
- Measuring data quality of official data (e.g. mode and interviewer effects; effects of different sampling techniques; reliability and validity)
- Cross-country comparability
- Comparisons of official data with other surveys (for methodological and/or substantive research questions)
- Innovative applications with a) panel data, b) data linkage, c) geographically referenced data
- Development of tools that improve the handling of official data and/or implement official classifications or social science concepts

Keywords: Official Statistics, Microdata

SHARE-RV: Comparing data on education, income and working career from the SHARE-Survey combined with record data from the German Pension Insurance

Ms Tatjana Mika (German Pension Insurance - Research Data Centre) - Presenting Author

Survey data are increasingly linked with administrative data in order to enhance data quality. Record linkage is assumed to increase reliability, especially concerning past events such as short-term unemployment. Other fields of application are subjects that are difficult for respondents to report, such as gross income. However, administrative data have specific problems of their own, which sometimes result from inaccurate or incomplete records. Furthermore, changes in legal or administrative conditions might cause systematic variation in administrative data that can be difficult to detect. Examples are periods of welfare state reforms, such as enlarged or restricted unemployment coverage in times of mass unemployment. In other cases, deviations of survey information from administrative data might nevertheless be justified by different measurement concepts (survey question vs. administrative procedure and/or logic). In these cases, whether a deviation counts as an error depends on the research question.

The ongoing project “SHARE-RV”, which started in 2008-2009, asks the German participants of the international SHARE survey to consent to linking their SHARE interview with data from their pension insurance record. The project has continued since 2009 and has published several Scientific Use Files combining survey data with record data, which can be ordered for use in universities and scientific institutions.

SHARE-RV data are used to present differences in data on education, professional career and income from social security. Process-produced data from the German Pension Fund offer life-course information and details about pension calculations. Data on education and the professional career are also registered. Data with similar content (education, working career and income) are also available from the SHARE survey. The presentation offers insights into the differences between the two sources, especially with regard to data quality.

Big Data Meets Big Survey: Integrating Administrative Records into the American Community Survey

Dr Jennifer Ortman (U.S. Census Bureau) - Presenting Author
Ms Sandra Clark (U.S. Census Bureau)

The changing landscape of America’s communities yields new and complex challenges for survey and census takers. Survey organizations are also confronting reduced response rates, obsolete data collection modes, and increasing reluctance among respondents to provide personal information. In a world where information is readily available and multiple sources can be easily linked together, survey organizations are seeking out alternatives to aid in data collection and improve data quality. The U.S. Census Bureau has made significant progress exploring the use of administrative records in the American Community Survey (ACS) to continue to meet data needs in an era of limited budgets, rising costs, and decreasing participation. Incorporating administrative records into our processes should reduce respondent burden and improve data reliability, while saving costs by, for example, reducing the need for follow-up visits. There is great potential for administrative record utilization in data collection and processing, but there are also great challenges. These include matching accuracy, geographic coverage, and a mismatch between administrative concepts and statistical requirements. This paper details the vision of how administrative data will be integrated into the ACS, including an evaluation of alternative administrative data sets, a case study on the use of administrative data to replace ACS housing items, and the use of administrative data for editing and imputation in the ACS.

Releasing microdata via public use files while protecting respondents’ confidentiality

Ms Neeraja Sathe (RTI International) - Presenting Author
Dr Feng Yu (RTI International)
Ms Lanting Dai (RTI International)

Researchers and policymakers increasingly demand access to readily available microdata. To respond to such demands, statistical agencies produce public use files (PUFs) to allow users to access microdata and to perform statistical analyses. Examples of PUFs in the United States include American Community Survey Public Use Microdata Sample files from the U.S. Census Bureau, National Health and Nutrition Examination Survey data from the National Center for Health Statistics, and National Survey on Drug Use and Health (NSDUH) data from the Substance Abuse and Mental Health Services Administration.

Agencies must balance the availability of microdata for analysis with pledges of confidentiality to survey respondents. This presentation provides an overview of disclosure risks in releasing microdata, as well as inside and outside intrusion scenarios. It also reviews deterministic or perturbative statistical disclosure limitation (SDL) techniques commonly used in creating PUFs, such as suppression, recoding, swapping, and substitution.
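Two of the techniques named above, recoding and suppression, can be illustrated with a minimal Python sketch. It is not the procedure used for any of the PUFs mentioned here: the records, the variable names, the caps, and the threshold k are all hypothetical, and the function names top_code and suppress_rare are introduced only for this illustration.

```python
from collections import Counter


def top_code(value, cap):
    """Recoding by top-coding: any value above `cap` is reported as `cap`."""
    return min(value, cap)


def suppress_rare(records, keys, k):
    """Local suppression: blank the key variables of every record whose
    combination of key values occurs fewer than k times in the file."""
    counts = Counter(tuple(r[key] for key in keys) for r in records)
    protected = []
    for r in records:
        combo = tuple(r[key] for key in keys)
        if counts[combo] < k:
            # Copy the record so the input list is left unchanged.
            r = {**r, **{key: None for key in keys}}
        protected.append(r)
    return protected


# Hypothetical microdata: the third record is unique on (age, region).
records = [
    {"age": 34, "region": "North", "income": 41000},
    {"age": 34, "region": "North", "income": 39000},
    {"age": 91, "region": "South", "income": 250000},
]

# Step 1: top-code age at 85 and income at 100000 (assumed caps).
recoded = [
    {**r, "age": top_code(r["age"], 85), "income": top_code(r["income"], 100000)}
    for r in records
]

# Step 2: suppress key variables for combinations seen fewer than k=2 times.
protected = suppress_rare(recoded, keys=("age", "region"), k=2)
```

In this toy file the two similar records are released unchanged, while the unique third record keeps only its top-coded income. Both steps are deterministic; perturbative techniques such as swapping or substitution would instead alter values at random.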

NSDUH microdata PUFs are used as a case study in this presentation. For NSDUH specifically, a combined masking technique called MASSC was developed at RTI International to produce PUFs. MASSC stands for Micro-Agglomeration, Substitution, Subsampling, and weight Calibration and is an SDL method that can be used to protect data confidentiality while maintaining data utility. This presentation discusses how MASSC is implemented and how several methods are used to assess the quality and utility of NSDUH PUFs with respect to the survey’s annual restricted-use data file.