Friday 17th July, 11:00 - 12:30 Room: HT-103

Technical Problems and Solutions for Record Linkage and Big Data 2

Convenor Dr Manfred Antoni (Institute for Employment Research (IAB))
Coordinator 1 Mr Stefan Bender (Institute for Employment Research (IAB))
Coordinator 2 Professor Rainer Schnell (University of Duisburg-Essen)

The scope of the session includes technical issues of linkage, the handling of large administrative databases or big data (for example, blocking strategies), and problems caused by incomplete identifiers. Furthermore, techniques and problems of privacy-preserving record linkage and secure access to linked datasets will be discussed. Finally, new algorithms and software for record-linkage applications on large datasets will be covered.

We invite presentations on:
• Handling missing and messy identifiers
• Blocking techniques
• Privacy Preserving Record Linkage
• Access to linked datasets
• Algorithms and Software

1. The Generations and Gender Programme: The legal challenges in combining Survey data, Administrative Data and Registry data and how we are overcoming them
Mr Thomas Emery (NIDI)

The Generations and Gender Programme (GGP) is a cross-national survey that has been conducted in 19 countries. It focuses primarily on families and relationships over the life course and has over 2,000 registered users. The use of administrative data to supplement GGP survey data has been an overwhelming success in many regards, but there are significant obstacles to releasing the data for public use. The GGP is striving to develop techniques to address these obstacles. After much trial and error, the GGP has developed a series of solutions that appear to satisfy the concerns of users and providers.

2. Quality, analytic potential and accessibility of linked administrative, survey and publicly available data
Dr Manfred Antoni (Institute for Employment Research (IAB))
Ms Alexandra Schmucker (Institute for Employment Research (IAB))

Surveys increasingly face the problem of unit-nonresponse. Quality issues arise with item-nonresponse or misreporting. Longitudinal interviews lead to high costs and response burden.
One remedy is linkage with other data sources. Data linkage potentially results in increased cost efficiency, data quality and analytic potential for substantive analyses. Research on data quality becomes possible by applying validation or nonresponse analyses.
Our presentation will focus on the potential, quality and accessibility of linked data of the Research Data Centre of the German Federal Employment Agency. These data are linked to several survey data sets on individuals, households or establishments.