



Tuesday 18th July, 16:00 - 17:30 Room: F2 102


Administrative data linking: why not?

Chair Dr Paula Devine (ARK, Queen's University Belfast)
Coordinator 1 Professor Gillian Robinson (ARK, Ulster University)

Session Details

The linking of survey and administrative data is an exciting development within survey research. It provides opportunities to learn more about our population and society in ways that were previously unimaginable. Unsurprisingly, analysts have spent much time developing anonymisation, linking and statistical techniques to exploit these data.

However, the linking of administrative and survey data is not a problem-free process, since it takes place within specific political, social and cultural contexts. Thus, this session will explicitly focus on the ethical and governance issues related to data linkage and sharing.

Papers of interest in this session may explore:
• Ethical issues relating to the linking of survey and administrative data;
• Public knowledge and understanding of public and private data;
• Public understanding of, and support for, data linkage;
• Governance arrangements for data linking, and examples of good or bad practice;
• Case studies of research based on linked survey and administrative data, highlighting ethical and governance lessons for others embarking on this process.

Papers focusing solely on the methodological and statistical issues involved in administrative data research are directed to other sessions.

Paper Details

1. Exploring usage of administrative data in social science research
Dr Tom Emery (NIDI)

Linking administrative data and survey data offers the potential for many new research opportunities for scientific and policy-related projects. A rapid assessment of the academic literature addressing record linkage in one way or another shows that the number of studies based on linking records across data systems has been growing, but that the actual numbers remain small, especially for social science studies. Such studies are far more common in public health research. The paucity of linkage studies cannot be attributed solely to technical issues. Over time, deterministic and probabilistic record-linkage methods and software have been developed that permit linkage of records with or without unique person-identifier numbers (e.g. Groenewold, van Ginneken, & Masseria, 2008; Künn, 2015; Lifang G., Baxter, Vickers, & Rainsford, 2003; Sayers, Ben-Shlomo, Blom, & Steele, 2015). The literature indicates that cost considerations are just as important, since linkage and manual verification of the quality of linkages involve costly software and personnel. Technically speaking, success in matching and linking records also depends on the kind of disclosure limitation method used to protect confidential information contained in the data (e.g. Itoh & Takano, 2011). Another major factor is the concern and fear of government institutions about the risk of breaching individual privacy laws. Consent of residents to the sharing of private information in administrative data is usually not available, and such legal constraints on the use of administrative data seriously limit the development of and access to linked data sets. One strategy for overcoming this problem is to explicitly ask survey respondents for such consent during a survey interview (e.g. Emery, 2016; Künn, 2015).
Several studies urge policymakers, who have a lot to gain from the findings of social science research using linked data, to help create a better procedural infrastructure in support of data linkage projects for scientific research (Künn, 2015). In this report we provide an overview of existing administrative data linkage projects in the social sciences, with a focus on those that provide data for secondary use by third parties. We elaborate on the general trends identified by looking at four case studies from SERISS partners (GGP, UKDA, ESS, SHARE) which conducted some form of data linkage. These case studies highlight not only the successes in data linkage but also the failures. This helps bring the obstacles and limitations of data linkage to the forefront of discussion and yields clear conclusions for policymakers in the field of data science.


2. Administrative Data Linkage: The Data Archive Perspective
Dr Peter Granda (University of Michigan)

For many years, social science data archives provided survey data and, to a much lesser extent, administrative data to the research community. This dissemination activity focused on one central principle: the availability of public-use files devoid of any direct or indirect identifiers, to protect the confidentiality of the individual. Under these conditions, data linkage might only apply when joining different waves of the same survey for longitudinal analyses. Archives strongly discouraged the acquisition of data files with personally identifiable information. However, the growing amount of administrative and other types of contextual data that can now more easily be linked to survey data, combined with the expanding research agenda of social scientists, fundamentally changes this model. How have data access options changed for researchers, and how do archives meet the increasing challenges of supplying them with the data they want in a way that is most convenient for them to use?

This presentation will describe the experience of the Inter-university Consortium for Political and Social Research (ICPSR), a large data archive housed at the Institute for Social Research at the University of Michigan, as it copes with the ethical and governance issues surrounding the balance between preserving individual confidentiality and the increased desire for data linkage within the research community.

It will address such questions as:

• What role do consent forms play as an ethical issue in determining when it is appropriate to link survey data with administrative data?

• What rights do individuals have regarding their presence in administrative data for which they neither provided consent nor assumed that the data were ever collected for research purposes?

• What governing structure can data archives construct to provide access to linked data while still maintaining confidentiality?

• What procedures and practices have archives attempted to date to provide such access and what lessons can be learned from this experience?

• What new strategies are on the horizon to assure that researchers doing analyses on linked data do so in a manner that protects both survey respondents and administrative records?


3. Public understanding of administrative data linking in Northern Ireland
Dr Paula Devine (ARK, Queen's University Belfast)
Professor Gillian Robinson (ARK, Ulster University)

Analysis of administrative data has huge potential in its own right. It could be even more valuable if these data could be shared or combined with survey data for research or service provision. Significantly, though, the huge benefits of such linkage have been countered by the public’s concern about their privacy.

This paper will report on a survey of public attitudes to data sharing in Northern Ireland with an emphasis on health issues, carried out as part of the 2015 Northern Ireland Life and Times survey (NILT). A total of 1202 adults across Northern Ireland took part in the survey. The findings indicate that public support for data sharing is linked to trust in specific organisations, data protection measures and the perception of public benefit. For example, the data indicate that there is huge public goodwill towards achieving the potential benefits of data linkage, especially if there is a benefit to society. At the same time, however, the overwhelming majority of respondents agree that the right to privacy has to be respected over everything else. Other findings of the survey relate to public understanding of the need for consent.

Given the ethical issues surrounding administrative and survey data linkage, understanding and addressing public concern is vital, since it is the public’s data that researchers are linking.


4. Understanding young people’s views about consenting to data linkage: findings from the PEARL qualitative study
Mr Andy Boyd (University of Bristol)
Dr Suzanne Audrey (University of Bristol)
Dr Lindsey Brown (Independent Researcher)
Professor Rona Campbell (University of Bristol)
Professor John Macleod (University of Bristol)

Background: Electronic administrative data exist in several health and social domains which, if linked, are potentially useful for research. However, benefits derived from linking personal data need to be considered alongside the risks. Objective risks include the threat to privacy, while subjective risks relate to notions of stakeholder acceptability. If data linkage is considered unacceptable, then this threatens the survey-participant trust relationship, and the willingness of data generators to share data.

The Avon Longitudinal Study of Parents and Children (ALSPAC) is a birth cohort study with detailed health, biological and behavioural data from before birth to early adulthood. The Project to Enhance ALSPAC through Record Linkage (PEARL) was established to obtain consent for, and establish mechanisms of, linkage between ALSPAC index participants and routine sources of health and social data. Qualitative research was incorporated in the PEARL study to examine participants’ views about data linkage and inform approaches to information sharing.

This paper describes findings relating to participants’ understanding of, and views on, the importance of consent. It will also briefly illustrate the ways in which, as part of the CLOSER cohort consortium, we have used these findings to lobby UK government departments to provide researchers with access to linked records, and to inform work to improve public understanding as part of national education initiatives.

Methods: Digitally recorded interviews were conducted with 55 young people aged 17 to 19 years, 56% of whom were female. Key terms and processes relating to consent and data linkage, as well as anonymisation strategies, were explained to interview participants. Four scenarios prompted consideration of linking different types or sources of data, and whether consent should be requested. All interview recordings were fully transcribed. Thematic analysis was undertaken using the Framework approach to data management. Findings were distributed to key stakeholders to inform national debate.

Results: Scenarios relating to teenage pregnancy and mental health elicited unease about individuals being stigmatised or blamed, while scenarios relating to heart disease and asthma tended to be seen as having a clearer purpose and health outcome, suggesting a preference for research with tangible health benefits. Anonymising data was not regarded as sufficient for researchers to do whatever they want with the data. This was linked to notions of ‘ownership’ of personal data and lack of clarity about the extent to which individuals would or could be de-identified. Young people raising the same issues came to differing conclusions about whether consent was needed.

Conclusions: Accommodating these views within a governance framework that is acceptable to a majority of the public is challenging. Pragmatic, imaginative and flexible approaches are needed if research using data linkage is to successfully realise its potential for public good without undermining public trust in the research process. Robust evidence such as this can inform national strategies to improve the understanding and acceptability of data linkage.