
ESRA 2023 Preliminary Glance Program


All time references are in CEST

Safe research data sharing: disclosivity and sensitivity issues 3

Session Organisers Dr Aida Sanchez-Galvez (Centre for Longitudinal Studies, UCL Social Research Institute)
Dr Vilma Agalioti-Sgompou (Centre for Longitudinal Studies, UCL Social Research Institute)
Dr Deborah Wiltshire (GESIS – Leibniz Institute for the Social Sciences)
Time Wednesday 19 July, 09:00 - 10:30
Room U6-28

A core activity of many surveys is the safe provision of well-documented and de-identified survey data, linked administrative data and geographical data to the research community. Data sharing is based on the consent given by the participants and is conditional on the assurance that confidentiality and GDPR rights will be protected. Breaking this assurance would constitute an ethical violation of consent and would threaten the trust that survey participants place in the team who collects their data and may affect their willingness to participate in further data collections.
Data sharing policies and applications are generally overseen by Data Access Committees. Data releases are either managed by the studies themselves, or by national repositories. The choice of data access routes usually depends on the disclosivity and sensitivity of the data. Data are considered disclosive if there are concerns over the re-identification of individuals, households, or organisations by a motivated intruder, and are considered sensitive if they fall under the GDPR definition of “special category data”, which require additional protection. Disclosive and sensitive data require a higher degree of security and are generally only available in secure sharing platforms, such as local secure servers or Trusted Research Environments (TREs).
The aim of this session is to create a space to share ideas and techniques on data access and how to address the risk of disclosivity and sensitivity. We invite colleagues to submit ideas relating to:
• Data sharing routes for survey and linked data
• Methods of disclosure control prior to data sharing
• Methods of risk assessment of disclosivity and sensitivity
• Data classification policies and sharing agreements
• Technical tools used to generate bespoke datasets
• Trusted Research Environments / Secure Labs: remote vs in-person access
• Syntax sharing and reproducibility
• International data sharing
Papers need not be restricted to these specific examples.

Keywords: sharing, disclosivity, sensitivity, safe access, disclosure control

Care to Share? Investigating Determinants of Code Sharing Behavior in the Social Sciences

Mr Daniel Krähmer (LMU Munich) - Presenting Author
Ms Laura Schächtele (LMU Munich)
Dr Andreas Schneck (LMU Munich)

Transparency and peer control are cornerstones of good scientific practice and entail the replication and reproduction of findings. Scientific accountability, however, rests on the premise that researchers make their data and research code publicly available. To this day, sharing research material seems to be the exception rather than the rule among social scientists.
To investigate which specific factors increase or inhibit researchers’ willingness to share code, we conducted a field experiment sending code requests to 1206 corresponding authors who have published research articles based on data from the European Social Survey between 2015 and 2020. In this experiment, we randomly varied three aspects of our email’s wording in a 4x2x2 factorial design: the appeal why researchers should share their code (FAIR principles, academic altruism, prospect of citation, no information), the perceived effort of code sharing (no code cleaning required, no information) and our request’s overall framing (enhancement of social science research, response to replication crisis). This experiment has been preregistered on OSF.
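The 4x2x2 design above crosses the three email factors into 16 variants. A minimal sketch of such an assignment follows; the factor levels are taken from the abstract, while the variable names, the fixed seed, and the balanced round-robin allocation are illustrative assumptions, not the authors' preregistered procedure.

```python
import itertools
import random
from collections import Counter

# Factor levels as named in the abstract; the constant names are hypothetical.
APPEALS = ["FAIR principles", "academic altruism",
           "prospect of citation", "no information"]
EFFORT = ["no code cleaning required", "no information"]
FRAMING = ["enhancement of social science research",
           "response to replication crisis"]

# Full crossing of the three factors: 4 * 2 * 2 = 16 email variants.
CONDITIONS = list(itertools.product(APPEALS, EFFORT, FRAMING))


def assign_conditions(author_ids, seed=0):
    """Randomly assign each author to one of the 16 cells.

    Assumption: cells are kept as equal in size as possible by shuffling
    the authors and dealing them round-robin into the cells.
    """
    rng = random.Random(seed)
    shuffled = list(author_ids)
    rng.shuffle(shuffled)
    return {author: CONDITIONS[i % len(CONDITIONS)]
            for i, author in enumerate(shuffled)}


# 1206 corresponding authors, identified here by placeholder integers.
assignment = assign_conditions(range(1206))
cell_sizes = Counter(assignment.values())
```

Under this sketch each cell receives 75 or 76 of the 1206 authors; simple independent randomisation per author would be an equally valid reading of "randomly varied".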
Contrary to our hypotheses, only framing affected researchers’ code sharing behavior: Scientists who received the negative wording alluding to the replication crisis were significantly more likely to share their research code. Overall, 385 researchers (37.5% of eligible cases) responded positively to our request, mirroring the degree of code availability in other disciplines.
Our study makes three key contributions. First, it provides a large-scale aggregate assessment of code sharing behavior among social scientists using survey data. Second, it pins down the causal influence of specific factors on code sharing. Third, it unveils rather alarming anecdotal evidence on the challenges researchers face when sharing code (e.g., poor code management). Together, these results point towards the dire need for institutional solutions regarding code availability.


SANE - Secure ANalysis Environment

Mr Lucas van der Meer (ODISSEI) - Presenting Author

Privacy, copyright, and competition barriers limit the sharing of sensitive data for scientific purposes. We propose the Secure Analysis Environment (SANE): a virtual container in which the researcher can analyse sensitive data while the data provider remains in complete control. By following the Five Safes principles, SANE will enable researchers to conduct research on data that have until now been hardly available to them.

SANE comes in two variants. Tinker SANE allows the researcher to see, manipulate and play with the data. In Blind SANE, the researcher submits an algorithm without being able to see the data and the data provider approves the algorithm and output.

SANE uses concepts from the CBS Remote Access Environment, ODISSEI Secure Supercomputer and SURF Data Exchange to build a generic off-the-shelf solution that can be used by any sensitive data provider and researcher. SANE can be used by researchers in any discipline, as illustrated by the involvement of consortia in both the social sciences (ODISSEI) and the humanities (Clariah). SANE is fully embedded into the Dutch national E-infrastructure provider SURF.

Potential sensitive data providers include the Dutch Chamber of Commerce (KvK), Funda, National Library of the Netherlands (KB) and Netherlands Institute for Sound and Vision (NISV).

By the time of the conference, SANE will also be ready for a demo, should the organising committee wish to include one.


Crossing borders without leaving – sharing secure data internationally

Ms Beate Lichtwardt (UK Data Service/UKDA, University of Essex)
Dr Deborah Wiltshire (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author

Because of their sensitive nature, secure/controlled data are highly restricted: they are accessible only remotely via Trusted Research Environments (TREs) or on-site in dedicated Safe Rooms.

Researchers working internationally often face significant time and financial burdens when they have to travel to a Safe Room, a hurdle that not many can overcome. In recent years, work has intensified to enable data access across international borders (IDAN, SSHOC) via Safe Room Remote Access. One of the achievements of SSHOC WP 5.4 was to set up a bilateral agreement between the UK Data Service and GESIS in Germany, allowing a remote connection between the two services.

Remote connections offer a safe environment to access confidential data. Datasets remain on the secure servers of the data provider (in location A) and are accessed via a secure encrypted internet connection (from location B) where all analysis is done. No physical transfer of the data ever occurs. The Safe Room at location B provides additional physical controls (e.g. Safe Room access and monitoring).

This agreement allows researchers in Germany to access a range of secure datasets made available by the UKDS SecureLab from the GESIS Safe Rooms in Germany and vice versa. The agreement is available as a template to help facilitate the setting up of further remote connections. Once the legal agreements were in place, the technical set-up started. Thin Clients were configured, exchanged between partners and installed in the Safe Room of the other service. Technical tests were carried out, followed by researcher tests.

Our talk will showcase the connection set up between the UK Data Service and GESIS and highlight some of the key challenges, as well as the solutions we found along the way.


The UK Longitudinal Linkage Collaboration: a trusted research environment providing centralised access to sensitive data for the longitudinal research community.

Mr Andy Boyd (University of Bristol) - Presenting Author
Ms Robin Flaig (University of Edinburgh)
Dr Jacqui Oakley (University of Bristol)
Mr Richard Thomas (University of Bristol)
Dr Katharine Evans (University of Bristol)
Mr Matthew Crane (University of Bristol)
Ms Kirsteen Campbell (University of Edinburgh)
Dr Stela McLachlan (University of Edinburgh)
Dr Emma Turner (University of Bristol)

Objectives
The ‘Longitudinal Health and Wellbeing National Core Study’ uses Longitudinal Population Study (LPS) data to inform COVID-19 research. A centralised infrastructure for LPS was needed to systematically link participants’ routine records and provide secure researcher access.

Approach
The UK Longitudinal Linkage Collaboration (UKLLC) is a pan-UK and interdisciplinary Trusted Research Environment (TRE). It provides remote access to pooled LPS data using a ‘Secure eResearch Platform’. UKLLC’s design was informed by LPS data managers and public contributors. To access participants’ records, UKLLC needs to demonstrate effective disclosure controls in order to gain regulatory approvals and maintain the study-participant trust relationship.

Results
Over 20 LPS have joined UKLLC, with self-reported data from >280,000 participants linked to NHS records (primary, secondary, COVID-19; civic registers; prescriptions; mental health) and environmental exposure data (pollution, green space, neighbourhood indicators). Permissions are in place to link socio-economic records (earnings, employment, social benefits, education). Our governance model is based on the ‘Five Safes’ and the UK Anonymisation Network’s ‘Anonymisation Decision Making Framework’ (ADF), which are used to assess and control disclosure risk. Our system completely separates the processing of participants’ identifiers for linkage purposes from the de-identified attribute data used for analysis. Disclosure risk is assessed using the ADF, which informs protocols to de-identify data before they enter the TRE. Following analysis, analysts submit anonymous aggregate statistical findings and reusable research outputs (syntax, derived data, documentation) for disclosure review assessments. Public/participant involvement guides our approach to understanding data sensitivity and risk tolerance.
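The separation of identifiers from attribute data described above can be illustrated with a minimal sketch. It is not UKLLC's actual pipeline: the field names, the `split_record` helper and the random-token pseudonym are hypothetical choices made only to show the principle that linkage staff and analysts work with different tables.

```python
import secrets

# Hypothetical set of directly identifying fields; a real study would
# define these in its own data classification policy.
IDENTIFIER_FIELDS = {"name", "nhs_number", "date_of_birth"}


def split_record(record):
    """Split one participant record into two rows sharing a pseudonym.

    The identifier row is for linkage processing only; the attribute row
    is the de-identified data that would be passed on for analysis.
    """
    # Random pseudonym, so it cannot be derived from the identifiers.
    pseudo_id = secrets.token_hex(8)
    identifiers = {k: v for k, v in record.items() if k in IDENTIFIER_FIELDS}
    attributes = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    return ({"pseudo_id": pseudo_id, **identifiers},
            {"pseudo_id": pseudo_id, **attributes})


# Illustrative record with made-up values.
identifier_row, attribute_row = split_record({
    "name": "Example Person",
    "nhs_number": "0000000000",
    "date_of_birth": "1990-01-01",
    "earnings_band": "B",
})
```

Removing direct identifiers in this way is only a first step; the remaining attribute data would still need a disclosure risk assessment such as the ADF-based review the abstract describes.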

Conclusion
UKLLC provides a unique strategic research-ready platform for longitudinal research. It offers a scalable solution for rapid access to sensitive data from multiple data owners whilst maintaining confidentiality through controlling disclosure risk. It is compliant with key UK regulations needed to access linked routine records.