All time references are in CEST
Safe research data sharing: disclosivity and sensitivity issues 2
Session Organisers: Dr Aida Sanchez-Galvez (Centre for Longitudinal Studies, UCL Social Research Institute)
Dr Vilma Agalioti-Sgompou (Centre for Longitudinal Studies, UCL Social Research Institute)
Dr Deborah Wiltshire (GESIS – Leibniz Institute for the Social Sciences)
Time: Tuesday 18 July, 14:00 - 15:30
A core activity of many surveys is the safe provision of well-documented and de-identified survey data, linked administrative data and geographical data to the research community. Data sharing is based on the consent given by the participants and is conditional on the assurance that confidentiality and GDPR rights will be protected. Breaking this assurance would constitute an ethical violation of consent and would threaten the trust that survey participants place in the team who collects their data and may affect their willingness to participate in further data collections.
Data sharing policies and applications are generally overseen by Data Access Committees. Data releases are either managed by the studies themselves, or by national repositories. The choice of data access routes usually depends on the disclosivity and sensitivity of the data. Data are considered disclosive if there are concerns over the re-identification of individuals, households, or organisations by a motivated intruder, and are considered sensitive if they fall under the GDPR definition of “special category data”, which require additional protection. Disclosive and sensitive data require a higher degree of security and are generally only available in secure sharing platforms, such as local secure servers or Trusted Research Environments (TREs).
The aim of this session is to create a space to share ideas and techniques on data access and how to address the risk of disclosivity and sensitivity. We invite colleagues to submit ideas relating to:
• Data sharing routes for survey and linked data
• Methods of disclosure control prior to data sharing
• Methods of risk assessment of disclosivity and sensitivity
• Data classification policies and sharing agreements
• Technical tools used to generate bespoke datasets
• Trusted Research Environments / Secure Labs: remote vs in-person access
• Syntax sharing and reproducibility
• International data sharing
Papers need not be restricted to these specific examples.
Keywords: sharing, disclosivity, sensitivity, safe access, disclosure control
Dr Marieke Heers (FORS, Swiss Centre of Expertise in the Social Sciences) - Presenting Author
Dr Brian Kleiner (FORS, Swiss Centre of Expertise in the Social Sciences)
Dr Alexandra Stam (FORS, Swiss Centre of Expertise in the Social Sciences)
Scientific journals increasingly require the sharing of data and related materials. As a result, the reproducibility of research and scientific analyses is becoming increasingly important for survey researchers, who must make available the various materials used for their scientific articles. These materials are usually shared via repositories and include the data collection instruments, the data themselves, proper documentation, and the syntax used for the analyses.
The benefits to researchers of sharing data and related materials linked to scientific publications are considerable, including greater visibility of one’s own research and data, as well as reinforced trust and confidence in one’s conclusions. Further, some universities are moving towards including data citation as part of the assessment of research impact, and there is evidence that articles for which data are shared are more frequently cited.
However, researchers often struggle in practice to share their data and materials. Challenges include issues concerning proper informed consent, anonymisation, copyright, adequate documentation, and data security. In addition, data citation practice, which allows readers to link from the article to the data in a repository, is often unclear, with little guidance from journals.
We argue that the data services of repositories or universities have a key role to play in this regard, since they often provide the support and tools that researchers need to share their data and related materials properly. Their activities often cover the full research cycle and relate, among other things, to questions of anonymisation, data citation, and documentation. This contribution aims to foster a discussion of the current challenges facing survey researchers regarding the sharing of data and related materials, as well as the forms of support that data services could provide or develop further.
Mr Urs Fichtner (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg) - Presenting Author
Dr Lukas Horstmeier (Institute of Medical Biometry and Statistics, Section of Health Care Research and Rehabilitation Research, Faculty of Medicine and Medical Center – University of Freiburg)
Dr Boris Bruehmann (Institute of Medical Biometry and Statistics, Section of Health Care Research and Rehabilitation Research, Faculty of Medicine and Medical Center – University of Freiburg)
Mr Manuel Watter (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg)
Professor Harald Binder (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg)
Mr Jochen Knaus (Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center – University of Freiburg)
One of the currently debated changes in scientific practice is the implementation of data sharing requirements for peer-reviewed publication to increase the transparency and intersubjective verifiability of results. Both funding agencies and scientific journals therefore try to promote the publication of research data. However, data sharing does not yet seem to be a fully adopted behavior among researchers. The Theory of Planned Behavior has repeatedly been applied to explain drivers of data sharing from the perspective of data donors (researchers). Data sharing can also be understood as a disclosure of personal information, e.g. from the perspective of survey participants. This study aimed to answer the following questions:
1. Is participants’ non-response affected by the information that the collected data will be shared?
2. Is participants’ response behavior affected by the information that the collected data will be shared?
We applied a mixed methods approach consisting of a qualitative pre-study and a quantitative survey with an experimental component. The latter was a two-group design with an intervention group (A), which received the information that data would be shared publicly, and a control group (B). The survey included questions on views and experiences regarding data sharing. Members of the Medical Faculty of the University of Freiburg were recruited from a list over a period of 15 days. For the exploratory analysis of dropouts and non-response, we used Fisher’s exact tests and binary logistic regressions.
In total, we recorded 197 cases for Group A and 198 cases for Group B. We found no systematic group differences regarding response bias or dropout, indicating no major effect of the information that the collected data will be shared publicly. We also gained insights into our sample’s experiences with data sharing: half of the sample had already requested data from other researchers or shared data at the request of other researchers. Data repositories, however, were used less frequently: 28% of our respondents had used data from repositories and 19% had stored data in a repository.
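As an illustration of the kind of group comparison described above, the following minimal sketch computes a two-sided Fisher’s exact test for a 2x2 table of dropout by group. Only the group sizes (197 and 198) come from the abstract. The dropout counts and the function name `fisher_exact_two_sided` are hypothetical and chosen purely for illustration; this is not the authors’ analysis code.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Group sizes are taken from the abstract; dropout counts are HYPOTHETICAL.
group_a_total, group_b_total = 197, 198
group_a_dropouts, group_b_dropouts = 20, 22  # assumed values, not reported

p = fisher_exact_two_sided(
    group_a_dropouts, group_a_total - group_a_dropouts,
    group_b_dropouts, group_b_total - group_b_dropouts,
)
print(f"two-sided p-value: {p:.3f}")
```

A large p-value from a table like this would be in line with the reported absence of systematic group differences, although the actual counts are not given in the abstract.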
Keywords: survey response bias, data sharing, dropout rate, researcher behavior, data publication
Dr Aida Sanchez-Galvez (Centre for Longitudinal Studies, University College London) - Presenting Author
Ms Claudia Yogeswaran (Centre for Longitudinal Studies, University College London)
Dr Vilma Agalioti-Sgompou (Centre for Longitudinal Studies, University College London)
Sharing survey data for research purposes must be governed by principles and procedures that seek to be fair, open, and transparent. There is a balance to be drawn between maximising the use of the research data and minimising risks to the rights of participants.
The UCL Centre for Longitudinal Studies (CLS) is home to four national longitudinal cohort studies, which follow the lives of tens of thousands of people in the UK. Data collection, linkage, management and sharing are based on the consent given by the participants.
CLS has established a data sharing programme that aims to ensure that the CLS data are as widely available as possible to the research community (nationally and internationally), whilst guaranteeing that: i) sensitive and/or disclosive data are managed and shared in a secure manner; ii) the legal requirements, ethical guidelines, and moral responsibility to the study participants are maintained; and iii) the consent agreements given by the cohort members are complied with. Attempts to re-identify individuals in the research data are always forbidden.
In this paper we describe the CLS tiered data classification, which determines the most appropriate data sharing route and the licensing needed. The main criteria for data categorisation are sensitivity and disclosivity risk, which are assessed in depth by the CLS data management team. The four CLS data tiers are: 1) Tier 1a: safeguarded data of low sensitivity and a small residual disclosivity risk; 2) Tier 1b: special safeguarded data of slightly elevated sensitivity and a small residual disclosivity risk; 3) Tier 2: controlled access data of a highly sensitive nature or with a significant disclosivity risk; and 4) Tier 3: controlled access data with a very high level of sensitivity and/or disclosivity risk. Data from tiers 2 and 3 must be accessed from Trusted Research Environments.
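The tiered classification above can be pictured as a small lookup table. The sketch below is illustrative only and is not CLS software. The names `Access`, `Tier`, `TIERS` and `requires_tre` are hypothetical, and the fourth tier is read as Tier 3, as the closing sentence of the abstract implies. The abstract states the Trusted Research Environment requirement only for tiers 2 and 3, so `False` for the other tiers means "not stated as required", not a confirmed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    SAFEGUARDED = "safeguarded"
    SPECIAL_SAFEGUARDED = "special safeguarded"
    CONTROLLED = "controlled access"


@dataclass(frozen=True)
class Tier:
    name: str
    access: Access
    description: str
    requires_tre: bool  # True only where the abstract states a TRE is required


# Hypothetical encoding of the four tiers listed in the abstract.
TIERS = {
    "1a": Tier("1a", Access.SAFEGUARDED,
               "low sensitivity, small residual disclosivity risk", False),
    "1b": Tier("1b", Access.SPECIAL_SAFEGUARDED,
               "slightly elevated sensitivity, small residual disclosivity risk", False),
    "2": Tier("2", Access.CONTROLLED,
              "highly sensitive or significant disclosivity risk", True),
    "3": Tier("3", Access.CONTROLLED,
              "very high sensitivity and/or disclosivity risk", True),
}


def requires_tre(tier_name: str) -> bool:
    """Return True if the named tier must be accessed from a TRE."""
    return TIERS[tier_name].requires_tre


for tier in TIERS.values():
    route = "TRE required" if tier.requires_tre else "TRE not stated as required"
    print(f"Tier {tier.name}: {tier.access.value} ({route})")
```

Keeping the tier definitions in one table makes the access rule a simple lookup, so a change to the classification would only need to be made in one place.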