All time references are in CEST
State of the Metadata Infrastructure
Mr Knut Wenzig (DIW Berlin/SOEP)
Mr Daniel Bela (LIfBi – Leibniz-Institut für Bildungsverläufe)
Dr Arne Bethmann (SHARE Germany)
Tuesday 18 July, 11:00 - 12:30
Metadata are at the heart of the movement towards FAIR data (Findable, Accessible, Interoperable, Reusable) and are steadily gaining importance. They need to follow established standards and to be collected and managed in appropriate tools throughout the entire survey lifecycle.
Ideally, such metadata form the basis of a data infrastructure that fosters the FAIR principles at every point of data exchange between the parties involved (data producers, providers, archives, users, and other stakeholders):
- Findable: e.g. search portals use standardized metadata to harvest information from data producers and data providers
- Accessible: e.g. data consumers have to be able to access information without human interaction, guided by standardized communication protocols
- Interoperable: e.g. data users have to be able to understand the data and to process and analyze them appropriately
- Reusable: e.g. data are documented in a standard, domain-relevant way that allows proper secondary analysis
Metadata should not be prepared ex post but ideally collected whenever they first arise during the survey lifecycle: for example, information on the funding institutions during the project proposal phase, the data collection protocols and instruments needed to understand the data during survey development, or data alterations during curation. Hence, proper (meta)data management tools are needed right from the start and throughout every step of the process.
This session will discuss contributions on the broader topic of metadata infrastructure within any part of the data lifecycle and offers space to assess the progress made in this endeavor. We welcome and encourage presentations on the implementation of metadata systems, ideally ones fostering FAIR data provisioning and use, regardless of the systems' maturity.
Keywords: Metadata, FAIR
Dr Claus-Peter Klas (GESIS Leibniz Institute for the Social Sciences) - Presenting Author
Mr Oliver Hopt (GESIS Leibniz Institute for the Social Sciences)
Mr Sigit Nugraha (GESIS Leibniz Institute for the Social Sciences)
Survey programs are often conducted using online survey tools, one of the most prominent being LimeSurvey. However, developing a new survey jointly with several PIs is not one of LimeSurvey's main features. Based on our questionnaire editor, we present how to create a new questionnaire, optionally translate it into several languages, and export it to LimeSurvey with all added languages. We have also included a structured Word import to bring existing questionnaires into the questionnaire editor.
Within LimeSurvey, the authors then need to adapt the exported questions to their requirements regarding order or layout.
Once the survey has been conducted, LimeSurvey can export the collected data as an SPSS file, which can then be uploaded directly into the questionnaire editor. The variables in the SPSS file are either linked automatically to the questionnaire's questions, based on the variable names already documented in the questionnaire, or linked to the questions manually.
Finally, the questionnaire editor allows users to export a complete question and variable report using a generic, template-based document generator. The documentation and the DDI Lifecycle 3.2 file can be handed over to a research data infrastructure for long-term preservation and dissemination. In addition, our DDI-based question and study search portal can make all questions and variables available for search and reuse. This closes the DDI lifecycle.
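The automatic-or-manual linking step can be pictured as a name lookup. The sketch below is an illustration under stated assumptions, not the editor's actual implementation: the `Question` record and the `auto_link` and `link_manually` helpers are hypothetical, and the SPSS variable names are passed in as a plain list (in practice they would be read from the uploaded `.sav` file, for example with the pyreadstat package). Matching is case-insensitive because SPSS itself treats variable names that way; anything left unmatched is returned for manual linking.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    # Hypothetical minimal question record as documented in the editor
    question_id: str
    text: str
    documented_variables: list[str] = field(default_factory=list)
    linked_variables: list[str] = field(default_factory=list)


def auto_link(questions: list[Question], spss_variables: list[str]) -> list[str]:
    """Link SPSS variables to questions whose documented names match.

    Returns the SPSS variable names that found no match and therefore
    still need to be linked manually.
    """
    # Index documented variable names (lower-cased) -> owning question
    index: dict[str, Question] = {}
    for q in questions:
        for name in q.documented_variables:
            index[name.lower()] = q

    unmatched: list[str] = []
    for var in spss_variables:
        q = index.get(var.lower())
        if q is None:
            unmatched.append(var)
        else:
            q.linked_variables.append(var)
    return unmatched


def link_manually(question: Question, variable: str) -> None:
    """Fallback: an editor user attaches a variable to a question by hand."""
    question.linked_variables.append(variable)
```

For example, with questions documenting `sat_life` and `age`, calling `auto_link(questions, ["SAT_LIFE", "age", "weight"])` links the first two automatically and returns `["weight"]` for manual handling.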
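A "generic, template-based" generator typically separates the report layout from the metadata that fills it. As a minimal sketch, assuming plain-text output and a hypothetical per-variable template (the field names `name`, `label`, `question_id` and `question_text` are illustrative, not the editor's schema), Python's standard `string.Template` is enough to show the idea:

```python
from string import Template

# Hypothetical template for one entry of a variable report
VARIABLE_TEMPLATE = Template(
    "Variable: $name\n"
    "Label: $label\n"
    "Question ($question_id): $question_text\n"
)


def render_report(title: str, entries: list[dict], template: Template = VARIABLE_TEMPLATE) -> str:
    """Render a plain-text report by filling one template per variable."""
    parts = [title, "=" * len(title)]
    for entry in entries:
        # substitute() raises KeyError if a placeholder has no value,
        # which surfaces incomplete metadata before the report is shipped
        parts.append(template.substitute(entry))
    return "\n".join(parts)
```

Swapping in a different `Template` (or a richer engine) changes the report layout without touching the metadata, which is the main benefit of keeping the generator generic.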
Dr Hayley Mills (CLOSER, UCL) - Presenting Author
Mr Jon Johnson (CLOSER, UCL)
Data harmonisation for longitudinal population studies (LPS) involves retrospectively adjusting data collected by different surveys so that they can be compared. Repeating the same analysis across several LPS allows researchers to test whether results are consistent or differ owing to changing or different social conditions. However, finding detailed information about data for harmonisation is resource-intensive, and there is considerable uncertainty about whether it will succeed.
CLOSER Discovery is a metadata research tool that enables users to discover, explore and assess data. It can help provide assurance of the quality and utility of data for any potential harmonisation. Information held in CLOSER Discovery, such as the question text, the available responses, the mode of interview and a consistent vocabulary, can be used to identify comparable data within and across LPS.
CLOSER aims to enhance the metadata further by adding variable concordance, enabling direct comparison of variables within and across LPS. The purpose is to save researchers' time by providing enough information to decide whether the data are suitable for their harmonisation use case. This is a large undertaking, not only in scale (CLOSER Discovery contains 11 LPS) but also because aligning variables within a consistent conceptual framework poses a significant challenge.
This presentation will set out our approach to identifying variables across surveys and how these will be structured to be useful for Discovery users. It will detail the main considerations in determining a workable conceptual framework, and we would value input on whether this is the ideal approach for us and on its utility to the research user community.
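One simple way to picture a variable concordance is as a mapping from a concept in the conceptual framework to the variables, across studies and sweeps, that measure it. The sketch below assumes exactly that shape; it is not CLOSER's data model, and `VariableRef`, `build_concordance`, `comparable` and the `min_studies` threshold are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableRef:
    study: str  # LPS identifier (hypothetical labels in the example)
    sweep: str  # wave / data collection sweep
    name: str   # variable name within that study


def build_concordance(mappings):
    """Group variables by the concept they were mapped to.

    mappings: iterable of (concept, VariableRef) pairs
    returns:  {concept: [VariableRef, ...]} in input order
    """
    concordance = defaultdict(list)
    for concept, var in mappings:
        concordance[concept].append(var)
    return dict(concordance)


def comparable(concordance, concept, min_studies=2):
    """Return a concept's variables only if enough distinct studies cover it.

    A concept measured in a single study offers nothing to compare across LPS,
    so it yields an empty list.
    """
    variables = concordance.get(concept, [])
    studies = {v.study for v in variables}
    return variables if len(studies) >= min_studies else []
```

Keeping the concept as the grouping key means within-study comparisons (different sweeps of one LPS) and cross-study comparisons come from the same structure; the hard part the abstract describes, choosing the concepts themselves, stays outside the code.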
Ms Jana Nebelin (Deutsches Institut für Wirtschaftsforschung (DIW Berlin)) - Presenting Author
Ms Antonia May (GESIS Leibniz Institute for the Social Sciences)
Dr Pascal Siegers (GESIS Leibniz Institute for the Social Sciences)
Dr Andreas Daniel (Deutsches Zentrum für Hochschul- und Wissenschaftsforschung (DZHW))
Dr Jan Goebel (Deutsches Institut für Wirtschaftsforschung (DIW Berlin))
Dr Dagmar Kern (GESIS Leibniz Institute for the Social Sciences)
Dr Benjamin Zapilko (GESIS Leibniz Institute for the Social Sciences)
Ms Fakhri Momeni (GESIS Leibniz Institute for the Social Sciences)
Dr Knut Wenzig (GESIS Leibniz Institute for the Social Sciences)
The reuse of research data is an integral part of research practice in the social and economic sciences. To find relevant data, researchers need adequate search facilities. However, a comprehensive thematic search for research data is difficult because indexing at the level of social science concepts is inconsistent or absent: either the data are not documented at a granular level, or principal investigators describe their data in their own ad hoc terminology. From the user's perspective, the lack of theoretical language in data documentation impedes effective data searches and thus significantly limits the research potential of existing data collections. Because there is currently no semantic model for indexing data content, the specific challenge for improving data search lies in establishing concept-based indexing of research data. Research infrastructures therefore need technology for the harmonized semantic indexing of their data.
The LORD concept registry aims to close this gap by developing a registry of sociological and economic concepts and, following the FAIR principles, making this registry generally available to the scientific community. As a first step, we developed a basic data model for the concept registry using the Unified Modeling Language (UML). All links are created and managed as RDF triples. An annotation application allows questions and variables to be linked to concepts. The application also includes two SKOS-compliant thesauri, the "Thesaurus Social Sciences" (TheSoz) and the "Standard Thesaurus Economics" (STW), and could be extended to other resources such as ELSST.
We illustrate the application of the LORD concept registry with examples from three large-scale survey programmes (German Socio-Economic Panel, German General Social Survey, National Academics Panel Study). The initial focus is on variables and questions with overlapping content in the three survey programmes, as they form a sound basis for cross-linking with concepts.
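To show what "links managed as RDF triples" means for cross-linking overlapping variables, here is a stdlib-only sketch. It is a stand-in for a real RDF library or triple store, not the LORD implementation: the `ex:` identifiers and the `ex:measuresConcept` predicate are hypothetical placeholders, and only `skos:prefLabel` is an actual SKOS property. Every annotation is stored as one (subject, predicate, object) triple, and lookups use a pattern in which `None` acts as a wildcard, much like a variable in a SPARQL triple pattern.

```python
Triple = tuple[str, str, str]

# Hypothetical predicate linking a variable to a concept (not a LORD term)
MEASURES = "ex:measuresConcept"


class TripleStore:
    """Tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        # None is a wildcard; results are sorted for deterministic output
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def annotate(store: TripleStore, variable: str, concept: str) -> None:
    """Record that a survey variable measures a registry concept."""
    store.add(variable, MEASURES, concept)


def variables_for(store: TripleStore, concept: str) -> list[str]:
    """All variables, from any survey programme, annotated with a concept."""
    return [s for s, _, _ in store.match(p=MEASURES, o=concept)]
```

Annotating a hypothetical `ex:soep/var_a` and `ex:ggss/var_b` with `ex:concept/life-satisfaction` lets a single `variables_for` call return both, which is the kind of concept-based, cross-programme search the abstract aims at.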
Mr Knut Wenzig (DIW Berlin/SOEP) - Presenting Author
Ms Claudia Saalbach (DIW Berlin/SOEP)
Ms Xiaoyao Han (DIW Berlin/SOEP)
An investigation was conducted to examine the extent to which metadata in different Data Documentation Initiative (DDI) standards is openly available and which elements of these standards are used. DDI is a set of international standards for describing and documenting data used in social, behavioural, economic, and health sciences research.
To identify online repositories where DDI metadata is available, we used re3data.org (a global registry of research data repositories covering a range of academic disciplines) and an enquiry on the DDI-users mailing list. We compare the results with findings from 2017.
We then tried to access and analyse the metadata, e.g. using a standardised protocol such as the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This makes it possible to show which elements are used more commonly than others.
The findings have implications for deploying DDI metadata and for the further development of the standards. They could also inform users, such as researchers and data stewards, about how the community uses the standards. Overall, the investigation highlights the value of openly available metadata in helping research achieve the goals of the FAIR data movement.
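The harvesting-and-counting step can be sketched with the standard library alone. This is an illustration of the general approach, not the authors' tooling: it builds OAI-PMH `ListRecords` URLs (a `resumptionToken` is an exclusive argument, so follow-up requests carry only the verb and the token) and counts, for one response page, in how many records each metadata element name occurs. The `metadataPrefix` for DDI is repository-specific, so `oai_ddi` in the usage note is only a placeholder; the network request itself is left out to keep the sketch offline.

```python
from collections import Counter
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is exclusive: no metadataPrefix on follow-up pages
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def count_metadata_elements(response_xml):
    """Count, per element name, the records whose metadata use it.

    Returns (counts, next_token); next_token is None when the response
    carries no resumptionToken or an empty one (the last page).
    """
    root = ET.fromstring(response_xml)
    counts = Counter()
    for metadata in root.iter(f"{OAI_NS}metadata"):
        # A set, so an element repeated within one record counts once
        counts.update({local_name(e.tag) for e in metadata.iter() if e is not metadata})
    token_elem = root.find(f".//{OAI_NS}resumptionToken")
    next_token = token_elem.text if token_elem is not None and token_elem.text else None
    return counts, next_token
```

In use, one would fetch `list_records_url(base, "oai_ddi")` (e.g. with `urllib.request.urlopen`), add each page's counts to a running `Counter`, and keep requesting with the returned token until it is `None`. Counting presence per record rather than raw occurrences keeps heavily repeated elements, such as variable descriptions in large codebooks, from dominating the comparison.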