Thursday 20th July, 11:00 - 12:30 Room: N AUD4


Putting data in the driver’s seat: The role of active (meta-)data in survey data management 1

Chair: Mr Knut Wenzig (DIW Berlin)
Coordinator 1: Mr Daniel Bela (LIfBi Bamberg, Germany)
Coordinator 2: Dr Arne Bethmann (DJI München, Germany)

Session Details

Various metadata systems for different sections of the data management lifecycle (e.g. questionnaire development, data preparation, documentation, data dissemination) are in use at institutions dealing with survey research. Some of these systems make use of evolving metadata standards (such as DDI or SDMX), while others are developed independently as custom-tailored solutions. Most of them share one idea: structured metadata, stored in relational databases, make it possible to maintain a single source of information for data on data.
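As a minimal sketch of that common idea, a single-source relational metadata store might look like the following illustrative two-table layout (all table, column, and variable names here are hypothetical, not any particular institution's schema):

```python
import sqlite3

# Hypothetical schema for a relational metadata store: one table for
# variables, one for their value labels, so that every downstream
# product (documentation, checks, dissemination) queries the same source.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variable (
    var_name   TEXT PRIMARY KEY,
    label      TEXT NOT NULL,
    var_type   TEXT NOT NULL          -- e.g. 'numeric', 'string'
);
CREATE TABLE value_label (
    var_name   TEXT REFERENCES variable(var_name),
    value      INTEGER,
    label      TEXT,
    PRIMARY KEY (var_name, value)
);
""")
conn.execute("INSERT INTO variable VALUES ('sex', 'Sex of respondent', 'numeric')")
conn.executemany("INSERT INTO value_label VALUES (?, ?, ?)",
                 [("sex", 1, "male"), ("sex", 2, "female"), ("sex", -9, "refused")])

# Any consumer draws on the same single source of "data on data".
for value, label in conn.execute(
        "SELECT value, label FROM value_label WHERE var_name = 'sex'"):
    print(value, label)
```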

With the increasing availability of metadata systems, their usage as a reference tool (e.g. for researchers looking for specific variables, or for questionnaire developers drawing on questions from other surveys) becomes more common. In this session we want to discuss uses of structured metadata that go beyond this passive reference function.

Since structured metadata are machine-readable by definition, we are interested in exploring how, and at which points in the data management lifecycle, metadata can be put to use in a more active role: for example, automatically generating human-readable questionnaires, running automated plausibility checks during fieldwork, or recoding raw survey data from the field, among numerous other possibilities. To implement data-driven data management processes, other sources of information also come into play: paradata or sampling-frame data, for example, can potentially be used in the same manner to enhance survey data management and yield the same benefits.
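A minimal sketch of the idea, assuming hypothetical variable metadata with valid ranges, valid codes, and missing codes (none of this is drawn from a real metadata system):

```python
# Metadata-driven checking and recoding: the METADATA dict stands in for
# records pulled from a metadata store; names and codes are illustrative.
METADATA = {
    "age": {"type": "numeric", "valid_range": (18, 99), "missing_code": -9},
    "sex": {"type": "categorical", "valid_values": {1, 2}, "missing_code": -9},
}

def check_and_recode(record: dict) -> dict:
    """Flag implausible values and recode them to the declared missing code."""
    cleaned = {}
    for var, value in record.items():
        meta = METADATA[var]
        ok = True
        if meta["type"] == "numeric":
            lo, hi = meta["valid_range"]
            ok = lo <= value <= hi
        elif meta["type"] == "categorical":
            ok = value in meta["valid_values"]
        cleaned[var] = value if ok else meta["missing_code"]
        if not ok:
            print(f"plausibility check failed: {var}={value!r}")
    return cleaned

print(check_and_recode({"age": 210, "sex": 2}))  # -> {'age': -9, 'sex': 2}
```

The same metadata records that drive these checks could equally drive questionnaire rendering or recoding scripts, which is the "active" role the session has in mind.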

Papers presented in the session should thus focus on examples of the active use of such structured information. We would like to learn about your experiences with implementing data-driven routines as part of the data management process. The session will also provide room to discuss how much automation in the data management lifecycle is feasible and/or desirable.

Paper Details

1. The DASISH Questionnaire Design Documentation Tool – functionalities and real life examples from the tool
Mr Benjamin Beuster (NSD - Norwegian Centre for Research Data)
Mrs Hilde Orten (NSD - Norwegian Centre for Research Data)

The Questionnaire Design Documentation Tool (QDDT) has been developed to assist large-scale survey projects in questionnaire development and in documenting the questionnaire design process, from the first conceptualization to the final questionnaire. In particular, it supports the production of research concepts, questions, response domains, and instruments for questionnaire modules of the European Social Survey.

In addition, researchers and students can use the tool to explore metadata from existing projects or to design new research. Interoperability with other systems and tools, most importantly the DASISH Question Variable Database and the Translation Management Tool, both currently under development, is another key aim.

Work on the QDDT started during the Data Service Infrastructure for the Social Sciences and Humanities (DASISH) project and has continued under the Synergies for Europe's Research Infrastructure in the Social Sciences (SERISS) project.

The conceptual model for the tool is based on a subset of the DDI 3.2 specification. The tool is designed to integrate and communicate with other tools through an API, and it is compatible with DDI, with both DDI import and export options available. A set of modern technologies is used in its development.
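As a rough illustration of what consuming such DDI-compatible exports can look like, the sketch below parses a heavily simplified question fragment. The element and namespace names only loosely follow DDI 3.2 and do not reproduce the QDDT's actual output; treat them as assumptions.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative fragment loosely modelled on DDI 3.2
# question metadata; not a faithful rendering of the full schema.
DDI_FRAGMENT = """
<QuestionItem xmlns="ddi:datacollection:3_2" xmlns:r="ddi:reusable:3_2">
  <r:ID>q-trust-01</r:ID>
  <QuestionText>
    <r:LiteralText>
      <r:Text>Generally speaking, would you say most people can be trusted?</r:Text>
    </r:LiteralText>
  </QuestionText>
</QuestionItem>
"""

NS = {"d": "ddi:datacollection:3_2", "r": "ddi:reusable:3_2"}
root = ET.fromstring(DDI_FRAGMENT)
qid = root.findtext("r:ID", namespaces=NS)
text = root.findtext("d:QuestionText/r:LiteralText/r:Text", namespaces=NS)
print(qid, "->", text)
```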

This presentation of the QDDT focuses on its conceptual model and functionalities, first experiences from questionnaire module production for the European Social Survey, and plans for further development.


2. CLOSER Repository: Modernising Longitudinal Study Management
Mr Will Poynter (CLOSER, UCL)

CLOSER Discovery is a cutting-edge search engine for discovering metadata on eight of the UK's cohort and longitudinal studies. The longest-running study documented in CLOSER Discovery has been running for over 70 years, which makes documenting and managing it a formidable challenge. CLOSER Discovery demonstrates the importance of investing in rich metadata that describes many more aspects of data collection than traditional tools and methods capture. By documenting detailed information on question routing, the scales and images used, and similar questions and variables across multiple studies, researchers, survey managers, and data managers are all better informed. This in turn opens up new possibilities for the studies going forward.

To function, CLOSER Discovery sits atop a large metadata repository that has been designed not only to power a search engine but also to provide additional functionality and automation to the studies themselves. By drawing links between multiple studies, centres, and data warehouses, CLOSER has begun to tear down the outdated data-silo model, which has caused many of the issues inhibiting harmonisation and linkage.

Data collection instrument design can be made faster and more consistent by reusing entire sections of questions designed for previous studies. By documenting this process from the point of design, harmonisation can be performed more efficiently and effectively. Because these techniques and tools were developed using eight of the UK's longitudinal studies, they have been rigorously tested for scalability.

Every item in CLOSER's metadata carries a universal identifier, allowing datasets used and papers published to reference precisely the variables they draw on, while questionnaire designers can reference the questions they have reused, such as standard scales, and clearly document how they have been altered.
With much richer and clearer metadata documented, data managers can save enormous amounts of time cleaning collected data before it is deposited for analysis. The laborious task of creating and then maintaining data dictionaries can also easily be automated and standardised.
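A hedged sketch of what such automation might look like: a data dictionary regenerated from repository metadata rather than maintained by hand. The variable names, fields, and export format below are hypothetical, not CLOSER's actual pipeline.

```python
import csv
import io

# Hypothetical variable metadata as it might be exported from a
# metadata repository; names and fields are illustrative only.
VARIABLES = [
    {"name": "age", "label": "Age at interview", "type": "numeric", "values": ""},
    {"name": "sex", "label": "Sex of respondent", "type": "categorical",
     "values": "1=male; 2=female; -9=refused"},
]

def write_data_dictionary(variables, stream):
    """Emit a simple CSV data dictionary, regenerated on every release."""
    writer = csv.DictWriter(stream, fieldnames=["name", "label", "type", "values"])
    writer.writeheader()
    writer.writerows(variables)

buf = io.StringIO()
write_data_dictionary(VARIABLES, buf)
print(buf.getvalue())
```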
CLOSER Discovery categorises all variables and questions with topics from CLOSER's controlled vocabulary, which allows more effective searching and filtering of the huge quantity of content made available. Applying topics to questions and variables is hugely time-consuming, so CLOSER is working with machine learning to further automate the process, enhancing the metadata without increasing costs.
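One way such machine-assisted topic tagging could work is a supervised text classifier trained on already-coded questions, as in the minimal sketch below. The example questions and topic labels are invented for illustration and are not CLOSER's controlled vocabulary or method.

```python
# Minimal sketch: suggest topics for question texts with a
# TF-IDF + logistic regression pipeline from scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "How many hours do you usually work per week?",
    "What is your current occupation?",
    "How often do you feel anxious or depressed?",
    "Have you seen a doctor in the last 12 months?",
]
topics = ["employment", "employment", "mental health", "health services"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(questions, topics)

# In practice a human coder would review suggestions before storage.
print(model.predict(["In what industry is your main job?"]))
```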