ESRA 2017 Programme

Thursday 20th July, 14:00 - 15:30 Room: N AUD4

Putting data in the driver’s seat: The role of active (meta-)data in survey data management 2

Chair Mr Knut Wenzig (DIW Berlin)
Coordinator 1 Mr Daniel Bela (LIfBi Bamberg (Germany))
Coordinator 2 Dr Arne Bethmann (DJI München (Germany))

Session Details

Various metadata systems for different sections of the data management lifecycle (e.g. questionnaire development, data preparation, documentation, data dissemination) are in use at institutions dealing with survey research. Some of these metadata systems make use of evolving metadata standards (such as DDI or SDMX), some others are developed independently as custom-tailored solutions. Most of them have one idea in common: Structured metadata, stored in relational databases, make it possible to have one single source of information for data on data.

With the increasing availability of metadata systems, their usage as a reference tool (e.g. for researchers looking for specific variables or questionnaire developers drawing on questions from other surveys) becomes more common. In this session we want to discuss uses of structured metadata that go beyond their passive reference function.

Since structured metadata are machine readable by definition, we are interested in exploring how and at which points in the data management lifecycle we can put metadata to use in a more active role. This may be as a means of automatically generating human readable questionnaires, automated plausibility checks during fieldwork, recoding raw survey data from the field and probably in numerous other ways. In order to implement data-driven data management processes, other sources of information come into play: for example paradata or sampling frame data can potentially be used in the same manner to enhance survey data management and gain the same benefits.

Papers presented in the session should thus focus on examples of the active use of such structured information. We would like to learn about your experiences with implementing data-driven routines as part of the data management process. The session will also provide room to discuss how much automation in the data management lifecycle is feasible and/or desirable.

Paper Details

1. From passive to active – how a focus on archiving can work to improve data collection and management
Dr Steven McEachern (Australian Data Archive)
Ms Janet McDougall (Australian Data Archive)

The recent publication of the pilot data management guidelines for newly funded EU Horizon 2020 projects includes an expectation that any project producing, collecting or processing data under Horizon 2020 funding incorporate data management planning into their project. These guidelines include the expectation that the project have a data management plan, and that the data released be available under FAIR principles – that is, it is Findable, Accessible, Interoperable and Reusable.

This shift in expectations in the EU, and parallel developments in both the research and public sectors internationally (including Australia), has significant implications for managers of survey projects. Commissioning agencies are therefore likely to have increasing demands for both documentation of data collection activities and final outputs, including the data files themselves and associated reports and metadata. This in turn has flow-on implications for the field agencies and software providers who support the survey data collection process, and who will need to capture and provide sufficient metadata to enable these FAIR principles to be achieved.

The growth of such requirements need not, however, be seen by field agencies as entirely focussed on increasing the burden of reporting and deliverables. This paper therefore describes one current Australian effort to meet these requirements, an effort which has significant potential benefits for the field agency in enabling improvements in internal data and information management. The Australian Data Archive (ADA) and the Social Research Centre (SRC), a private field data collection agency, have been collaborating on a project to develop procedures and practices for embedding data documentation requirements within the data collection workflow process, to enable automated capture of relevant documentation as part of the standard activities of the collection agency. The project explores the joint efforts of ADA and SRC to facilitate capture of documentation requirements for data deposit with ADA (as the research data repository), and provision of suitable metadata to support agency requirements for both access and data discovery.

The paper describes the core activities of the project, an overview of metadata management within the Social Research Centre, and key learnings that resulted from the project. Key amongst these has been the potential for information previously considered only as part of reporting to be utilised in new ways to improve the active data management practices of the field agency.

2. Metadata-driven Scientific Use File data management
Mr Daniel Bela (LIfBi)

With several excellent software tools emerging for the task, survey instrument creation has become more and more structured and automated in recent years. However, after field work has been done, it is usually up to one or several data managers in research institutions to process the data files and create ready-to-use analysis datasets and documentation. In social research, this data management process is often badly structured and documented, and seldom automated.
Many of the procedures that have to be run in order to create usable datasets, however, contain the potential for full or semi-automation as soon as the procedures themselves are structured appropriately. In order to deal with the vast amount of incoming field data from the German National Educational Panel Study (NEPS), the data management team at LIfBi (in cooperation with partners across Germany) implemented such a structured and semi-automated approach for creating and updating the Scientific Use Files for the six panel cohorts of NEPS. This was done by conceptually separating several data management tasks from each other, and creating interface steps for interchanging data extracts (e.g. for coding text answers from the surveys or generating additional variables) with external partners. Additionally, every step of the data management process that could be automated by re-using information from the survey instruments or field documentation (e.g. renaming of variables, labeling, translation) has been designed to make use of this potential for automation. This led to a large amount of additional meta-information, such as full questionnaire texts, that is now directly integrated into NEPS Scientific Use File datasets.
Based on these experiences, a sketch of 'best practice' solutions to implement a metadata-driven data management workflow can be established. This presentation will focus on conceptual solutions to improve data management procedures in order to make them more structured, better documented, and less error-prone.
Ultimately, this approach can lead to better survey data for analyses, and reduce unsystematic variance in data management procedures, which otherwise necessarily results in (in the best case) a large workload of fixing data afterwards or (in the worst case) biased research results.
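
The re-use of instrument information for renaming and labeling variables, as described in this abstract, can be illustrated with a minimal sketch. It is not the NEPS implementation: the metadata dictionary, the variable names, and the function `apply_metadata` are hypothetical and chosen only for illustration.

```python
# Hypothetical variable-level metadata, e.g. exported from a survey instrument.
# All names, labels, and codes below are invented for illustration.
METADATA = {
    "q1": {"name": "age", "label": "Age of respondent in years"},
    "q2": {
        "name": "sex",
        "label": "Sex of respondent",
        "values": {1: "male", 2: "female"},
    },
}


def apply_metadata(raw_rows, metadata):
    """Rename raw columns and collect labels, driven only by the metadata."""
    rename = {raw: spec["name"] for raw, spec in metadata.items()}
    # Columns without a metadata entry (e.g. an ID) keep their raw name.
    rows = [
        {rename.get(col, col): val for col, val in row.items()}
        for row in raw_rows
    ]
    var_labels = {spec["name"]: spec["label"] for spec in metadata.values()}
    value_labels = {
        spec["name"]: spec["values"]
        for spec in metadata.values()
        if "values" in spec
    }
    return rows, var_labels, value_labels


raw = [{"id": 1, "q1": 34, "q2": 2}]
rows, var_labels, value_labels = apply_metadata(raw, METADATA)
```

Because the renaming and labeling rules live in the metadata rather than in the processing script, a change to the instrument only requires updating the metadata, not the code.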

3. Facilitating metadata capture and reuse in the social sciences with the example of social media data
Miss Kerrin Borschewski (GESIS - Leibniz Institute for the Social Sciences)
Mr Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)

Making active use of structured metadata systems holds a wealth of opportunities to facilitate research across the entire data lifecycle. So far, only a few scholars have tapped their full potential. This holds true for all kinds of research data, survey data as well as non-survey data. Non-survey data, and especially social media data, are becoming more and more important for the work of social scientists. Insights from social media data can be used either to support and refine the findings from survey data, or as stand-alone research material for social scientists. Working with data from social media does pose several challenges, such as the dynamic change of content, problems of attribution to actors, or complex contexts. This work, however, could be made a lot easier with elaborate data documentation. Furthermore, social media data offer a lot of potential for automating data documentation processes within different stages of the data lifecycle.
This presentation presents a use case that demonstrates how metadata of social media data could be put to active use. For this purpose, we will use social media data (e.g. from Twitter) as an example. We will first establish to what extent it is possible to document this kind of data in DDI, especially considering the potential of the metadata for active use. Examples of existing datasets will be used to illustrate current practices and possible enhancements. Then we will present an idea of how this metadata could be used in different stages of the data lifecycle. This proposal contains different steps: First, we believe it would be useful to have an automated metadata program that saves the important context information of the data while obtaining it from the social media platform. Such information would be, for example, the date and time the download took place, the URL the data was retrieved from, sampling or selection criteria, etc. Second, it should be possible to automatically retrieve and document the topics of interest and related descriptive information from the data. An algorithm could be used to identify the important terms within social media data. The captured information should then be stored within an open and searchable metadata database. This way, researchers looking for data on a specific topic could easily find information on data sources that might be of interest to them and may even be able to compare the contents of those data sources.
The benefits derived from this case study are twofold. On the one hand, they can support the further development of the DDI standard. On the other hand, they describe processes that could facilitate the work of primary researchers and also of researchers who intend to reuse data from social media platforms. Opening a new realm of metadata sources for all social scientists would considerably strengthen their research findings.
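
The two proposed steps (saving context information at download time, then identifying important terms) could be sketched roughly as follows. This is an illustration under assumptions, not the authors' program: the functions `capture_context` and `top_terms`, the stopword list, the example URL, and the plain term-frequency count are all hypothetical stand-ins, and the resulting dictionary is not DDI.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "is", "and", "of", "to", "in", "on", "for"}


def capture_context(source_url, selection_criteria, retrieved_at=None):
    """Record when, from where, and by which criteria data were retrieved."""
    retrieved_at = retrieved_at or datetime.now(timezone.utc)
    return {
        "retrieved_at": retrieved_at.isoformat(),
        "source_url": source_url,
        "selection_criteria": selection_criteria,
    }


def top_terms(texts, n=3):
    """Return the n most frequent non-stopword terms (simple term frequency)."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[#@]?\w+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(n)]


# Invented example posts; a real collection would come from a platform API.
posts = [
    "Survey data and metadata",
    "Metadata for survey research",
    "metadata standards",
]
record = capture_context(
    "https://example.org/search",  # hypothetical URL
    {"query": "metadata"},
    retrieved_at=datetime(2017, 7, 20, 14, 0, tzinfo=timezone.utc),
)
record["top_terms"] = top_terms(posts, n=2)
```

A record of this kind could then be stored in a searchable metadata database, so that the context and main topics of a collection remain discoverable after the download.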