ESRA 2019 Draft Programme at a Glance

With or without you - standardized metadata in survey data management 2

Session Organisers: Mr Knut Wenzig (German Institute for Economic Research - DIW Berlin)
Mr Daniel Bela (LIfBi – Leibniz Institute for Educational Trajectories)
Mr Arne Bethmann (Max Planck Institute for Social Law and Social Policy)
Time: Thursday 18th July, 14:00 - 15:30
Room D31

With evolving data sources, such as process-generated or user-generated content, meta- and paradata play an increasingly important role in many parts of the data management lifecycle. This is also true for surveys, as they get more complex, and data management relies more on properly defined processes to ensure both data quality and maintainability. In turn, many studies, data providers and data archives have developed systems of structured metadata tailored to their specific data management needs. While some of these systems are (loosely) based on evolving metadata standards like DDI or SDMX, many are custom-made solutions. This is clearly less than ideal for the goal of making metadata comparable and shareable across studies and institutions.

In this session we want to discuss the issue from a practitioner's view, and want to hear from people who are faced with the challenge of implementing structured metadata systems, or have done so in the past. In particular, we want to hear about the possible benefits, problems and drawbacks when implementing metadata systems that adhere closely to metadata standards like DDI or SDMX. Possible questions for discussion include:

- Which processes would benefit from standardized metadata?
- Are there examples of metadata systems which cover multiple steps within the whole lifecycle?
- Are there sources for shared and reusable metadata?
- Are there tools to process standardized metadata?
- What could be incentives for sharing metadata and tools?

Keywords: metadata, DDI, SDMX

Building time series from multiple sources - using metadata for documentation and integration

Dr Steven McEachern (Australian Data Archive) - Presenting Author

The management of survey data within a specific data collection over time provides some unique challenges for the researcher in maintaining consistency and interpretability. These challenges are then multiplied when the researcher must construct their dataset from multiple sources that used multiple methods - how can the researcher find, integrate and document these multiple sources in an effective and defensible manner? The use of consistent, integrated survey metadata provides one means for enabling this process.

This paper seeks to demonstrate the application of structured metadata to integrated time series through a project currently in development at the Australian Data Archive and the Centre for Social Research and Methods at the Australian National University. The CSRM are currently working to develop a data portal for the analysis and visualisation of time series surveys across series and over time. The intent of the portal is to enable researchers and the public to study and understand movements in Australian public opinion over time, irrespective of the time series from which the specific point measure was sourced.

This process, however, creates significant challenges in managing and integrating the data. The measurement of an individual variable over time may be drawn from multiple series, and use variations in sampling, measurement and framing that each need to be documented and connected to the specific point measure to provide a defensible research methodology for the researcher, and clear interpretation for the secondary user. The paper describes the experiences of the ADA staff in developing the integrated datasets, including the capacity of DDI and related standards to document the source data, harmonisation process and integrated output, to provide an integrated data source that is both representative of its source material, and consistent in the quality of the resultant integrated data.

Data Consistency Checking Using Bayesian Methods to Incorporate Past Data

Mr Marcus Maher (Ipsos)
Dr Alan Roshwalb (Ipsos) - Presenting Author
Dr Robert Petrin (Ipsos)

Survey tracking programs, such as performance measurement programs and public opinion polls, repeat their surveys either continuously or at regular intervals. The programs establish data capture methods, cleaning rules, and reporting to allow for quick turnaround in results. The quick turnaround requirements and the repetitive rhythm of the data collection make it easy for small errors to creep into the data capture process. Automation helps in survey collection and data capture, but there is always the possibility of errors. Many studies rely on rules to identify changes in data streams: for example, a change in score of more than the margin of error, or a change of a specified amount, instigates a review procedure. These methods are inexact in helping identify possible data collection or data capture errors, or possible changes in data trends. This paper examines using Bayesian testing in a quality control construct to identify unexpected changes in data trends and set them aside for deeper review. The approach incorporates past data using empirical Bayes methods in prior distributions to be used in Bayes Factor analyses. These analyses should have greater sensitivity to changes in the data stream due to data collection and data capture errors or real change in the trend. Any credible changes in data distributions are flagged for further review. This paper examines the data from tracking performance studies and polling.
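One possible reading of the approach described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the authors' method: it assumes the tracked measure is a simple proportion, fits a Beta prior to past wave proportions by a crude method of moments (the empirical Bayes step), and compares the marginal likelihood of a new wave under that prior with its marginal likelihood under a uniform prior (the Bayes factor step). The function names, the example figures, and the threshold of 10 are all hypothetical.

```python
from math import lgamma, log
from statistics import mean, variance


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def fit_beta_prior(past_props):
    """Method-of-moments Beta(a, b) fit to past wave proportions.

    Crude empirical Bayes step: it ignores the sampling error of each
    past wave, so the fitted prior is somewhat wider than it should be.
    """
    m = mean(past_props)
    v = variance(past_props)
    if not 0 < v < m * (1 - m):
        raise ValueError("past proportions do not admit a Beta fit")
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


def log_bayes_factor(k, n, a, b):
    """log BF of 'unexpected' (uniform prior) vs 'as before' (Beta(a, b)).

    Both marginal likelihoods are beta-binomial; the binomial
    coefficient is common to both and cancels in the ratio.
    """
    log_m_expected = log_beta(k + a, n - k + b) - log_beta(a, b)
    log_m_unexpected = log_beta(k + 1, n - k + 1)  # log B(1, 1) = 0
    return log_m_unexpected - log_m_expected


def flag_wave(k, n, a, b, threshold=10.0):
    """Flag a wave for review if the Bayes factor exceeds the threshold."""
    return log_bayes_factor(k, n, a, b) > log(threshold)


# Hypothetical proportions from six past waves of a tracking study.
past = [0.62, 0.60, 0.63, 0.61, 0.64, 0.62]
a, b = fit_beta_prior(past)

in_line_wave = flag_wave(310, 500, a, b)    # 62% of 500 respondents
suspicious_wave = flag_wave(200, 500, a, b)  # 40% of 500 respondents
```

In this sketch a wave in line with the past is not flagged, while a wave far outside the historical range is set aside for review. The choice of a uniform alternative and of the threshold are design decisions that a real quality control procedure would need to justify.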

Documenting Georeferenced Social Science Survey Data. Limits and Possible Solutions

Miss Kerrin Borschewski (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Mr Stefan Müller (GESIS - Leibniz Institute for the Social Sciences)
Mr Wolfgang Zenk-Moeltgen (GESIS - Leibniz Institute for the Social Sciences)

The use of areal information about respondents’ neighborhoods provides great benefits for social science research. By using georeferenced survey data, for example, researchers can answer questions about individual social behavior or attitudes while also taking into account the detailed spatial patterns of social processes. As with all research data, to make such data understandable, shareable and re-usable, the use of well-established metadata standards is imperative. Such metadata standards exist for both the survey data and the geographic data: the Data Documentation Initiative standard (DDI) of the social sciences and the ISO 19115 standard of the geosciences. Challenges, however, generally arise when researchers aim to document data which originate at the interface of different scientific disciplines, as in the case of georeferenced survey data. These data imply a need to document data from different sources and of different types, and hence of different contents and different structures. The aforementioned metadata standards were not designed to document linked data collections in all use cases. As such, to guarantee thoroughly documented and interoperable metadata, data librarians with interdisciplinary expertise need to get involved in such research projects at an early stage.
In this presentation, we showcase a use case of social science survey data that are spatially linked to geospatial data attributes. We present the challenges and analyze the extent to which the social sciences metadata standard DDI-Lifecycle, which contains elements compatible with ISO 19115, is capable of documenting said data. In response to the challenges of documentation, we present different approaches to a solution. The information retrieved from this case study can assist the producers of metadata standards by displaying the needs of special use cases, and can support metadata initiatives, e.g. by delivering content-related input on the need for special metadata elements.

Data visualization for comparative social science survey data and metadata at GESIS

Miss Julia Hermann (GESIS) - Presenting Author
Mr Wolfgang Zenk-Moeltgen (GESIS)

At GESIS, numerous data from national and international comparative studies are prepared, documented, and archived. These data are currently offered on different GESIS portals for reuse. Users receive metadata about these study collections (such as overviews of trends, scales, thematic categories, survey years and participating countries) via the value-added products, the data catalog and the homepages of the study collections - mostly in the form of tables or long lists. This ensures the completeness of the data, but it is at the expense of clarity.
The already enormous and constantly growing amount of data makes it increasingly difficult for users to get an overview, to quickly select the needed information, and to decide which dataset to select. For this reason we are building a tool with which metadata and survey data can be displayed using various graphics. The advantage of data visualization is that it makes certain relationships in the data understandable at a glance, summarizes information, and reduces the complexity of understanding the data. Graphics should have a main message in order to draw the users' attention to certain datasets, developments, and contexts from a study collection. In addition to standard graphics, country maps will also be created. Moreover, the use of interactive and animated graphics is planned.
As part of the project, different solutions and approaches are currently being developed to make the data visible and as understandable to users as possible. The main benefits of the project are that different software solutions are compared, including customized, individually programmed solutions, and that the internal workflow of creating and providing visualizations is being considered. This enables us to offer practical support for data archive staff in providing a better overview for secondary users of the data.

ExploreData search portal for high quality data and metadata – how do we benefit from DDI and standardized metadata?

Miss Julia Hermann (GESIS) - Presenting Author
Mr Wolfgang Zenk-Moeltgen (GESIS)
Dr Christina Eder (GESIS)

GESIS is currently developing an innovative online search portal for high quality data and metadata in which the complex metadata of different large-scale survey programs are offered in a systematic and user-friendly way. Users can quickly get an overview of survey programs and their different components, e.g. temporal, geographic, thematic units, or populations. Users can search for studies, documents, single variables, concepts, topics, trends, and other metadata using the free text search.
Additionally, they can browse and use those components to filter search results.
To enable these functions, we use the services of DDI: In order to be able to filter the studies according to content-related topics, we use the CESSDA Topic Classification. All studies of the GESIS archive are classified according to this scheme. Thus, all studies on selected topics will be found. Furthermore, we use the DDI Controlled Vocabularies for the methodology filter. Study collections map their methodological metadata to the new CVs. As a result, studies from different study collections can be found simultaneously if the user filters them according to methodological aspects.
Moreover, the search portal also contains functions at the variable level: users receive all information on a variable (such as question wording, variable distribution and statistical values), and they can compare variables from different study collections and compile customized data sets for large data cumulations. We can offer these functions because studies from national and international study collections are prepared in depth, down to the variable level, according to international DDI standards.
The presentation will focus on the unique functions of browsing, searching, filtering, downloading, analyzing, and comparing variables and studies across different study collections from the GESIS Archive and how the DDI standard enabled the creation of these functionalities.
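The idea of mapping each study collection's methodological metadata to a shared controlled vocabulary, so that one filter finds studies across collections, can be sketched as follows. This is a minimal illustration and not the ExploreData implementation: the collection names, local terms, study identifiers, and the vocabulary codes are hypothetical placeholders shaped like controlled vocabulary codes, and should not be read as an authoritative list.

```python
from dataclasses import dataclass

# Hypothetical mapping from (collection, local term) to a shared
# controlled-vocabulary code. The codes are illustrative placeholders.
LOCAL_TO_CV = {
    ("collection_a", "CAPI"): "Interview.FaceToFace",
    ("collection_a", "online"): "SelfAdministeredQuestionnaire.WebBased",
    ("collection_b", "personal interview"): "Interview.FaceToFace",
    ("collection_b", "web survey"): "SelfAdministeredQuestionnaire.WebBased",
}


@dataclass(frozen=True)
class Study:
    study_id: str
    collection: str
    local_mode: str  # mode of collection as recorded by the collection


def harmonised_mode(study):
    """Return the shared vocabulary code for a study, or None if unmapped."""
    return LOCAL_TO_CV.get((study.collection, study.local_mode))


def filter_by_mode(studies, cv_code):
    """Select studies from any collection that map to the given code."""
    return [s for s in studies if harmonised_mode(s) == cv_code]


# Hypothetical studies from two collections with different local terms.
STUDIES = [
    Study("A-001", "collection_a", "CAPI"),
    Study("A-002", "collection_a", "online"),
    Study("B-001", "collection_b", "personal interview"),
    Study("B-002", "collection_b", "telephone"),  # not mapped yet
]

face_to_face = filter_by_mode(STUDIES, "Interview.FaceToFace")
face_to_face_ids = [s.study_id for s in face_to_face]
```

Because the filter runs on the shared code and not on the local wording, studies A-001 and B-001 are found together even though their collections describe the mode differently. Unmapped local terms simply never match, which makes gaps in the mapping easy to spot.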