All time references are in CEST
Survey Data Harmonisation 3
Dr Ruxandra Comanaru (European Social Survey, City University, London UK)
Ms Daniela Negoita (European Values Study, Tilburg University, The Netherlands)
Tuesday 18 July, 16:00 - 17:00
The harmonisation of survey data has been a burgeoning research strand within social sciences over the last few years. Harmonisation has at its core the attempt to make data more comparable, allowing for data linkage and analysis of distinct datasets that were not initially meant to be assessed together. Sound methodology in data harmonisation allows comparisons and harmonisation of instruments ahead of the data collection, as well as evaluations of data from various sources that were not initially meant to be compared. As such, survey data can be harmonised ex-ante (during their design or after fieldwork before issuing the collected data) or ex-post (combining surveys not meant to be compared at the point of data collection). Some countries have made concerted efforts to harmonise instruments at the point of data collection (ex-ante), such that, for example, the wellbeing of the nation can be tracked in all national statistics surveys, pushing for the same measures to be used regularly to assess the same concepts. The latter approach to data harmonisation (i.e., ex-post) has been used to attempt to tease out insights from sources not designed to be compared a priori. The SUSTAIN 2 (WP2, Task 1) project, for example, tried to bridge data from two long-standing surveys in the European social science context: the European Values Study (EVS) and the European Social Survey (ESS). It aimed to harmonise their data over several decades in order to allow cross-survey and cross-national comparisons, and thus to link measures that have conceptual, and potentially statistical, overlap.
This session, which aims to offer practical prompts for future avenues of cooperation between surveys, invites contributions on all aspects and challenges of data harmonisation: data collection mode, sampling design, translation method, and measurement instruments.
Keywords: data harmonisation, linear stretching, European Social Survey, European Values Study
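Linear stretching, listed among the keywords, is commonly understood as rescaling answers from response scales of different lengths onto one common range. The following is a minimal sketch of that idea, not the procedure used in any particular project. The function name `linear_stretch`, the 4-point source scale and the 0-10 target scale are assumptions chosen for illustration.

```python
def linear_stretch(value, src_min, src_max, tgt_min=0.0, tgt_max=10.0):
    """Map a response from [src_min, src_max] linearly onto [tgt_min, tgt_max].

    The scale end points are illustrative assumptions; real applications
    must take them from the documentation of each survey item.
    """
    if src_max == src_min:
        raise ValueError("source scale must have more than one point")
    if not (src_min <= value <= src_max):
        raise ValueError("value lies outside the source scale")
    # Position of the answer on the source scale, as a share between 0 and 1.
    share = (value - src_min) / (src_max - src_min)
    return tgt_min + share * (tgt_max - tgt_min)


# Stretch every point of a hypothetical 4-point scale onto 0-10.
four_point = [linear_stretch(v, 1, 4) for v in (1, 2, 3, 4)]
```

The transformation preserves the end points and the equal spacing of the source scale; it does not by itself establish that the two items measure the same concept.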
Professor Zbigniew Sawiński (Institute of Philosophy and Sociology Polish Academy of Sciences) - Presenting Author
Dr Katarzyna Kopycka (Department of Sociology University of Warsaw)
Professor Anna Kiersztyn (Department of Sociology University of Warsaw)
Comparing panel surveys requires extending the usual harmonization scheme with tools that do not apply when harmonizing data from cross-sectional surveys. In the presentation, we propose a general harmonization framework that considers specific properties of panel survey data collected not in a single wave but across multiple panel waves. There are at least two consequences of using a multi-wave survey design. First, the data at the seams between the survey waves may not be consistent. Second, concurrent data from more than one wave may be available for some periods.
Our approach supplements the original panel survey data with the Panel Survey Data Status (PSDS) codes. The PSDS classification assigns various codes to information provided by the respondent, imputed during data editing, or resulting from the organization of the survey. The coding system also identifies the survey wave from which the data comes. It allows survey users to combine data depending on how the equivalence of the variables from different surveys is understood.
The flexibility of the approach is illustrated by two examples, which compare data from the Polish Panel Survey POLPAN and the German Socio-economic Panel SOEP. The first example concerns the harmonization of the same variable, specifically employed vs. non-employed dummy, for data collected in calendar and spells format. The second example compares two types of results: when data is limited to what is established by the closest survey wave and when the overlapping results from all survey waves are included.
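The first example, harmonizing an employed vs. non-employed dummy across calendar and spells formats, can be sketched as a conversion of employment spells into a monthly calendar. This is a minimal illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, the data are invented, and the rule that a month counts as employed when any spell overlaps it is one possible choice among several.

```python
from datetime import date


def month_index(d):
    """Number months consecutively so that ranges are easy to iterate."""
    return d.year * 12 + (d.month - 1)


def spells_to_calendar(spells, start, end):
    """Return {(year, month): 0 or 1} for every month from start to end.

    spells: list of (spell_start, spell_end) dates of employment.
    Assumption: a month is coded 1 if any spell overlaps it at all.
    """
    calendar = {}
    for idx in range(month_index(start), month_index(end) + 1):
        employed = any(
            month_index(s) <= idx <= month_index(e) for s, e in spells
        )
        calendar[(idx // 12, idx % 12 + 1)] = int(employed)
    return calendar


# Invented example: two employment spells within 2010.
spells = [
    (date(2010, 1, 15), date(2010, 3, 31)),
    (date(2010, 6, 1), date(2010, 7, 10)),
]
calendar = spells_to_calendar(spells, date(2010, 1, 1), date(2010, 8, 1))
```

A stricter rule, such as requiring employment for most of the month, would yield a different dummy for the same spells, which is why the equivalence rule needs to be made explicit.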
The PSDS codes do not require mapping to data before survey dissemination. The codes assigned to the original data can be distributed as a separate data file. They can also be prepared and shared by data users.
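The idea of keeping status codes in a separate file and letting users decide how to combine the data can be sketched as follows. The code labels, record layout and the function `select` are hypothetical and serve only as an illustration; the actual PSDS classification is not reproduced here.

```python
# Hypothetical status labels, not the actual PSDS codes.
REPORTED, IMPUTED = "reported", "imputed"

# Original data: one record per person, reference year and source wave.
data = [
    {"person": 1, "year": 2008, "wave": 2008, "employed": 1},
    {"person": 1, "year": 2008, "wave": 2013, "employed": 0},
    {"person": 1, "year": 2009, "wave": 2013, "employed": 1},
]

# Status codes stored separately, keyed by the same identifiers.
status = {
    (1, 2008, 2008): REPORTED,
    (1, 2008, 2013): REPORTED,
    (1, 2009, 2013): IMPUTED,
}


def select(data, status, allowed, closest_wave_only):
    """Keep records with an allowed status; optionally keep only the
    record from the wave closest in time to each reference year."""
    rows = [
        r for r in data
        if status[(r["person"], r["year"], r["wave"])] in allowed
    ]
    if not closest_wave_only:
        return rows
    best = {}
    for r in rows:
        key = (r["person"], r["year"])
        gap = abs(r["wave"] - r["year"])
        if key not in best or gap < abs(best[key]["wave"] - best[key]["year"]):
            best[key] = r
    return sorted(best.values(), key=lambda r: (r["person"], r["year"]))


closest = select(data, status, {REPORTED, IMPUTED}, closest_wave_only=True)
reported_all = select(data, status, {REPORTED}, closest_wave_only=False)
```

Because the codes live outside the data records, a different user could supply a different `status` mapping without altering the original data.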
Mr Lennart Palm (GESIS) - Presenting Author
Dr Silke Schneider (GESIS)
Summarizing the measurement of sociodemographic characteristics, the German survey landscape could be characterized as “same but different”: studies measure the same concepts yet differ in their specific approach. Combining different data sets can thus become a laborious effort which might still produce questionable results, since key caveats can be overlooked in exhaustive survey documentation. It would be best if such work were done once, for all to use, with standard variables ideally published within scientific use files.
To facilitate output harmonisation of socio-demographic variables, we developed proposals for standard variables for selected socio-demographic attributes. While our approach was based on the German survey data landscape, we leaned upon international standards such as ISCED for education or the EU’s standardised key social variables for household net income or main activity status, to ensure international compatibility. To provide for the quality and usefulness of the proposed standard variables, we used three methods: Firstly, before developing our proposals, we reviewed existing survey instruments in several of Germany’s leading studies (https://doi.org/10.5281/zenodo.6810973). Secondly, the proposals were discussed in a virtual roundtable meeting with researchers, study representatives and data users, as well as bilaterally with individual experts. Thirdly, based upon a multiple linear regression analysis approach, we validated our proposals both in a data-driven and a theory-driven way, using a broad set of up to 190 potential outcome variables. Based on these validation results and the feedback gained, we have refined our proposals.
In this talk, we will showcase the standard variables for three socio-demographic attributes, namely education, marital status and main activity status, to collect final feedback. Our proposals will be published in autumn 2023.
Dr Brita Dorer (GESIS-Leibniz Institute for the Social Sciences, Mannheim) - Presenting Author
When data from different cross-cultural surveys are harmonised, that may mean harmonising data that result from different approaches to comparability. One crucial field for developing cross-cultural survey data is the development of the questionnaires in the different participating countries. To do so, an English-language source questionnaire is typically translated into the participating language versions. It is well known that not all multilingual surveys put the same effort into developing these translations. This paper looks at the particular example of data harmonisation between the European Values Study (EVS) and the European Social Survey (ESS) in the context of the SUSTAIN 2 project. Both surveys have been applying different translation approaches in the past: the ESS has been rigorously applying the “team approach”, in the form of TRAPD (Translation-Review-Adjudication-Pretesting-Documentation), since its first round, whereas in the EVS traditionally less strict requirements have been implemented for its questionnaire translations. This paper presents and discusses a study in which a series of items were translated according to both the ESS and the EVS translation approaches, in the language pair English-French, cross-checking in addition with the language pair English-Finnish. First, the translations resulting from both approaches are compared to each other linguistically and an attempt is made to assess the quality of both translations. Then, in a second step, both translations will be fielded, and the data resulting from both translations compared. The study should help to answer the following research questions: Can data resulting from different surveys be harmonised even when the translation methods were different? Which effects do the differing translation methods of ESS and EVS have on the resulting survey data?
Dr Olga Grunwald (NIDI) - Presenting Author
The Generations and Gender Survey (GGS) is a cross-national longitudinal survey on families and life course trajectories. The first round of data collection was run in twenty countries with national teams responsible for data collection and processing. This decentralized structure enabled national teams to modify questionnaires, leading to variations in the collected data that made it time-consuming and challenging to harmonize afterwards.
Since 2020, a new round of GGS has started (GGS-II), for which the central coordination team of the GGS has centralized its operations – i.e., data collection and processing are centrally coordinated and standardized. As such, the GGS is evolving from single national questionnaires harmonized post-hoc to a centrally coordinated data infrastructure.
The same baseline questionnaire is being fielded in every country but national teams can modify it to their context (e.g., add country-specific response options). The key challenge in developing the data harmonization workflow was to balance the need for automation and standardization in data harmonization with a need for flexibility to accommodate country-specific variations in the questionnaire.
The workflow builds on the Empty Data File (EDF) – a Stata dataset with relevant information from the baseline questionnaire (e.g., variable labels, questions, response labels). This information is automatically extracted from the questionnaire code and imported into the EDF. When the raw data is merged with the EDF, country-specific variations from the baseline questionnaire are automatically flagged and then recoded manually. This process reduces the need for post-harmonization and makes data documentation more efficient.
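The flagging step described above can be sketched in a few lines. This is only an illustration of the general idea under stated assumptions: the actual EDF is a Stata dataset, whereas the sketch uses plain Python dictionaries as a stand-in, and the variable names, response codes and the function `flag_deviations` are hypothetical.

```python
# Hypothetical baseline metadata standing in for the EDF content.
baseline = {
    "sex": {"label": "Sex of respondent", "values": {1: "Male", 2: "Female"}},
    "marstat": {
        "label": "Marital status",
        "values": {1: "Never married", 2: "Married", 3: "Divorced", 4: "Widowed"},
    },
}

# Invented raw country data with two deviations from the baseline.
country_rows = [
    {"sex": 1, "marstat": 2},
    {"sex": 2, "marstat": 5},                  # 5: country-specific option
    {"sex": 2, "marstat": 1, "region_x": 3},   # variable not in baseline
]


def flag_deviations(baseline, rows):
    """Report variables and response codes that differ from the baseline."""
    extra_variables = set()
    extra_values = {}
    for row in rows:
        for var, value in row.items():
            if var not in baseline:
                extra_variables.add(var)
            elif value is not None and value not in baseline[var]["values"]:
                extra_values.setdefault(var, set()).add(value)
    seen = {var for row in rows for var in row}
    return {
        "extra_variables": extra_variables,
        "extra_values": extra_values,
        "missing_variables": set(baseline) - seen,
    }


flags = flag_deviations(baseline, country_rows)
```

The flagged items would then be the ones handed over for manual recoding, while everything matching the baseline passes through unchanged.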
The paper focuses on the newly developed data harmonization workflow for the GGS-II. I will discuss the challenges and solutions encountered along the way as well as plans to further automate and standardize the workflow.