ESRA 2023 Preliminary Glance Program

All time references are in CEST

Survey Data Harmonisation 3

Session Organisers Dr Ruxandra Comanaru (European Social Survey, City University, London UK)
Ms Daniela Negoita (European Values Study, Tilburg University, The Netherlands)
Time Tuesday 18 July, 16:00 - 17:00
Room U6-28

The harmonisation of survey data has been a burgeoning research strand within the social sciences over the last few years. At its core, harmonisation attempts to make data more comparable, allowing for data linkage and the analysis of distinct datasets that were not initially meant to be assessed together. Sound methodology in data harmonisation allows comparison and harmonisation of instruments ahead of data collection, as well as evaluation of data from various sources that were not initially meant to be compared. As such, survey data can be harmonised ex-ante (during their design, or after fieldwork but before the collected data are issued) or ex-post (combining surveys not meant to be compared at the point of data collection). Some countries have made concerted efforts to harmonise instruments at the point of data collection (ex-ante), such that, for example, the wellbeing of the nation can be tracked in all national statistics surveys, pushing for the same measures to be used regularly to assess the same concepts. The latter approach to data harmonisation (i.e., ex-post) has been used to tease out insights from sources not designed to be compared a priori. The SUSTAIN 2 (WP2, Task 1) project, for example, tried to bridge data from two long-standing surveys in the European social science context: the European Values Study (EVS) and the European Social Survey (ESS). It aimed to harmonise their data over several decades in order to allow cross-survey and cross-national comparisons, and thus to link measures that have conceptual, and potentially statistical, overlap.
This session, which aims to offer practical prompts for future avenues of cooperation between surveys, invites contributions on all aspects and challenges of data harmonisation: data collection mode, sampling design, translation method, and measurement instruments.

Keywords: data harmonisation, linear stretching, European Social Survey, European Values Study

General framework for harmonizing data from panel surveys

Professor Zbigniew Sawiński (Institute of Philosophy and Sociology Polish Academy of Sciences) - Presenting Author
Dr Katarzyna Kopycka (Department of Sociology University of Warsaw)
Professor Anna Kiersztyn (Department of Sociology University of Warsaw)

Comparing panel surveys requires extending the usual harmonization scheme with tools that do not apply when harmonizing data from cross-sectional surveys. In the presentation, we propose a general harmonization framework that considers specific properties of panel survey data collected not in a single but through multiple panel waves. There are at least two consequences of using a multi-wave survey design. First, the data at the seams between the survey waves may not be consistent. Second, concurrent data from more than one wave may be available for some periods.
Our approach supplements the original panel survey data with the Panel Survey Data Status (PSDS) codes. The PSDS classification assigns various codes to information provided by the respondent, imputed during data editing, or resulting from the organization of the survey. The coding system also identifies the survey wave from which the data come. It allows survey users to combine data depending on how the equivalence of the variables from different surveys is understood.
The flexibility of the approach is illustrated by two examples, which compare data from the Polish Panel Survey POLPAN and the German Socio-economic Panel SOEP. The first example concerns the harmonization of the same variable, specifically an employed vs. non-employed dummy, for data collected in calendar and spell formats. The second example compares two types of results: when data are limited to what is established by the closest survey wave, and when the overlapping results from all survey waves are included.
The PSDS codes do not require mapping to data before survey dissemination. The codes assigned to the original data can be distributed as a separate data file. They can also be prepared and shared by data users.
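As a rough illustration of the first example above (one employed vs. non-employed dummy recorded in spell format in one source and in calendar format in another), the following Python sketch expands spell records into a monthly calendar so that both sources share one layout. The function names, the tuple layout of the spells, and the sample dates are assumptions made for illustration; they are not taken from POLPAN, SOEP, or the PSDS coding scheme.

```python
def month_index(year, month):
    """Map a (year, month) pair to a running month number."""
    return year * 12 + (month - 1)


def spells_to_calendar(spells, start, end):
    """Expand spell records into a monthly employed/non-employed dummy.

    spells: list of ((year, month), (year, month), employed) tuples
            (hypothetical layout; both end points are inclusive).
    start, end: (year, month) bounds of the calendar window, inclusive.
    Months not covered by any spell stay None, so gaps remain visible
    instead of being silently treated as non-employment.
    """
    first = month_index(*start)
    last = month_index(*end)
    calendar = [None] * (last - first + 1)
    for spell_start, spell_end, employed in spells:
        lo = max(month_index(*spell_start), first)
        hi = min(month_index(*spell_end), last)
        for m in range(lo, hi + 1):
            calendar[m - first] = 1 if employed else 0
    return calendar


# Made-up respondent: employed Jan-Mar 2010, no information for April,
# non-employed May-Jun 2010.
spells = [
    ((2010, 1), (2010, 3), True),
    ((2010, 5), (2010, 6), False),
]
calendar = spells_to_calendar(spells, (2010, 1), (2010, 6))
print(calendar)
```

Keeping uncovered months as None rather than 0 is a design choice of this sketch: it leaves the decision on how to treat gaps and seams to the data user.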

Output Harmonisation of Sociodemographic Variables: Developing Proposals for Standard Variables for German Surveys.

Mr Lennart Palm (GESIS) - Presenting Author
Dr Silke Schneider (GESIS)

Summarizing the measurement of sociodemographic characteristics, the German survey landscape could be characterized as “same but different”: studies measure the same concepts yet differ in their specific approach. Combining different data sets can thus become a laborious effort which might still produce questionable results, since key caveats can be overlooked in exhaustive survey documentation. It would be best if such work were done once, for all to use, with standard variables ideally published within scientific use files.
To facilitate output harmonisation of socio-demographic variables, we developed proposals for standard variables for selected socio-demographic attributes. While our approach was based on the German survey data landscape, we drew on international standards such as ISCED for education or the EU’s standardised key social variables for household net income or main activity status, to ensure international compatibility. To ensure the quality and usefulness of the proposed standard variables, we used three methods. Firstly, before developing our proposals, we reviewed existing survey instruments in several of Germany’s leading studies. Secondly, the proposals were discussed in a virtual roundtable meeting with researchers, study representatives and data users, as well as bilaterally with individual experts. Thirdly, based upon a multiple linear regression analysis approach, we validated our proposals both in a data-driven and a theory-driven way, using a broad set of up to 190 potential outcome variables. Based on these validation results and the feedback gained, we have refined our proposals.
In this talk, we will showcase the standard variables for three socio-demographic attributes, namely education, marital status and main activity status, to collect final feedback. Our proposals will be published in autumn 2023.

Harmonising survey data resulting from different translation approaches: risk or enrichment?

Dr Brita Dorer (GESIS-Leibniz Institute for the Social Sciences, Mannheim) - Presenting Author

When data from different cross-cultural surveys are harmonised, this may mean harmonising data that result from different approaches to comparability. One crucial step in producing cross-cultural survey data is the development of the questionnaires in the different participating countries. Typically, an English-language source questionnaire is translated into the languages of the participating countries. It is well known that not all multilingual surveys put the same effort into developing these translations. This paper looks at the particular example of data harmonisation between the European Values Study (EVS) and the European Social Survey (ESS) in the context of the SUSTAIN 2 project. The two surveys have applied different translation approaches in the past: the ESS has rigorously applied the “team approach”, in the form of TRAPD (Translation-Review-Adjudication-Pretesting-Documentation), since its first round, whereas the EVS has traditionally implemented less strict requirements for its questionnaire translations. This paper presents and discusses a study in which a series of items were translated according to both the ESS and the EVS translation approaches, in the language pair English-French, with an additional cross-check in the language pair English-Finnish. First, the translations resulting from both approaches are compared to each other linguistically and an attempt is made to assess the quality of both translations. Then, in a second step, both translations will be fielded and the resulting data compared. The study should help to answer the following research questions: Can data resulting from different surveys be harmonised even when the translation methods were different? Which effects do the differing translation methods of ESS and EVS have on the resulting survey data?

Biased Bivariate Correlations in Insufficiently Harmonized Survey Data

Dr Ranjit K. Singh (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author

Many projects in the social sciences make it necessary to combine data from different sources. That may mean data gathered in different survey modes, in different survey programs, or with different survey instruments. Often, we need to perform ex-post harmonization to improve the comparability of the source data before combining them into a homogeneous integrated data product.
In this talk, I will focus on one such comparability issue and demonstrate a consequence of insufficient harmonization. Specifically, I look at the case where two instruments (or modes) lead to different item difficulties: This means if we applied the two instruments (or modes) to the same population, we would get different mean responses. If such mean differences are not mitigated before combining data, we introduce a mean bias into our composite data. Such mean bias has direct consequences for analyses based on the combined data. In data drawn from the same population, mean bias introduces error variance. In data drawn from different populations it would bias or even invert true population differences. However, in this paper I demonstrate that mean bias can also bias bivariate correlations which involve the affected variables. If differences in item difficulty are not mitigated before combining data, we introduce a variant of Simpson’s paradox into our data: The bivariate correlation in each source survey might differ substantially from the correlation in the composite dataset. In a set of simulations, I demonstrate this correlation bias effect and show how it changes depending on the mean biases in each source variable and the strength of the underlying true correlation.
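The correlation bias described above can be reproduced in a few lines. The Python sketch below is not the author's simulation; it is a minimal illustration under assumed settings: two equally sized sources drawn from the same standard-normal population with a true correlation of 0.3, where the second source's instruments shift the mean of x by +2 and of y by -2. All names and parameter values are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_source(n, rho, bias_x, bias_y, rng):
    """Draw n (x, y) pairs with true correlation rho, then add the
    constant mean shift that a source's instrument (or mode) induces."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        xs.append(x + bias_x)
        ys.append(y + bias_y)
    return xs, ys


rng = random.Random(42)   # fixed seed so the run is repeatable
rho = 0.3                 # assumed true correlation in both sources

# Source A: unbiased instruments. Source B: opposite mean biases.
xa, ya = simulate_source(5000, rho, 0.0, 0.0, rng)
xb, yb = simulate_source(5000, rho, 2.0, -2.0, rng)

r_a = pearson(xa, ya)
r_b = pearson(xb, yb)
r_pooled = pearson(xa + xb, ya + yb)   # naive stacking, no harmonization

print(f"source A: {r_a:.2f}, source B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

Within each source the correlation stays close to the assumed 0.3, because a constant shift does not change a correlation. In the pooled data, however, the mean differences act as an extra between-source covariance term, and with opposite-signed biases of this size the pooled correlation turns negative. Setting both biases to the same sign inflates the pooled correlation instead.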