
ESRA 2021 full program




Various approaches to reducing measurement error

Session Organiser: Dr Bella Struminskaya (Utrecht University)
Time: Friday 23 July, 13:15 - 14:45

1. An Eye-tracking Study of Scale Direction Effect Dr Ting Yan
2. Coverage rate and coverage bias in web surveys in Europe Dr Alessandra Gaia
3. Data quality in surveying adolescents: Experiences from a survey with 14- to 16-year-old pupils in lower secondary schools in Vienna Dr Susanne Vogl
4. Do Survey Design Features Influence Interviewer Effects and Deviant Behavior? Mr Lukas Olbrich
5. The WHO-5 well-being index – Validation based on item response theory and the analysis of measurement invariance across 35 countries Dr Philipp Sischka

Evaluating Machine Learning Algorithms to Detect Interviewer Falsification

Ms Silvia Schwanhäuser (Institute for Employment Research (IAB)) - Presenting Author
Ms Yuliya Kosyakova (Institute for Employment Research (IAB), University of Bamberg, and University of Mannheim)
Ms Natalja Menold (University of Dresden (TU-Dresden))
Mr Joseph Sakshaug (Institute for Employment Research (IAB), University of Munich (LMU), and University of Mannheim)
Mr Peter Winker (University of Giessen)

Download presentation

Interviewers play a vital role in the quality of survey data, as they directly influence response rates and are responsible for appropriately administering the questionnaire. At the same time, interviewers may be tempted to intentionally deviate from the prescribed interviewing guidelines or even fabricate entire interviews. Different studies have discussed various possibilities for preventing and detecting such fraudulent interviewer behavior. However, the proposed controlling procedures are often time-consuming, and their implementation is cumbersome and costly.

One understudied possibility to simplify and automate the controlling process is to use supervised machine learning algorithms. Even though some studies propose the use of unsupervised algorithms like cluster analysis or principal component analysis, there is hardly any literature on otherwise widespread methods like neural networks, support vector machines, decision trees, or naïve Bayes. This is mainly driven by the lack of appropriate test and training data, including sufficient numbers of falsifiers and falsified interviews to evaluate the respective algorithms.

Using data from a German experimental study that includes an equal share of falsified and real interviews, as well as real-world data from a German panel survey with fraudulent interviews in different waves, we address the question: How well do supervised machine learning algorithms discriminate between real and falsified data? To do this, we evaluate the performance of different algorithms under various scenarios. By utilizing different data sources and working with different subsets for training and testing the algorithms within and across datasets, we provide additional evidence regarding the external validity of the results. In addition, this setting allows us to draw conclusions about the different strategies and behaviors of falsifying interviewers.
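The abstract names the classifier families but not the features or tooling; a minimal sketch of the general approach, assuming interview-level quality indicators and a binary falsification label (all column names and the data file below are illustrative, not the authors' actual setup), could look like this:

```python
# Minimal sketch: train several supervised classifiers to separate real from
# falsified interviews. Feature names and the CSV file are illustrative only.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

# Hypothetical data: one row per interview with quality indicators and a label.
df = pd.read_csv("interviews.csv")
X = df[["share_extreme_responses", "share_item_nonresponse",
        "interview_length_min", "benford_chi2"]]
y = df["falsified"]  # 1 = falsified, 0 = real

classifiers = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(max_depth=5),
    "neural_net": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000),
}

# Compare classifiers by cross-validated discrimination (here: AUC).
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```

Testing across data sources, as the authors describe, would replace the cross-validation split with an explicit split that trains on one dataset and evaluates on the other.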


Data quality in surveying adolescents: Experiences from a survey with 14- to 16-year-old pupils in lower secondary schools in Vienna

Dr Susanne Vogl (University of Vienna) - Presenting Author
Mr Franz Astleithner (University of Vienna)
Ms Raphaela Kogler (University of Vienna)

Collecting data in an online panel survey with adolescents poses many challenges; two of these, representation and measurement error, are the focus of this presentation. The database is an online survey conducted with 14- to 16-year-olds in Vienna in 2018. We will reflect on our experiences with recruitment and outcome rates when schools and school authorities are involved and guardians' as well as adolescents' consent is required. Due to the multiple actors in the sampling process, sample biases are inevitable.
Furthermore, with low educational attainment and more than half of the respondents having German as their second language, measurement quality could be at risk. Thus, we paid special attention to questionnaire design and pretesting. Additionally, to keep up motivation and attention, we introduced a split-ballot experiment with video clips between thematic blocks, forced choice, and delayed display of the submit button. The effects of these treatment conditions on duration, break-offs, item nonresponse, and response patterns are analysed.
The aim of this contribution is to critically review our experiences, discuss practical requirements and problems in surveying disadvantaged adolescents, and draw conclusions for future research on surveying adolescents.
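As a hedged illustration of the kind of treatment comparison described above (break-offs and duration across the split-ballot conditions), a minimal sketch with hypothetical variable names and data file might be:

```python
# Sketch: compare break-off rates and completion time across experimental
# conditions of a split-ballot design. Column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("adolescent_survey.csv")  # one row per respondent

# Break-off rate by condition (e.g., video clips vs. no video clips).
crosstab = pd.crosstab(df["condition"], df["broke_off"])
chi2, p, dof, _ = stats.chi2_contingency(crosstab)
print(crosstab, f"\nchi2 = {chi2:.2f}, p = {p:.3f}")

# Completion time by condition (completed interviews only).
completed = df[df["broke_off"] == 0]
groups = [g["duration_min"].values for _, g in completed.groupby("condition")]
print(stats.kruskal(*groups))  # non-parametric comparison of durations
```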


Do Survey Design Features Influence Interviewer Effects and Deviant Behavior?

Mr Lukas Olbrich (Institute for Employment Research (IAB)) - Presenting Author
Dr Yuliya Kosyakova (Institute for Employment Research (IAB))
Professor Joseph W. Sakshaug (Institute for Employment Research (IAB))

Interviewers represent a well-known source of error in face-to-face surveys. Previous literature has found considerable interviewer effects on survey estimates, and multiple studies have provided evidence of individual interviewers who intentionally deviate from fieldwork instructions. However, it is still unclear to what extent specific survey design features facilitate interviewer effects and deviant interviewer behavior. In this study, we use data from eight rounds of the European Social Survey, in which countries differ in survey design features, to explore this question. Specifically, we focus on differences with regard to sample selection methods (individual-level register, other register, random route) and the instrument mode (CAPI, PAPI). Based on principal-agent theory, we develop a theoretical framework for interviewer behavior in which the field agency's ability to monitor interviewers and the interviewers' variety of tasks play a crucial role in enhancing interviewer effects and deviant behavior. To test the hypotheses on the design features derived from this framework, we use a two-step approach. First, we employ multiple indicators of data quality as dependent variables in multilevel location-scale models estimated for each country in each round separately. These models allow us to construct four distinct measures of interviewer behavior for each estimation, yielding 199 country-round observations. In the second step, we use this sample to estimate to what extent sample selection methods and the instrument mode are associated with the interviewer measures. The results will show under which circumstances both survey practitioners and researchers must pay particular attention to interviewer behavior.
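To make the two-step logic concrete, here is a deliberately simplified sketch: step one fits a random-intercept model per country-round and extracts an interviewer-effect measure (an intraclass correlation, standing in for the authors' full multilevel location-scale models), and step two relates that measure to design features. All file and variable names are assumptions for illustration.

```python
# Simplified two-step sketch (random-intercept model instead of the authors'
# full multilevel location-scale models). Variable names are illustrative.
import pandas as pd
import statsmodels.formula.api as smf

ess = pd.read_csv("ess_respondents.csv")           # respondent-level data
design = pd.read_csv("country_round_design.csv")   # design features per country-round

records = []
for (country, rnd), sub in ess.groupby(["country", "round"]):
    # Step 1: interviewer random intercepts for a data-quality indicator.
    m = smf.mixedlm("quality_indicator ~ 1", sub, groups=sub["interviewer_id"]).fit()
    icc = m.cov_re.iloc[0, 0] / (m.cov_re.iloc[0, 0] + m.scale)
    records.append({"country": country, "round": rnd, "interviewer_icc": icc})

step2 = pd.DataFrame(records).merge(design, on=["country", "round"])

# Step 2: relate interviewer effects to design features (sampling frame, mode).
print(smf.ols("interviewer_icc ~ C(sample_frame) + C(mode)", step2).fit().summary())
```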


An Eye-tracking Study of Scale Direction Effect

Dr Ting Yan (Westat) - Presenting Author

Download presentation

Scale direction effect refers to the impact of the direction in which a scale is presented to respondents on the resulting answers. A response scale can run from the positive end to the negative end, or begin at the negative end and progress to the positive end. Holding other scale features constant, scale direction is found to affect response distributions by yielding more selections of scale points closer to the beginning of the scale. Although this phenomenon has been empirically demonstrated in different modes of data collection and for respondents with different demographic characteristics, what remains understudied is the mechanism underlying the scale direction effect. In addition, it is not clear which scale direction is easier for respondents to use. To address these questions, I employed eye-tracking because it provides a direct window into how respondents process survey questions. In an eye-tracking study, respondents' eye movements are tracked while they are reading and answering survey questions. In this paper, I will examine two types of eye-tracking measures. Fixation counts and fixation duration are used to study respondents' attention to and cognitive processing of response scales. Pupil dilations are used to understand the extent of cognitive difficulty in using response scales. I will compare fixation counts, fixation duration, and pupil dilations by scale direction and by respondents' actual answers. Findings of this paper will shed light on how respondents process response scales and will have practical implications for questionnaire design.
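A minimal sketch of the planned comparison (fixation counts, fixation duration, and pupil dilation by scale direction) might look as follows; the data layout and column names are assumptions, not details of the actual study.

```python
# Sketch: compare eye-tracking measures between the two scale-direction
# conditions. Column names are assumed for illustration.
import pandas as pd
from scipy import stats

et = pd.read_csv("eyetracking_by_question.csv")  # one row per respondent x question
pos_first = et[et["scale_direction"] == "positive_first"]
neg_first = et[et["scale_direction"] == "negative_first"]

for measure in ["fixation_count", "fixation_duration_ms", "pupil_dilation_mm"]:
    t, p = stats.ttest_ind(pos_first[measure], neg_first[measure], nan_policy="omit")
    print(f"{measure}: t = {t:.2f}, p = {p:.3f}")
```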


The WHO-5 well-being index – Validation based on item response theory and the analysis of measurement invariance across 35 countries

Dr Philipp Sischka (University of Luxembourg) - Presenting Author

Background: The five-item World Health Organization Well-Being Index (WHO-5) is a frequently used brief standard measure in large-scale cross-cultural clinical studies. Despite its frequent use, some psychometric questions remain concerning the choice of an adequate item response theory (IRT) model, the evaluation of reliability at important cutoff points, and, most importantly, the assessment of measurement invariance across countries.
Methods: Data from the 6th European Working Conditions Survey (2015) were used, which collected nationally representative samples of employed and self-employed individuals (N = 43,469) via computer-aided personal interviews across 35 European countries. An in-depth IRT analysis was conducted for each country, testing different IRT assumptions (e.g., unidimensionality), comparing different IRT models, and calculating reliabilities. Furthermore, measurement invariance analysis was conducted with the recently proposed alignment procedure.
Results: The graded response model fitted the data best for all countries. Furthermore, IRT assumptions were mostly fulfilled. The WHO-5 showed high reliability overall and at critical points. Measurement invariance analysis supported metric invariance but rejected scalar invariance across countries. Analysis of the test characteristic curves of the aligned graded response model indicated low levels of differential test functioning at medium levels of the WHO-5, but differential test functioning increased at more extreme levels.
Limitations: The current study has no external criterion (e.g., structured clinical interviews) to assess the sensitivity and specificity of the WHO-5 as a depression screening tool.
Conclusions: The WHO-5 is a psychometrically sound measure. However, large-scale cross-cultural studies should employ a latent variable modeling approach that accounts for non-invariant parameters across countries (e.g., alignment).
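For readers unfamiliar with the graded response model referenced here, the sketch below computes its category probabilities as differences of adjacent logistic cumulative curves; the discrimination and threshold values are invented for illustration, not estimates from the WHO-5 data.

```python
# Sketch: category probabilities of a graded response model (GRM) for one
# 5-category WHO-5-style item. Parameter values are illustrative only.
import numpy as np

def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(X = k | theta) under the GRM.

    theta: latent score(s); a: discrimination; thresholds: ordered category
    boundaries b_1 < ... < b_{K-1}.
    """
    theta = np.atleast_1d(theta).astype(float)
    # Cumulative curves P(X >= k): P(X >= 0) = 1, P(X >= K) = 0.
    cum = [np.ones_like(theta)]
    for b in thresholds:
        cum.append(1.0 / (1.0 + np.exp(-a * (theta - b))))
    cum.append(np.zeros_like(theta))
    cum = np.array(cum)
    return cum[:-1] - cum[1:]  # P(X = k) = P(X >= k) - P(X >= k + 1)

# Illustrative parameters: 5 response categories, hence 4 thresholds.
probs = grm_category_probs(theta=0.5, a=1.8, thresholds=[-2.0, -0.8, 0.4, 1.6])
print(probs.ravel(), probs.sum())  # five category probabilities summing to 1
```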


Coverage rate and coverage bias in web surveys in Europe

Dr Alessandra Gaia (University of Milano-Bicocca) - Presenting Author
Dr Chiara Respi (University of Milano-Bicocca)
Professor Emanuela Sala (University of Milano-Bicocca)

Drawing on the total survey error (TSE) framework, this paper analyses coverage bias in web surveys. Coverage error arises in Internet surveys that exclude the population lacking the equipment and the skills to access the Internet but nonetheless aim to generalise results to the general population. If the Internet population and the non-Internet population differ on variables of interest for research purposes, coverage bias arises. Although Internet penetration in Europe has increased over time and may increase further over the next years, especially in areas where it is still low, researchers have argued that the non-Internet population might become increasingly different from the rest of the population over time, and this phenomenon might increase coverage bias. Indeed, both the literature on the “digital divide” and the survey methodology literature have shown marked socio-demographic differences in Internet use. Applying multilevel analysis to Eurobarometer data (2010-2018), we replicate an empirical study (Mohorko, de Leeuw, & Hox, 2013) to analyse coverage rate and coverage bias in web surveys in Europe and monitor their evolution over time. Specifically, we answer the following research questions: What is the Internet coverage rate? Are there socio-economic and demographic differences between individuals with and without Internet access? Does coverage error in web surveys lead to coverage bias in key social indicators such as political participation, health, and organisation of leisure time? Are there differences across European countries in coverage rate and coverage bias in key social indicators? The implications of the research findings for survey practice will be discussed.

Mohorko, A., de Leeuw, E., & Hox, J. (2013). Internet coverage and coverage bias in Europe: Developments across countries and over time. Journal of Official Statistics, 29(4), 609-622.
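The coverage bias in question follows the standard decomposition: the bias of an Internet-only estimate equals the non-coverage rate times the difference between the covered and non-covered groups on the indicator. A small sketch with hypothetical Eurobarometer-style variables:

```python
# Sketch: coverage rate and coverage bias of an Internet-only estimate for a
# key indicator (e.g., political participation). Variable names are hypothetical.
import pandas as pd

eb = pd.read_csv("eurobarometer.csv")  # one row per respondent, per country

def coverage_stats(group, indicator="political_participation"):
    covered = group[group["has_internet"] == 1]
    noncovered = group[group["has_internet"] == 0]
    rate = len(covered) / len(group)  # Internet coverage rate
    # Bias of the covered-only mean relative to the full population:
    # (1 - coverage rate) * (mean_covered - mean_noncovered)
    bias = (1 - rate) * (covered[indicator].mean() - noncovered[indicator].mean())
    return pd.Series({"coverage_rate": rate, "coverage_bias": bias})

print(eb.groupby("country").apply(coverage_stats))
```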