Measurement Comparability, Reliability and Validity in Mixed Device Surveys

Session Organisers: Dr Natalja Menold (GESIS); Professor Vera Toepoel (Utrecht University)
Time: Tuesday 16th July, 11:00 - 12:30
Room: D21

When responding to web surveys, respondents can use not only a PC but also other devices (smartphones, tablets). Researchers are therefore faced not only with the need to develop questionnaires that are optimal across devices in order to increase measurement quality, but also with questions about the comparability of data obtained with different devices. Whereas some researchers analyze differences in response and non-response, response rates and possible sampling bias associated with a device, questions regarding measurement quality and measurement equivalence remain largely unanswered. Particularly important are the main measurement quality criteria within the Total Survey Error framework, such as reliability and validity. In this session, we welcome papers which address measurement quality and comparability with respect to measurement error between devices, effects of different aspects of questionnaire design for various devices on measurement error and measurement quality, the use of paradata when investigating measurement error or comparability, and relationships between questionnaire design, response behavior and measurement quality or comparability in mixed-device surveys.

Keywords: online surveys, measurement quality, measurement invariance

Web, App or Paper Time Use Diary – Does it Make a Difference to Measurement? Evidence from the Age 14 Survey of the Millennium Cohort Study

Dr Emily Gilbert (Centre for Longitudinal Studies, UCL) - Presenting Author
Dr Lisa Calderwood (Centre for Longitudinal Studies, UCL)


Time diary data provide a comprehensive and sequential account of daily life and are used for a wide range of analytic purposes. Recent years have witnessed a steady growth of large-scale time diary data collection in cross-sectional as well as longitudinal surveys, driven by increased research interest in population activity patterns and their relationship with long-term outcomes. The majority of social surveys collect paper-administered diaries, which have been shown to produce the most accurate and reliable daily activity estimates but present challenges relating to respondent burden and administration costs. The use of new technologies for data collection could address these weaknesses by providing less burdensome diary instruments, improving data quality, and reducing post-fieldwork data coding costs.

The Millennium Cohort Study (MCS) was the first large-scale longitudinal survey to use a mixed-mode approach for the collection of time use data among teenagers. A smartphone app, web diary, and paper diary were specifically designed for the sixth wave of the survey, when cohort members were aged 14. The smartphone app in particular was a departure from the more traditional methods of time use data collection.

This presentation will focus on the take-up and selection into different time diary modes, data quality across the instruments, mode differences in measurement, and methodological challenges faced.


Using Paradata to Predict Test-Retest Reliability and Criterion Validity in a Mixed-Device Survey

Dr Carina Cornesse (University of Mannheim) - Presenting Author
Dr Jan Karem Höhne (University of Mannheim)
Professor Mick P. Couper (University of Michigan)
Professor Annelies G. Blom (University of Mannheim)

From past research on survey data quality, we know that paradata can predict a number of undesirable response behaviors, such as straight-lining or item nonresponse. As yet, however, we know little about whether paradata can be used to predict the reliability and validity of respondents’ answers as well. Furthermore, we know from past research that differences in the amount and type of undesirable response behavior exist across devices. As yet, however, we know little about whether these differences across devices also concern the reliability and validity of the survey data.
To fill these gaps in the literature on using paradata to predict reliability and validity in a mixed-device survey, we use data from a web survey experiment (N = 3,316) implemented in an online access panel. In this experiment, we randomly assigned survey participants to the device type (PC versus smartphone) they had to use to fill out the questionnaire. Furthermore, we implemented measures of test-retest reliability as well as criterion validity in a questionnaire module on political efficacy. From the survey respondents, we also collected an extensive set of paradata, such as response times, mouse clicks, number of page visits, and scrolling behavior.
With our experimental set-up, we can assess the association of paradata with the test-retest reliability and criterion validity of respondents’ answers. In addition, we can evaluate the effect of the device type used to fill out a questionnaire on the reliability and validity of the survey data.
Our first results suggest that some paradata, such as the number of mouse clicks, are associated with reliability while other paradata, such as response times, are not. Furthermore, reliability is higher when respondents answer the survey via PC rather than smartphone. Overall, our findings provide valuable insights into data quality in a mixed-device survey.
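The analysis logic can be illustrated with a minimal sketch on synthetic data. All variable names (efficacy_t1, efficacy_t2, n_clicks) and values below are illustrative assumptions, not the authors' actual measures: the sketch computes test-retest correlations by device and checks whether a paradata measure is associated with respondents' answer stability.

    import pandas as pd
    from scipy.stats import pearsonr

    # Hypothetical respondent-level data: two administrations of the same
    # political-efficacy item (test and retest) plus paradata and device type.
    df = pd.DataFrame({
        "efficacy_t1": [4, 2, 5, 3, 4, 1, 5, 2],
        "efficacy_t2": [4, 3, 5, 3, 2, 1, 5, 2],
        "n_clicks":    [12, 30, 10, 14, 45, 11, 9, 15],
        "device":      ["pc", "phone", "pc", "pc", "phone", "phone", "pc", "phone"],
    })

    # Test-retest reliability per device group: correlation of the two administrations.
    for device, group in df.groupby("device"):
        r, _ = pearsonr(group["efficacy_t1"], group["efficacy_t2"])
        print(f"{device}: test-retest r = {r:.2f}")

    # Respondent-level answer instability (absolute test-retest discrepancy),
    # correlated with a paradata measure such as the number of mouse clicks.
    df["instability"] = (df["efficacy_t1"] - df["efficacy_t2"]).abs()
    r, p = pearsonr(df["n_clicks"], df["instability"])
    print(f"clicks vs. instability: r = {r:.2f} (p = {p:.3f})")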


Scale Direction Effects and Measurement Equivalence Across PCs and Smartphones

Professor Dagmar Krebs (University of Giessen) - Presenting Author
Dr Jan Karem Höhne (University of Mannheim)

Rating scale direction effects are a well-known phenomenon in survey research. While there are several approaches to explaining how such effects occur, the literature reports mixed empirical evidence. Research also indicates that scale direction effects depend on the device – PC or smartphone – used in web survey responding. In contrast to PCs, smartphones allow respondents to take part in surveys irrespective of location and situation, which may encourage distraction and/or multitasking. Smartphones may also have a negative effect on survey responding because of their small screen sizes and more intricate input methods. To gain a more comprehensive understanding of response behavior when using PCs and smartphones, we investigated measurement equivalence of scale directions between and within devices. For this purpose, we conducted a web survey in a German access panel, using questions on achievement and job motivation (item-by-item presentation), and employed a two-step split-ballot experiment (N = 3,426) with four groups: 1) PC with a decremental scale direction, 2) PC with an incremental scale direction, 3) smartphone with a decremental scale direction, and 4) smartphone with an incremental scale direction. The initial results reveal that measurement equivalence holds between the decremental and incremental scale directions within PCs and within smartphones, with no systematic shift in latent means. However, between PCs and smartphones, only partial measurement equivalence exists within the decremental and incremental scale directions. These initial results indicate that scale direction effects are not more common on smartphones than on PCs. In addition, they indicate that devices have an impact that has to be accounted for in mixed-device surveys.
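For readers less familiar with the terminology: measurement equivalence between groups (here, device-by-scale-direction groups) is conventionally tested with a multi-group confirmatory factor model. A standard formulation of the nested hypotheses, given as general background rather than as the authors' specific model, is:

    \begin{align*}
    &\text{model: } x_{ig} = \tau_{ig} + \lambda_{ig}\,\xi_g + \varepsilon_{ig}
        \quad \text{(item } i \text{, group } g\text{)}\\
    &\text{metric invariance: } \lambda_{i1} = \lambda_{i2} = \dots = \lambda_{iG}
        \quad \text{(equal loadings)}\\
    &\text{scalar invariance: } \tau_{i1} = \tau_{i2} = \dots = \tau_{iG}
        \quad \text{(equal intercepts)}
    \end{align*}

Scalar invariance is what licenses comparisons of latent means; "partial" equivalence means the equality constraints hold for only a subset of items.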


Reliability and Measurement Equivalence in Mixed-Device Surveys with Different Formats of Rating Scales

Dr Natalja Menold (GESIS) - Presenting Author
Professor Vera Toepoel (Utrecht University)

Online surveys are increasingly becoming mixed-device surveys, because respondents can complete them on a regular PC, a tablet or a mobile phone. In addition, when programming an online survey, many response formats are available to the web survey designer. In traditional PC-based web surveys, data were mostly gathered with rating scales made from radio buttons. With the rise of mobile-friendly (responsive) design, tiles are often used to increase the size of the clickable area. Alternatives are visual analog scales (VAS), which are frequently used in the medical sector, and slider scales, which are often used in market research. Although there are numerous studies which address the comparability of data in mixed-device surveys, less is known about the effect of device on measurement characteristics of the data, such as reliability or measurement equivalence. Such characteristics are addressed in our experimental research. The data were collected in the GfK Online Panel, which aims to be representative of the Dutch population (aged 15+) with regard to age, gender and education. We used a 3×5 design in which we randomly assigned respondents to the following conditions: 1) desktop PC vs. tablet vs. mobile phone, and 2) radio buttons vs. tiles vs. slider vs. VAS vs. a combination of slider/VAS. First results show no effect of device or response format on reliability. In addition, measurement invariance is limited less by the device than by the response format. Implications for further research and consequences for practice are provided.
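The per-cell reliability analysis described here can be sketched with the standard Cronbach's alpha formula on synthetic data. The column names, factor levels and random values below are illustrative assumptions, not the study's actual data:

    import numpy as np
    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        """Cronbach's alpha for a respondents-by-items matrix of scale scores."""
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1).sum()
        total_var = items.sum(axis=1).var(ddof=1)
        return k / (k - 1) * (1 - item_vars / total_var)

    # Hypothetical data: four items of one rating scale, plus the two
    # experimental factors (device and response format) of the 3x5 design.
    rng = np.random.default_rng(1)
    df = pd.DataFrame(rng.integers(1, 8, size=(300, 4)), columns=list("abcd"))
    df["device"] = rng.choice(["pc", "tablet", "phone"], size=300)
    df["format"] = rng.choice(["radio", "tiles", "slider", "vas", "slider_vas"], size=300)

    # Reliability per device-by-format cell of the experiment.
    for (device, fmt), cell in df.groupby(["device", "format"]):
        alpha = cronbach_alpha(cell[list("abcd")])
        print(f"{device:6s} {fmt:10s} alpha = {alpha:.2f}")

With purely random item responses the printed alphas will hover near zero; with real scale data, comparing the per-cell alphas is one simple way to check whether device or response format affects reliability.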