
ESRA 2019 glance program


Evaluating Survey Response Scales 1

Session Organisers: Dr Morgan Earp (US Bureau of Labor Statistics)
Dr Robin Kaplan (US Bureau of Labor Statistics)
Dr Jean Fox (US Bureau of Labor Statistics)
Time: Thursday 18th July, 14:00 - 15:30
Room: D22

The accurate measurement of constructs in surveys depends on the use of valid and reliable item scales. Response scales come in all shapes and sizes and can vary in their use of modifiers, such as “very” versus “extremely.” They can also vary in features such as the number of response options, inclusion of numeric and/or semantic labels, scale direction, unipolar versus bipolar response options, and scale orientation. Item scales can also vary in their ability to distinguish between latent trait levels; some response options may provide more item characteristic information than others. Furthermore, with the variety of modes now available (such as web, mobile, and SMS text, as well as paper), there are additional considerations regarding how response scales can be presented (for example, single-item vs. matrix scales). With so many factors to consider, it can be difficult to know how to develop the optimal response scale for a particular construct or mode. This panel focuses on the investigation of item response scales and how they affect survey response and data quality, using a variety of scale evaluation techniques including, but not limited to, psychometric techniques. We invite submissions that explore all aspects of scale development and assessment, including:
(1) The impact of various question design features such as scale direction, scale length, horizontal vs. vertical scale orientation, use of modifiers, numeric labels, number of response options, etc. on survey response and data quality.
(2) The development and assessment of response scales across different data collection modes.
(3) The use of psychometric and statistical measures for evaluating response scales, for example, item characteristics curves, differential item functioning, item invariance, different measures of reliability and validity, etc.
(4) Approaches for determining scale measurement invariance across different modes and devices (e.g., mobile).
(5) Comparisons of item-by-item versus matrix questions.
(6) Research showing the impact of different modifiers (for example, “a little” vs. “somewhat”).
(7) Exploration of differential item functioning and item invariance for varying item response scales.

Keywords: response scales, scale development, psychometrics, item response theory, confirmatory factor analysis

All Thumbs?: Designing Thumb-Friendly Scales for Online Surveys

Dr Frances Barlas (Ipsos Public Affairs) - Presenting Author
Mr Randall Thomas (Ipsos Public Affairs)

As survey designers, we need creative and respondent-friendly solutions that facilitate online survey completion on any device respondents choose, including in a thumb-friendly smartphone environment. Given the growing use of emojis in smartphone communication, we are investigating their effectiveness as response scales in online surveys for all devices. At the 2018 AAPOR conference, we presented work from an initial study showing that emojis could be a good option for making surveys smartphone friendly, but we also discovered that there were limits to the types of scales that lend themselves to emojis. For that study, we assessed the effectiveness of smiley faces and thumbs up/down emojis for both unipolar and bipolar scales. The emoji scales showed results comparable to semantically labeled scales. In the thumbs up/down scale, however, both the neutral point of the bipolar scale and the absence of the concept in the unipolar scale were represented by a sideways thumb, which seemed to confuse some study participants. We conducted additional research on alternative designs in an effort to improve the experience for respondents while achieving results comparable, if not superior, to the semantically labeled variant of the scale. The testing included alternative designs for the sideways thumb that would function well in both unipolar and bipolar scales. We found that presenting a fist with a flattened thumb worked just as well as the version of the scale with the sideways thumb and seemed more intuitive for respondents. While the emoji scales took about the same amount of time to answer as the semantic scales, respondents reported somewhat more enjoyment using the emoji scales. Overall, presenting emoji response options without corresponding semantic labels yielded reliable results. We discuss additional improvements to emoji scales that can potentially make them more efficient for future research.


How to Measure Happiness? Assessing the Measurement Quality of the Single Item Happiness Scale and Different Multi-Item Scales by Using the Multitrait-Multimethod Design

Professor Axel Franzen (University of Bern) - Presenting Author
Mr Sebastian Mader (University of Bern)

Interest in research on life satisfaction has increased in recent years. This rise in importance is due to the insight that the well-being of a society cannot be measured by GDP per capita alone but also depends on the life satisfaction of its citizens. Hence, a new research field called “Happiness Economics” has emerged in the social sciences (Frey and Stutzer 2002, Easterlin 2002). The topic has also gained importance in politics. Former British Prime Minister David Cameron founded the initiative “Measuring National Well-Being”, and the United Nations has published the World Happiness Report annually since 2012.
A crucial question in happiness research is how happiness can be measured. International surveys like the World Happiness Report rely on a single-item measurement: “All in all how happy are you with your life?”, followed by an 11-point answer scale. Research in psychology uses multi-item scales such as the “Satisfaction with Life Scale” of Diener et al. (1985). A crucial question, therefore, is how the measurement quality of single-item scales compares with that of multi-item scales.
This paper investigates how the single-item measurement performs in comparison to different multi-item measurements, particularly the Satisfaction With Life Scale of Diener et al. (1985) and the Positive and Negative Affect Scale of Watson, Clark and Tellegen (1988). We employ the well-known Multitrait-Multimethod design first suggested by Campbell and Fiske (1959) and assess test-retest reliability and construct validity. Our experimental study consists of four groups into which 404 participants were randomized. Participants in each group were surveyed twice within four weeks, either online-online (1), online-personal (2), personal-online (3), or personal-personal (4).
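As a rough illustration of the test-retest idea mentioned above, the sketch below computes the Pearson correlation between two waves of ratings from the same respondents. It is not the authors' analysis: the function name `pearson_r` and the ratings are hypothetical, and a full Multitrait-Multimethod analysis involves considerably more than a single correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between paired scores from two survey waves."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around the wave means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one wave has no variance")
    return cov / sqrt(var_x * var_y)

# Hypothetical 0-10 happiness ratings from the same six respondents,
# collected four weeks apart (illustrative values, not study data).
wave_1 = [7, 8, 5, 9, 6, 4]
wave_2 = [6, 8, 5, 9, 7, 4]
retest_r = pearson_r(wave_1, wave_2)
print(round(retest_r, 3))
```

A value close to 1 would indicate that respondents are ordered similarly at both time points; comparing this coefficient across the four mode groups is one simple way to see whether mode switches lower retest stability.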


Smileys, Stars and Text Labels in Mobile Contextual User Surveys: A Cross-Cultural Investigation

Dr Yongwei Yang (Google, Inc.) - Presenting Author
Mr Aaron Sedley (Google, Inc.)

While significant research exists on cross-national, cross-language and cross-cultural generalizability of survey research (Harkness, Braun, Edwards, Johnson, Lyberg, Mohler, Pennell, & Smith, 2010), this literature does not sufficiently address cross-language interpretation and response to specific visual and text scale variants for measuring satisfaction via contextual user experience surveys.
With the proliferation of mobile app products, the business and research communities are keen to understand user attitudes and experiences in the context of actual product usage. Contextual user experience (UX) surveys are designed for this purpose. These surveys are embedded in a website or mobile app and triggered during or after a user-product interaction. Designs of such surveys need to balance mobile UX considerations with survey data quality.
Visual stimuli are often used to label scale points in contextual UX surveys. Typical stimuli include various smiley faces, stars, and thumbs (up or down). Sometimes verbal labels are also used to anchor scale endpoints. Using US samples and a 5-point satisfaction scale, our previous study (Sedley, Yang, & Hutchinson, presented at AAPOR 2018) compared the data quality produced by star vs. smiley stimuli, with and without endpoint verbal anchors. We found that smileys with endpoint labeling led to better response quality and criterion validity.
The current study replicates and extends our 2018 experiment to a number of distinct cultural and language settings: US (English), Germany (German), Spain (Spanish), Russia (Russian), Turkey (Turkish), UAE (Arabic), India (Hindi), Japan (Japanese), and Taiwan (Traditional Chinese). As in the 2018 study, we compare the four design variants (star vs. smiley, crossed with the presence vs. absence of endpoint labeling) using dismissal-to-response ratio, response time, response distributions, and (where applicable) criterion-related validity.


Understanding Smiley Scales in Cross-Cultural Contexts

Mr Aaron Sedley (Google, Inc.) - Presenting Author
Dr Yongwei Yang (Google, Inc.)

Contextual user experience (UX) surveys are surveys embedded in a website or mobile app and triggered by user-product interactions. They are used to measure user experience and attitudes in the context of product usage. In these surveys, smiley faces (with or without verbal labels) are often used as answer scales for questions measuring constructs such as satisfaction. Our studies conducted in the US in 2016 and 2017 found that carefully designed smiley faces may be distributed evenly along a numerical scale (0-100) and that endpoint labeling may improve scaling properties (Sedley, Yang, & Hutchinson, presented at AAPOR 2017).
With the proliferation of mobile app products around the world, the survey research community needs to test the generalizability of mono-population findings to cross-national, cross-language and cross-cultural contexts. The current study builds on our previous scaling studies as well as work by cross-cultural survey methodologists that explored the meanings of verbal scales (e.g., Smith, Mohler, Harkness, & Onodera, 2005). We investigate the scaling properties of smiley faces in a number of distinct cultural-lingual settings: US (English), Germany (German), Spain (Spanish), Russia (Russian), Turkey (Turkish), UAE (Arabic), India (Hindi), Japan (Japanese), and Taiwan (Traditional Chinese). Respondents in the study will complete the surveys using smartphones.
Specifically, we assess the scaling properties of various smiley designs by having respondents rate smiley faces on a 0-100 scale, which allows us to calculate the semantic distance between smileys. Each smiley is presented both independently and in context with other smileys (i.e., as a multi-point smiley scale). With the scale format, we also evaluate the effect of endpoint labeling. Where applicable, we employ multi-question scales to allow multivariate and latent variable models to compare the functioning of smiley scales. Finally, findings are supplemented by respondents’ own interpretations of the smiley face variants via open-ended responses.
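One simple way to think about semantic distance on a 0-100 scale is as the gap between the mean ratings of neighbouring smileys. The sketch below illustrates that reading only; the function names, smiley labels, and ratings are hypothetical, and the abstract does not specify how the authors compute distance.

```python
from statistics import mean

def scale_positions(ratings_by_smiley):
    """Mean 0-100 rating for each smiley, keyed by smiley name."""
    return {name: mean(values) for name, values in ratings_by_smiley.items()}

def adjacent_distances(positions, order):
    """Gap between the mean ratings of neighbouring smileys, in scale order."""
    return [positions[b] - positions[a] for a, b in zip(order, order[1:])]

# Hypothetical 0-100 ratings from three respondents per smiley
# (illustrative values, not study data).
ratings = {
    "very_unhappy": [5, 10, 0],
    "unhappy": [25, 30, 20],
    "neutral": [50, 55, 45],
    "happy": [70, 80, 75],
    "very_happy": [95, 100, 90],
}
order = ["very_unhappy", "unhappy", "neutral", "happy", "very_happy"]

positions = scale_positions(ratings)
gaps = adjacent_distances(positions, order)
print(positions)
print(gaps)
```

Roughly equal gaps would suggest that the smileys behave like evenly spaced scale points, while uneven gaps would point to smileys that respondents perceive as too close together or too far apart.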


Asking about Ideology: Experiments in Western Europe

Mr Jonathan Evans (Pew Research Center) - Presenting Author
Ms Martha McRoy (Pew Research Center)
Mr Scott Gardner (Pew Research Center)
Ms Stacy Pancratz (Pew Research Center)
Dr Neha Sahgal (Pew Research Center)
Ms Ariana Salazar (Pew Research Center)
Ms Kelsey Starr (Pew Research Center)
Dr Patrick Moynihan (Pew Research Center)

The increasingly polarized political landscape in Europe has renewed interest in the quality of survey measures of ideology. Of course, determining valid and reliable indicators of multidimensional concepts is always challenging—even more so within a multinational context given the cultural variety involved. Using nationally representative data from Pew Research Center telephone surveys conducted in 2017 and 2018 across France, Germany, Spain and the UK, we use two question-wording experiments to illustrate the sensitivity of a standard "left-right" political ideology measure to scale label modifications. The standard question for both experiments uses a seven-point scale with only endpoints labeled: "0 indicating extreme left" and "6 indicating extreme right."

In the first experiment, we use a split-ballot format and modify the standard question with different numeric endpoint labels—that is, moving from a 0-6 scale to 1-7 to avoid any negative association of "extreme left" with "0" (which could be interpreted as irrelevance).

The second experiment asks, in addition to the standard question, all respondents to describe themselves using a five-point, fully labeled scale—omitting numeric values altogether—as left, leaning left, center, leaning right or right. A fully labeled scale (without numeric values) is suggested by the literature to be less cognitively burdensome to respondents, yielding more valid and reliable data.

For each experiment, we compare the standard question to the alternatives in terms of the distribution of political ideology (including item nonresponse), demographic profiles and party affiliations of "left" versus "right," and the correlation between ideology and other attitudinal and behavioral variables. For the second experiment, as both ideology questions are posed to all respondents, we analyze those with incongruencies between ideology measures in terms of demographics and attitudes. We conclude with recommendations on the value of specific elements of political ideology questions for each country surveyed.