ESRA 2019 Draft Programme at a Glance

Evaluating Survey Response Scales 3

Session Organisers Dr Morgan Earp (US Bureau of Labor Statistics)
Dr Robin Kaplan (US Bureau of Labor Statistics)
Dr Jean Fox (US Bureau of Labor Statistics)
Time Thursday 18th July, 16:00 - 17:30
Room D22

The accurate measurement of constructs in surveys depends on the use of valid and reliable item scales. Response scales come in all shapes and sizes and can vary in their use of modifiers, such as “very” versus “extremely.” They can also vary in features such as the number of response options, inclusion of numeric and/or semantic labels, scale direction, unipolar versus bipolar response options, and scale orientation. Item scales can further vary in their ability to distinguish between latent trait levels; some response options may provide more item characteristic information than others. Furthermore, with the variety of modes now available (such as web, mobile, and SMS text, as well as paper), there are additional considerations regarding how response scales can be presented (for example, single-item vs. matrix scales). With so many factors to consider, it can be difficult to know how to develop the optimal response scale for a particular construct or mode. This panel focuses on the investigation of item response scales and how they affect survey response and data quality, using a variety of scale evaluation techniques including, but not limited to, psychometric techniques. We invite submissions that explore all aspects of scale development and assessment, including:
(1) The impact of various question design features such as scale direction, scale length, horizontal vs. vertical scale orientation, use of modifiers, numeric labels, number of response options, etc. on survey response and data quality.
(2) The development and assessment of response scales across different data collection modes.
(3) The use of psychometric and statistical measures for evaluating response scales, for example, item characteristic curves, differential item functioning, item invariance, different measures of reliability and validity, etc.
(4) Approaches for determining scale measurement invariance across different modes and devices (e.g., mobile).
(5) Comparisons of item-by-item versus matrix questions.
(6) Research showing the impact of different modifiers (for example, “a little” vs. “somewhat”).
(7) Exploration of differential item functioning and item invariance for varying item response scales.

Keywords: response scales, scale development, psychometrics, item response theory, confirmatory factor analysis

Grids versus Item-By-Item Designs on Item Batteries for Self-Administered Mixed-Mode, Mixed-Device Surveys

Dr Kristen Olson (University of Nebraska-Lincoln) - Presenting Author
Dr Jolene Smyth (University of Nebraska-Lincoln)
Ms Angelica Phillips (University of Nebraska-Lincoln)

With surveys increasingly being completed on mobile devices, it is important to know how best to ask battery questions on these devices. One open question is whether battery questions, which usually contain items that constitute a scale, should be displayed in a grid or with each item displayed individually (item-by-item), and whether this display should differ by mode and device. Previous research focuses primarily on web panel members, ignoring those who answer by mail in mixed-mode studies. There is surprisingly little research comparing how respondents answer grid items in web versus mail modes (Kim et al. 2018). Within the web mode, grid formats on a smartphone sometimes yield higher nondifferentiation rates than an item-by-item design on a smartphone or a grid on a personal computer (Stern et al. 2016). In other studies, grids on computers (Lugtig and Toepoel 2016) or item-by-item formats displayed on a computer (Keusch and Yan 2016) yield more nondifferentiated answers. In this paper, we compare data quality across four different batteries from a general population web-push mixed-mode survey (AAPOR RR2=28.1%, n=2705). Sample members were randomly assigned to receive batteries either in a grid or as individual items. Respondents could respond by mail or by web, using either a computer or a mobile device, allowing all formats to be measured in all modes/devices. We examine data quality across formats and modes/devices on four outcomes: item nonresponse, nondifferentiation, inter-item correlations, and scale reliability. Preliminary analyses indicate that the grid format, compared to the item-by-item format, produces less nondifferentiation on mobile devices, more on computers, and no difference in mail. This holds after accounting for respondent characteristics. We explore the sensitivity of our conclusions to different measures of nondifferentiation. We conclude with recommendations for practice and future research.
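
Two of the outcomes named above, nondifferentiation and scale reliability, can be made concrete with a small sketch. It is not the authors' analysis: it assumes one simple nondifferentiation measure (straightlining, i.e. identical answers to every item in a battery) and Cronbach's alpha as the reliability coefficient. The function names and the five-respondent `battery` are hypothetical and purely illustrative.

```python
from statistics import pvariance


def straightlining_rate(responses):
    """Share of respondents who give the same answer to every item.

    This is only one possible nondifferentiation measure (an assumption here).
    """
    flags = [len(set(row)) == 1 for row in responses]
    return sum(flags) / len(flags)


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items list of numeric answers."""
    k = len(responses[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    # Variance of each respondent's summed scale score.
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering a 4-item battery on a 1-5 scale.
battery = [
    [4, 4, 4, 4],
    [2, 3, 2, 3],
    [5, 4, 5, 4],
    [3, 3, 3, 3],
    [1, 2, 1, 2],
]

rate = straightlining_rate(battery)
alpha = cronbach_alpha(battery)
print(f"straightlining rate: {rate:.2f}")
print(f"Cronbach's alpha: {alpha:.3f}")
```

Computing both quantities separately for grid and item-by-item respondents within each mode/device would mirror the kind of comparison the abstract describes; other nondifferentiation measures (for example, the within-respondent standard deviation) could be swapped in to check sensitivity.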

The Effects of Response Format on Data Quality in Personality Tests: Matrix vs. Fill-in

Mrs Ragnhildur Lilja Asgeirsdottir (Faculty of Psychology, University of Iceland; Methodological Research Center, University of Iceland) - Presenting Author
Dr Vaka Vésteinsdóttir (Methodological Research Center, University of Iceland; Research Methods, Assessment, & iScience, Department of Psychology, University of Konstanz)
Professor Ulf-Dietrich Reips (Research Methods, Assessment, & iScience, Department of Psychology, University of Konstanz)
Dr Fanney Thorsdottir (Faculty of Psychology, University of Iceland; Methodological Research Center, University of Iceland)

Questionnaires are frequently presented differently on paper and in web surveys. While questionnaire authors often use a fill-in response format on paper, where respondents write in a number representing their response, the format is often changed to a matrix or grid in web surveys. This applies to personality tests such as the Big Five Inventory (BFI). Previous studies have indicated that matrix formats, as opposed to item-by-item formats, are associated with, for example, higher missing data rates, higher inter-item correlations, and higher levels of straightlining. However, other studies have not found these effects. Furthermore, there is currently a lack of research comparing the matrix format to the fill-in response format. The purpose of this study was to examine the effects of the response format (matrix vs. fill-in) on data quality using the BFI in a probability-based panel of the general population (N=272). Data quality was examined in terms of item nonresponse, acquiescence, and reliability. The results indicate that although the response format does seem to have an effect on data quality, the effect seems to be weak. The implications of the results will be discussed further.

Agree or disagree: what came first?

Ms Carmen María León (University of Castilla-La Mancha) - Presenting Author
Dr Eva Aizpurua (Trinity College Dublin)
Ms Sophie van der Valk (Trinity College Dublin)

Response order effects refer to the impact on survey responses of varying the order of the response options. Previous research has documented two types of effects, known as primacy and recency effects (Krosnick & Alwin, 1987). Primacy effects occur when response options presented earlier are selected more often than those presented later. Recency effects, on the contrary, occur when response options presented later are more likely to be selected. These effects have been widely studied with unordered categorical response options. However, few studies have examined response order effects with ordinal scales, despite their extensive use in survey research. We contribute to filling this gap by analysing the effects of varying the direction of fully labeled rating scales on survey responses. To do so, a split-ballot experiment was embedded in an online survey conducted in Spain (N = 1,000). Respondents were randomly assigned to one of two groups, which received the response options either in the original order (from "strongly disagree" to "strongly agree") or in the reversed order (from "strongly agree" to "strongly disagree"). The results of the study are presented and the implications and recommendations for future research are discussed.
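
One common way to check for a response order effect in a split-ballot design like the one described above is to compare the share of respondents choosing an agreement option between the two scale directions. The sketch below uses a pooled two-proportion z-test. It is not the authors' analysis: the function name, the equal 500/500 split, and all counts are hypothetical and chosen only for illustration.

```python
import math


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not from the study): respondents choosing
# "agree" or "strongly agree" on one item in each ballot group.
agree_first = (300, 500)     # scale shown from "strongly agree" down
disagree_first = (270, 500)  # scale shown from "strongly disagree" up

z, p = two_proportion_z(*agree_first, *disagree_first)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

Under these assumptions, a higher agreement share in the group that saw the agreement options first would be in line with a primacy effect; an ordinal model or a test over the full response distribution would be a natural alternative.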

When Don’t Know is not an Option: The Motivations behind Choosing the Midpoint in Five-Point Likert Type Scales

Dr Johan Martinsson (University of Gothenburg) - Presenting Author
Dr Elina Lindgren (University of Gothenburg)
Dr Sebastian Lundmark (University of Gothenburg)

Likert items, a common attitude measure in surveys, typically have five response categories: ‘strongly agree,’ ‘agree,’ ‘disagree,’ and ‘strongly disagree,’ with a midpoint labeled ‘neither agree nor disagree’ to capture an ordered (intermediate) attitude. But how do individuals who respond to Likert-type questions actually interpret the midpoint? If respondents select the midpoint for reasons other than expressing a middle position, this violates the assumption of an ordered response scale and raises questions about the accuracy of the estimates. We investigate how respondents explain their choice of the midpoint, and how this may vary when ‘don’t know’ is included as a response option. An online survey experiment was fielded in 2018 with 6,393 members of the Swedish Citizen Panel. All participants answered three 5-point attitude items in a split-sample design, with or without ‘don’t know’ as an additional option. To assess the reasons for choosing the midpoint, we asked respondents who selected it to explain why in open-ended questions. Besides expressing a middle position, we found four general motivations for choosing the midpoint of the 5-point Likert scale: ambivalence, lack of knowledge, no opinion, and indifference. Including ‘don’t know’ as a response option yielded fewer ‘lack of knowledge’ motivations and more respondents indicating ambivalence as the reason. However, even when ‘don’t know’ was included, a notable share of respondents still referred to lack of knowledge as the reason for choosing the midpoint. The findings are consistent with previous research indicating that respondents to Likert questions choose the midpoint for several reasons besides expressing a middle position. While including ‘don’t know’ as a response option has been suggested as a possible solution, we find that this alone would not eradicate the problem and that more diverse and item-specific measures are likely needed to reduce ambiguity.