Wednesday 19th July, 11:00 - 12:30 Room: Q2 AUD1 CGD


Assessing the Quality of Survey Data 2

Chair: Professor Jörg Blasius (University of Bonn)

Session Details

This session will provide a series of original investigations on data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically-induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many kinds of systematic measurement error, or more precisely, many different sources of methodologically-induced variation, and all of them may have a strong influence on the “substantive” solutions. These sources include response sets and response styles, misunderstandings of questions, translation and coding errors, uneven standards among the research institutes involved in data collection (especially in cross-national research), item and unit nonresponse, and faked interviews. We consider data to be of high quality when the methodologically-induced variation is low, i.e. when the differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically-induced variation in survey research, how to detect them, and the effects they have on substantive findings.

Paper Details

1. Loop-de-loops: Examining respondent reporting on looping questions
Dr Antje Kirchner (RTI International)
Dr Emilia Peytcheva (RTI International)
Ms Shauna Yates (RTI International)
Ms Ashley Wilson (RTI International)
Ms Lesa Caves (RTI International)
Dr Natasha Janson (RTI International)
Dr Rebecca J. Powell (RTI International)

Respondents in surveys are often asked to answer a series of follow-up questions that are repeated based on their response to filter questions (loops), for example, to obtain details about each employer a respondent has had. To determine the number of times a respondent goes through the loop, researchers can use one of two formats: (1) ‘how many’ or (2) ‘go-again’. The ‘how many’ format asks respondents to report the number of occurrences, followed by questions asking for details of each occurrence. The ‘go-again’ format asks respondents to start with the first (or last) occurrence, followed by more detailed questions. After answering the follow-up questions, respondents are asked if they have any other occurrences. If “yes”, they continue to iterate through the loop. Such a task can become burdensome for respondents in either format, especially as the number of occurrences increases, potentially threatening data quality.
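To make the distinction between the two formats concrete, the sketch below walks through a hypothetical employer loop in both versions. It is a minimal illustration in Python, not the B&B instrument; the ask() helper and the question wording are placeholders.

# Minimal sketch of the two loop formats; ask() is a placeholder for
# whatever the interviewing system uses to collect a response.
def ask(prompt):
    return input(prompt + " ")

def how_many_format():
    # 'How many' format: ask for the count first, then loop over the follow-ups.
    n = int(ask("How many employers have you had since graduating?"))
    return [
        {"employer": ask(f"Name of employer #{i}?"),
         "start_year": ask(f"Year you started at employer #{i}?")}
        for i in range(1, n + 1)
    ]

def go_again_format():
    # 'Go-again' format: ask the follow-ups, then ask whether to go through the loop again.
    details = []
    while True:
        details.append({"employer": ask("Name of this employer?"),
                        "start_year": ask("Year you started there?")})
        if ask("Did you work for any other employers? (yes/no)").lower() != "yes":
            break
    return details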

This paper examines which loop format provides better data quality, drawing on theories of motivated underreporting and research on reporting frequencies (e.g., Eckman and Kreuter 2015). We use data from the 2016/17 Baccalaureate and Beyond Longitudinal Study of college graduates (B&B field test), in which respondents were randomly assigned to one of the two loop formats. We evaluate the differences between loop formats in terms of the number of reported occurrences, item nonresponse, breakoffs, and response time.

Consistent with earlier research (Eckman et al. 2014; Eckman and Kreuter 2015), preliminary results suggest that data quality differs by loop format. Specifically, the reported number of employers is significantly higher in the ‘how many’ format. Item nonresponse and breakoffs on the follow-up questions are significantly lower in the ‘go-again’ format, and response times are significantly longer in the ‘go-again’ format. We discuss the implications of our findings for data quality and the potential for imputation across the different formats.


2. Assessing the impact of late respondents on data quality in the German sample of the European Social Survey (ESS)
Dr Michael Weinhardt (Bielefeld University)

In survey research, there is a regular trade-off between measurement error, representation error, and cost. Fieldwork efforts, such as additional contact attempts and refusal conversion measures, are often employed to achieve a more balanced sample. However, those measures are costly and may result in increased measurement error if they bring otherwise reluctant respondents into the sample who deliver low data quality in their responses. This paper investigates this relationship in the first eight waves of the German sample of the European Social Survey (ESS).

First, I check whether sample composition differs between early and late respondents, regressing socio-demographic variables on different measures of fieldwork effort (number and type of contact attempts, type of incentives, refusal conversion efforts) to see whether more reluctant respondents differ from easy-to-reach and less reluctant respondents. Second, I test a wide range of data quality indicators (item nonresponse, response sets, interview length, length of verbatim responses) against the same measures of fieldwork effort to check whether more reluctant respondents yield lower data quality. I focus especially on attitudinal indicators that may be related to nonresponse and survey cooperation (e.g. interpersonal trust, institutional trust, political interest, attitudes towards European integration, and subjective well-being). These attitudinal variables are, on the one hand, of great importance to many survey researchers; on the other hand, they are usually very difficult or even impossible to verify through other sources of data such as administrative records.

In addition, I employ three further strategies to assess the quality of attitudinal indicators. First, I compare reported voting behavior, which is strongly related to variables such as political interest, to actual voter turnout, to see whether greater misreporting occurs among late respondents. Second, I compute Cronbach’s alpha as an indicator of internal consistency for a range of attitudinal scales in the questionnaire; ideally, alpha should not vary between early and late respondents. Third, I exploit a specific feature of the ESS to compute person-level reliability scores: in each round of the ESS, some questions are asked twice of each respondent for testing purposes, which allows reliability scores to be computed and the consistency of response behavior to be tested at the respondent level between the two groups.

My analyses use all eight rounds of the ESS in Germany to corroborate results and to ensure that findings are not simply a result of random sampling error. I also control for interviewer fixed effects, as fieldwork efforts may vary between interviewers (and sampling points, especially in hard-to-reach areas). In addition, I control for the timing of the interview (date), as differences between late and early respondents in important attitudes may reflect real changes due to developments in the social surroundings. In the German sample, addresses are fielded at two different time points during the fieldwork period, which allows teasing apart differences due to late response from actual attitudinal changes in the underlying population. I discuss the results in terms of representativeness, data quality, and cost implications.
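As a rough illustration of the internal-consistency check described above, the sketch below computes Cronbach’s alpha for a small attitudinal scale separately for early and late respondents. The item names, the example values, and the early/late flag are hypothetical and stand in for the actual ESS variables.

import pandas as pd

def cronbachs_alpha(items: pd.DataFrame) -> float:
    # Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the scale total).
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: three trust items plus a flag marking late respondents
# (e.g. interviewed only after refusal conversion or many contact attempts).
df = pd.DataFrame({
    "trust1": [5, 4, 6, 2, 7, 3, 5, 4],
    "trust2": [6, 4, 5, 3, 7, 2, 5, 5],
    "trust3": [5, 3, 6, 2, 6, 3, 4, 4],
    "late":   [0, 0, 0, 0, 1, 1, 1, 1],
})

# Ideally, alpha should be similar in the two groups.
for label, group in df.groupby("late"):
    alpha = cronbachs_alpha(group[["trust1", "trust2", "trust3"]])
    print(f"late={label}: alpha={alpha:.2f}")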


3. The effect of the number of calls on data quality in telephone surveys of older adults
Dr Andraž Petrovčič (University of Ljubljana, Faculty of Social Sciences)
Mr Gašper Stanovnik (GfK Slovenija)
Dr Jernej Berzelak (University of Ljubljana, Faculty of Social Sciences)

The administration of telephone surveys in social science and marketing research has become increasingly cumbersome due to the decreasing number of households with landlines, the lack of high-quality sampling frames, and the negative effect of direct sales and telemarketing on respondents' response propensity. Despite extensive literature on the subject, very little is known about how these issues affect the data quality and costs of telephone surveys of older adults, among whom landline penetration rates are still above the general population average. Hence, this paper deals with the relationship between data quality and the number of calls in telephone surveys of older adults. Drawing on prior literature and the continuum-of-resistance theory, it investigates differences in data quality between early, late, and non-respondents by comparing unit non-response bias, the amount of item non-response, and survey break-offs. The empirical analyses were based on data collected in 2015 with a CATI survey of a nationwide simple random sample (n = 1,656) of Slovenian residents aged 55 and above. The results show that: (1) increasing the number of calls reduced the unit non-response bias, because late respondents were significantly more similar to non-respondents than early respondents were; (2) the proportion of break-offs was significantly higher among late respondents than among early respondents; (3) no significant differences between early and late respondents were found in terms of item non-response. The findings suggest that a higher number of calls in telephone surveys of older adults can affect different aspects of data quality in different ways, and researchers should weigh these effects against each other.
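As a simple illustration of the kind of comparison described above, the sketch below contrasts early and late respondents on break-off and item-nonresponse rates using the number of calls. The data frame, the three-call cut-off, and the variable names are made up for the example and do not reflect the actual CATI data.

import pandas as pd

# Hypothetical respondent-level data: calls needed to reach the respondent,
# whether the interview broke off, and the count of missing item answers.
resp = pd.DataFrame({
    "calls":         [1, 2, 1, 5, 6, 2, 7, 1, 4, 8],
    "broke_off":     [0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    "items_missing": [0, 1, 0, 2, 1, 0, 3, 0, 1, 2],
})

# Classify early vs. late respondents using an arbitrary cut-off of three calls.
resp["group"] = resp["calls"].apply(lambda c: "early" if c <= 3 else "late")

# Compare break-off rates and average item nonresponse across the two groups.
summary = resp.groupby("group").agg(
    breakoff_rate=("broke_off", "mean"),
    mean_items_missing=("items_missing", "mean"),
    n=("calls", "size"),
)
print(summary)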


4. Improving the Quality of Methodological Reports in Survey Research: Practical Guidelines and a Content Analysis of Published Reports
Dr Alexander Jedinger (GESIS – Leibniz-Institute for the Social Sciences)
Mr Oliver Watteler (GESIS – Leibniz-Institute for the Social Sciences)

In recent years, demands regarding the availability of survey data have increased continuously. The scientific community and most funding organizations expect that data are archived in an institutional repository after the completion of research projects to ensure that they are available for replication and secondary analyses. Current research, however, focuses on a narrow concept of survey data quality that involves errors induced by sampling, measurement, or non-response, but almost completely ignores the complementary role of data documentation quality. Without transparent documentation of the survey methodology, it can be difficult for researchers to assess the analytical potential and the quality of a dataset for their own research. So far, however, there has been little empirical work on the quality of methodological information in published field reports. In the current study, we fill this gap by investigating whether researchers adhere to basic requirements for the documentation of survey data. In the first part, we propose minimal disclosure requirements and develop guidelines for the documentation of survey data based on the total survey error approach. In the second part, we present the results of a content analysis that examines the quality of methodology reports published in the GESIS Data Archive for Social Sciences between 1990 and 2015.