Tuesday 18th July, 16:00 - 17:30 Room: Q2 AUD1 CGD


Using paradata to assess and improve survey data quality 4

Chair Dr Caroline Vandenplas (KULeuven)
Coordinator 1 Professor Geert Loosveldt (KULeuven)
Coordinator 2 Dr Koen Beullens (KULeuven)

Session Details

Survey methodologists currently face declining response rates, an increasing risk of nonresponse bias and measurement error, and escalating costs of survey data collection. One low-cost approach to tackling these challenges is the use of paradata. Paradata, data about the survey process, have always been present, but their range and level of detail have increased considerably with the computerization of the data collection process. Such data can be used to detect and eventually reduce systematic survey errors and to increase data quality, either during fieldwork (adaptive designs) or in post-survey adjustment. Paradata can also be used to reduce the cost of the survey process, for example when determining caps on the number of phone call attempts in telephone surveys.
We are interested in papers that use paradata to detect and improve data quality and/or reduce survey costs. For instance, time and timing are linked to both survey costs and data quality, two essential elements of a survey. The timing of visits, calls, or the sending out of questionnaires, requests and reminders has been shown to be a determinant of survey participation. At the same time, asking interviewers to work in the evening or at the weekend, or ensuring that reminders for web or mail surveys are sent in a timely manner, may have cost implications. Nonresponse error is not the only type of survey error linked to time: the time taken to answer a question, also called response latency, is known to echo the cognitive effort of the respondent and, hence, data quality. Interviewer speed can also influence data quality, and it has been shown to depend on the rank of the interview.

The aim of this session is to reflect on possible links between data quality and paradata that capture easily measured characteristics of the different steps of the survey process. Such links could help data collection managers and researchers detect potential systematic survey errors in a fieldwork monitoring or post-survey evaluation context, and lead to opportunities to prevent or correct these errors. We invite papers demonstrating a link between paradata and data quality, as well as papers showing how such a link can be used to increase data quality or reduce costs.

Paper Details

1. Using GPS Data to Assess Errors in Paradata in Face-to-Face Surveys
Dr James Wagner (University of Michigan)
Dr Kristen Olson (University of Nebraska - Lincoln)
Ms Minako Edgar (University of Michigan)

Level-of-effort data are paradata generated by the process of collecting survey data; they are developed from call record data. In field surveys, these data have a number of uses, including fieldwork monitoring, serving as the basis of decision rules in responsive or adaptive designs, and informing post-survey adjustments. However, if these data contain measurement errors, their effectiveness across these purposes is likely to be reduced. The data are recorded by interviewers who work in a wide variety of settings and are asked to implement a complicated set of tasks. As with the administration of questionnaires, these records may be incorrectly coded or even missed altogether. Very little research has been conducted into the quality of paradata, and especially level-of-effort paradata. We harness a new data source to aid in the evaluation of these paradata: GPS data generated by smartphones carried by interviewers in the National Survey of Family Growth from 2011 to 2013. We compare these GPS data with the interviewer-reported call records to identify potential errors in the level-of-effort paradata.
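To make the comparison concrete, the sketch below shows one way such a cross-check could be implemented: each interviewer-reported call record is flagged if no GPS ping from the same interviewer falls within a chosen time window and distance of the sampled address. The column names, the 15-minute window and the 250-metre radius are illustrative assumptions, not the procedure used in the National Survey of Family Growth.

```python
# Hypothetical sketch: cross-check interviewer call records against smartphone GPS
# pings. Column names, the 15-minute window and the 250 m radius are assumptions.
import pandas as pd
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    r = 6371000.0
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = np.radians(lat2 - lat1)
    dlam = np.radians(lon2 - lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlam / 2) ** 2
    return 2 * r * np.arcsin(np.sqrt(a))

def flag_suspect_calls(calls, pings, max_gap_min=15, max_dist_m=250):
    """Flag call records with no GPS ping from the same interviewer within
    max_gap_min minutes of the reported call time and max_dist_m metres of the
    sampled address. calls: one row per reported call attempt; pings: one row
    per GPS fix; both have 'interviewer_id' and 'timestamp' columns."""
    flags = []
    for _, call in calls.iterrows():
        p = pings[pings["interviewer_id"] == call["interviewer_id"]]
        near_in_time = (p["timestamp"] - call["timestamp"]).abs() <= pd.Timedelta(minutes=max_gap_min)
        p = p[near_in_time]
        if p.empty:
            flags.append(True)  # no ping at all around the reported call time
            continue
        dist = haversine_m(p["lat"], p["lon"], call["address_lat"], call["address_lon"])
        flags.append(bool((dist > max_dist_m).all()))  # no ping close to the address
    return calls.assign(suspect_call_record=flags)
```

Flagged records would then be candidates for closer review rather than being treated as confirmed errors.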


2. The Accuracy of Using Paradata to Detect Interviewer Question-Reading Deviations and Assess Data Quality
Ms Jennifer Kelley (University of Essex)

Paradata is now widely used throughout the survey life-cycle, from informing design to making post-survey adjustments, and it holds the promise of reducing costs while improving data quality. Hence, many survey organizations use paradata to monitor interviewers and aid quality control efforts. One promising approach is to use timestamps to determine when interviewers violate the expected times to administer questions. Violations (administrations that are either too short or too long) are viewed as an indicator of poor data quality and flagged for further investigation. However, not much is known about how accurate timestamps are as a method of detecting question-reading deviations or of measuring data quality. Further, there are no clear guidelines on which timestamp detection method should be used. Should one construct a threshold from the number of words in the question to flag suspect questions, or should one use a certain number of standard deviations from the mean reading time as the threshold? Moreover, should timestamps be used in conjunction with other paradata, or can timestamps alone accurately detect question-reading deviations?

This study attempts to answer these questions using several data sources from Wave 3 of the Understanding Society Innovation Panel, including paradata (timestamps), interview recordings, question characteristics, respondent and interviewer characteristics, and survey data. In addition, behavior coding was conducted on a subset of the interview recordings; the recordings were stratified by interviewer (n=81) and two interviews per interviewer were randomly selected, resulting in a sample of 10,949 questions. Interviewers' initial reading of each question was coded according to whether or not they read the question verbatim. Deviations were further coded to flag questions where words were omitted and/or substituted, or where the question was skipped entirely, and whether these deviations changed the meaning of the question. When the behavior coding is compared with timestamp thresholds, preliminary results show that timestamp thresholds may be a moderately accurate method of detecting when interviewers do not read questions verbatim (70.2% accuracy). However, when timestamp thresholds are used to detect the most egregious question-reading deviations, those that change the meaning of the question, their accuracy increases to almost 90%. Further analysis will investigate: 1) whether certain types of deviations (e.g., skipping questions, substituting words) are captured more accurately than others, and 2) whether alternative detection methods (e.g., standard deviation thresholds) are more accurate than timestamp thresholds.
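As a concrete illustration of the two threshold ideas discussed above, the sketch below flags suspiciously fast question administrations either from the number of words in the question or from a standard-deviation cutoff around the per-question mean duration. The assumed reading speed, the k=2 cutoff and the column names are demonstration assumptions, not the thresholds used in the study.

```python
# Illustrative sketch of the two timestamp-threshold ideas: a word-count rule and a
# standard-deviation rule. Reading speed, k=2 and column names are assumptions.
import pandas as pd

WORDS_PER_SECOND = 3.0  # assumed plausible verbatim reading speed

def flag_by_word_count(admins):
    """Flag administrations faster than the minimum time implied by question length.
    admins needs columns: question_id, n_words, duration_sec (from timestamps)."""
    min_time = admins["n_words"] / WORDS_PER_SECOND
    return admins.assign(flag_word_rule=admins["duration_sec"] < min_time)

def flag_by_sd(admins, k=2.0):
    """Flag administrations more than k standard deviations below the
    per-question mean duration."""
    stats = (admins.groupby("question_id")["duration_sec"]
                   .agg(q_mean="mean", q_sd="std")
                   .reset_index())
    merged = admins.merge(stats, on="question_id")
    cutoff = merged["q_mean"] - k * merged["q_sd"]
    return merged.assign(flag_sd_rule=merged["duration_sec"] < cutoff)
```

Either flag could then be validated against behavior codes, as in the study, to estimate its accuracy.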


3. ‘Don't Know’ Answers – An International Comparative Analysis Using Interviewer Data
Dr Kingsley Purdam (University of Manchester)
Dr Joe Sakshaug (University of Manchester)
Dr Mollie Bourne (University of Oxford)
Dr David Bayliss (University of Manchester)

‘Don't Know’ responses to survey questions are of both methodological and substantive interest. We analysed interviewer observation data - paradata - on how respondents answered questions during their interview for the European Social Survey. We found that respondents give ‘Don't Know’ responses across factual, value and attitudinal questions. In general, women, younger people and those with no or lower educational qualifications are more likely to give ‘Don't Know’ responses. Whilst most respondents are perceived to understand the questions and to try to answer to the best of their ability, many ask for clarification. We identified variations in the likelihood of a ‘Don't Know’ response at both the interviewer and the country level. It is too simple to view all ‘Don't Know’ responses as non-attitudes or as the result of a lack of knowledge. Uncertainty may reflect critical engagement with a complex issue, and analyses of public attitudes need to take account of this. Methodological and survey design innovations could include additional support for people who find answering certain types of survey questions a challenge, and increased training for interviewers to ensure consistency during the interview process. However, for many people a ‘Don't Know’ response may be the only valid answer they can give. If this is not accurately captured in surveys, the validity of estimates of public attitudes can be questioned.
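A minimal sketch of the kind of tabulation underlying such an analysis is given below: it computes ‘Don't Know’ rates by respondent subgroup and by interviewer within country. The variable names are hypothetical and do not correspond to actual European Social Survey field names.

```python
# Minimal sketch: tabulate 'Don't Know' rates by respondent subgroup and by
# interviewer within country. Variable names are hypothetical, not ESS field names.
import pandas as pd

def dk_rates(responses, dk_code="dont_know"):
    """responses: one row per respondent-question pair, with an 'answer' column
    plus respondent characteristics and interview identifiers."""
    responses = responses.assign(is_dk=responses["answer"].eq(dk_code))
    by_respondent_group = (responses
                           .groupby(["gender", "age_group", "education"])["is_dk"]
                           .mean()
                           .rename("dk_rate"))
    by_interviewer = (responses
                      .groupby(["country", "interviewer_id"])["is_dk"]
                      .mean()
                      .rename("dk_rate"))
    return by_respondent_group, by_interviewer
```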


4. Investigating the invariance between modes using indicators derived from process-related paradata
Dr Ulf Kroehne (German Institute for International Educational Research (DIPF), Frankfurt am Main, Germany)
Professor Frank Goldhammer (German Institute for International Educational Research (DIPF), Frankfurt am Main, Germany)

Paradata hold potential for improving survey data quality in multiple ways, reflecting both the breadth of the concept of paradata and the need for a theoretical foundation to frame the various empirical applications. Aiming to structure the concept of paradata, a taxonomy was developed that distinguishes access-related, response-related and process-related paradata (Kroehne et al., 2015). The value of access-related paradata (e.g. time of participation) for increasing data quality is obvious, as they allow, for instance, survey statistics to be compared between sub-groups. Response-related paradata (e.g. response latencies) can be incorporated on the basis of their natural relationship to the substantive data. Process-related paradata (e.g. raw log events), however, are the most challenging category, as it is hard to define data-quality indicators that are comparable between modes. Taking an interdisciplinary perspective on paradata, the theoretical framework for log data in technology-based assessments (Kroehne et al., 2016) tries to solve this challenge by operationalizing indicators using conceptualized states (and finite state machines, FSMs). This makes it possible to investigate levels of invariance between modes conditional on process-related paradata.
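A minimal sketch of the finite-state-machine idea, under the assumption that raw log events can be reduced to alternating 'active' and 'inactive' episodes, might look as follows; the event representation and the 60-second inactivity rule are illustrative assumptions, not the operationalization of Kroehne et al. (2016).

```python
# Minimal sketch of the finite-state-machine idea: raw log events are mapped onto
# conceptual states ('active'/'inactive') and indicators are read off the state
# sequence. The 60-second inactivity rule is an illustrative assumption.
from dataclasses import dataclass

@dataclass
class Episode:
    state: str    # 'active' or 'inactive'
    start: float  # seconds since start of the assessment
    end: float

def events_to_episodes(events, inactivity_threshold_s=60.0):
    """events: list of (timestamp_seconds, event_name) tuples sorted by time.
    A gap between consecutive events longer than the threshold becomes an
    'inactive' episode; everything else is treated as 'active'."""
    episodes = []
    for (t0, _), (t1, _) in zip(events, events[1:]):
        state = "inactive" if (t1 - t0) > inactivity_threshold_s else "active"
        episodes.append(Episode(state, t0, t1))
    return episodes

def longest_inactivity(episodes):
    """Indicator derived from the state sequence: longest break in seconds."""
    gaps = [e.end - e.start for e in episodes if e.state == "inactive"]
    return max(gaps) if gaps else 0.0
```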
In the application we use paradata from all three categories in a three-arm mixed-mode survey with computer-, paper- and web-based competence assessment in the National Educational Panel Study (NEPS). The study was designed to investigate how survey costs can be reduced by introducing computer-based assessment and replacing group testing of participants invited to universities (highly standardized testing, low participation rates, low dropout during test sessions) with online assessment (minimal degree of standardization, low response burden, high response rates at the beginning of the test, high dropout rates during sessions). Response- and process-related paradata from the paper-based assessment were collected using digital pens. We focus on filtering cases from the online assessment with respect to the criterion of acceptable psychometric invariance with group testing. Access-related paradata are used to filter out participants taking the test at extreme times. Response-related paradata, in particular response times, are used to filter out rapid guessing (i.e., test-takers with responses faster than a threshold derived from the comparison between modes). Finally, process-related paradata, i.e. raw log events, are analyzed using FSMs. We show that, for instance, breaks of different lengths in the response process, indicated by the absence of any logged interaction between test-taker and assessment platform, can be used to effectively filter out online cases with lower data quality. Although the true reason for these breaks, which do not occur under standardized assessment conditions, is unknown, we suggest interpreting longer periods of inactivity as distraction or lack of interest. Using all filters together, we obtained a sub-sample that is acceptable in terms of invariance and whose response rate is still higher than that of the group testing.
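The following sketch illustrates how the three filters described above could be combined at the case level; all thresholds and field names are illustrative assumptions rather than the rules applied in the NEPS study.

```python
# Hedged sketch of combining the three filters at the case level; thresholds and
# column names are illustrative assumptions, not the rules applied in NEPS.
import pandas as pd

def filter_online_cases(cases,
                        night_hours=(1, 5),       # access-related: extreme start times
                        max_rapid_share=0.10,     # response-related: share of too-fast answers
                        max_inactivity_s=300.0):  # process-related: longest logged break
    """cases needs columns: start_hour, share_rapid_responses, longest_inactivity_s."""
    at_extreme_time = cases["start_hour"].between(*night_hours)
    rapid_guessing = cases["share_rapid_responses"] > max_rapid_share
    long_breaks = cases["longest_inactivity_s"] > max_inactivity_s
    keep = ~(at_extreme_time | rapid_guessing | long_breaks)
    return cases[keep]
```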
In the discussion we summarize a) the conclusions drawn from the experimental study for upcoming mixed-mode competence assessments in NEPS and b) current challenges in increasing the quality of paradata.