| New Strategies of Assessing Data Quality within Interviewer-Administered Surveys 2 | |
| Session Organisers | Dr Laura Silver (Pew Research Center) Mr Kyle Taylor (Pew Research Center) Ms Danielle Cuddington (Pew Research Center) Dr Patrick Moynihan (Pew Research Center) | 
| Time | Wednesday 17th July, 16:30 - 17:30 | 
| Room | D17 | 
International survey researchers are no strangers to the difficulties inherent in assuring high-quality data, particularly in a post-GDPR environment where access to audio files -- a key mechanism to verify the caliber of interviewing -- may be severely restricted. Moreover, closely monitoring or investigating every sampled case is unlikely given resource constraints (e.g., limited time, budget and capacity), driving researchers to base evaluations on aggregate measures of data quality, such as interview length (in its entirety or by sections), extreme item nonresponse and other related substantive and paradata indicators. 
For survey practitioners, this raises a critical question: Which data-quality indicators are most valuable for identifying problems in the field -- and, by extension, low-quality interviewing? Are certain indicators better associated with identifying certain problems? And what thresholds are used to distinguish between a case worth analyzing and one requiring more investigation? More broadly, how do these issues play out across comparative data as well as between locations and modes?
Once potential problems are determined, identifying the best course of action can be a challenge. Resolving the issue can involve anything from simple case deletion (with requisite re-weighting, as applicable) to deletion of all interviews conducted by a given interviewer or observed by a given supervisor, to complete re-fielding.
Taken together, the goal of this session is to bring together researchers to discuss the measures they use to assess data quality, the thresholds they apply and the actions they take to resolve problematic cases. Topics may include but are not limited to:
Assessing the validity of cases flagged as “low quality” across different indicators;
Setting thresholds for quality control – that is, what is “too short” or “too long” and how do you determine that across different countries, languages, and modes;
Research that tackles new and innovative ways to expose “curbstoning” and other practices that lead to low-quality data;
Methods used to verify proper in-home selection;
Strategies used to detect respondent confusion, satisficing, and discomfort;
Research focused on evaluating when and how to replace low-quality data, including issues of substitution and the implications for data quality and final data representativeness.
We will limit this particular session to face-to-face and telephone interviewing, rather than online interviewing. We invite academic and non-academic researchers as well as survey practitioners to contribute.
Keywords: data quality, paradata, speeding, curbstoning, replacement, in-home selection
Dr Galina Zapryanova (Gallup)
Dr Anita Pugliese (Gallup)
Mr Jay Loschky (Gallup) - Presenting Author
The growth of CAPI technologies has consistently expanded the availability of paradata during face-to-face interviewing. This paper will present findings from implementing a new centralized CAPI system for collecting representative face-to-face survey data across more than 100 countries, 5 continents and over 120 languages. Centralization allows for greater monitoring of the fieldwork progress, access to a wide variety of rich new paradata and active quality control at all stages of data collection. It also presents challenges for multi-country surveys where the efficiencies gained from setting standardized global thresholds for flagged interviews should be balanced by adjustments needed for variation in regional or country context. We will discuss the capabilities and limitations of these paradata tools for ensuring methodology compliance during questionnaire implementation in the field, focusing primarily on lessons learned from using item-level timestamps. Under what circumstances can time metrics in combination with other QC indicators indicate the need for further investigation or cancellation of survey interviews? What are the most reliable indicators at a global level for identifying intentional data falsification or unintentional errors in methodology? Which indicators, on the other hand, show the most regional/country variation and, therefore, need to be reviewed more holistically and contextually during the QC process? Our paper will address these matters using data from the Gallup World Poll nationally representative surveys conducted in 2018 under a centralized CAPI system in the majority of face-to-face countries. Effective fieldwork monitoring should be goal-driven, holistic and efficient where information from multiple sources is produced and analyzed at frequent intervals. 
Developing strong fieldwork monitoring tools and using them in a systematic manner is key to collecting accurate and representative data in face-to-face countries and this paper will thus contribute to our understanding of effective QC processes in the global context.
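The tension between standardized global thresholds and country context can be illustrated with a minimal sketch. It is not the procedure described in the abstract: the record layout, the `flag_short_interviews` helper and the 50%-of-median rule are all assumptions chosen only to show how a country-relative cutoff behaves differently from a single global one.

```python
from collections import defaultdict
from statistics import median


def flag_short_interviews(interviews, fraction=0.5):
    """Return IDs of interviews shorter than `fraction` of their country's median.

    `interviews` is a list of dicts with hypothetical keys
    'id', 'country' and 'minutes'.
    """
    durations = defaultdict(list)
    for row in interviews:
        durations[row["country"]].append(row["minutes"])
    # One threshold per country, derived from that country's own distribution.
    medians = {country: median(values) for country, values in durations.items()}
    return [
        row["id"]
        for row in interviews
        if row["minutes"] < fraction * medians[row["country"]]
    ]


# Made-up durations: country X runs long, country Y runs short.
interviews = [
    {"id": "A1", "country": "X", "minutes": 40},
    {"id": "A2", "country": "X", "minutes": 44},
    {"id": "A3", "country": "X", "minutes": 12},
    {"id": "B1", "country": "Y", "minutes": 20},
    {"id": "B2", "country": "Y", "minutes": 22},
    {"id": "B3", "country": "Y", "minutes": 18},
]
flagged = flag_short_interviews(interviews)
```

With these made-up numbers only `A3` is flagged, whereas a single global cutoff of, say, 20 minutes would also flag `B3`, even though 18 minutes is typical for country Y. A flag of this kind would be a prompt for further review alongside other QC indicators, not a verdict on its own.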
Mrs Gina Cheung (SRC, University of Michigan) - Presenting Author
Mr Jay Lin (SRC, University of Michigan)
Many computer-assisted personal interview (CAPI) software packages capture paradata (i.e., empirical measurements about the process of creating the survey data themselves), such as computer user actions, including time spent on questions and in sections of a survey (i.e., timestamps) and interviewer or respondent actions while proceeding through a survey. In these cases, the paradata file contains a record of keystrokes and function keys pressed, as well as mouse actions. These paradata files are transmitted along with the survey data and can be used for quality assurance checks and reporting, particularly when interviews are not audio recorded.
This presentation uses data from (1) the Malaysia Ageing and Retirement Survey (MARS) in collaboration with the Social Wellbeing Research Centre at University of Malaya and the Survey Research Center (SRC) at University of Michigan (UM); and (2) the Evolution of Health, Aging, and Retirement in Thailand in collaboration with the National Institute of Development Administration and SRC at University of Michigan. Both studies adapt the same complex instrument design from the US Health and Retirement Study but encounter different challenges, such as multiple vs. single languages, centralized vs. decentralized management structures, self vs. proxy interviews, etc.
This presentation focuses on the analysis of keystroke data to assess data quality. We first compare a series of key characteristics of the two studies, such as sample design, team structure, interviewer and respondent characteristics, etc. These characteristics are then inspected for predictive power against data quality indicators such as interview length, non-response, response changes, etc. Subsequently, in MARS we call back households with data quality concerns to verify the interviewer’s behavior or some of the survey data collected, among all other information available. Finally, we will present how these analyses of paradata and verification results can be practically applied to improve the data quality of interviewer-administered surveys.
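As a rough illustration of how indicators such as time per item and response changes might be derived from a keystroke log, the sketch below uses a hypothetical event layout of `(seconds, item, action)` tuples. Real CAPI audit trails differ by software, and the `summarize_paradata` helper is an assumption made for this example, not the authors' method.

```python
from collections import Counter


def summarize_paradata(events):
    """Summarize a time-ordered event log into two quality indicators.

    `events` holds (seconds_since_start, item_id, action) tuples, where
    action is 'enter' (screen shown), 'answer' (value keyed) or 'exit'.
    Returns (seconds spent per item, number of answer changes per item).
    """
    seconds = Counter()
    answers = Counter()
    entered_at = {}
    for t, item, action in events:
        if action == "enter":
            entered_at[item] = t
        elif action == "answer":
            answers[item] += 1
        elif action == "exit" and item in entered_at:
            # Revisits to the same item accumulate into one total.
            seconds[item] += t - entered_at.pop(item)
    # Every answer after the first one counts as a response change.
    changes = {item: n - 1 for item, n in answers.items() if n > 1}
    return dict(seconds), changes


# Made-up log: Q1 is answered, then revisited and re-answered after Q2.
events = [
    (0, "Q1", "enter"), (4, "Q1", "answer"), (5, "Q1", "exit"),
    (5, "Q2", "enter"), (6, "Q2", "answer"), (9, "Q2", "answer"), (10, "Q2", "exit"),
    (10, "Q1", "enter"), (12, "Q1", "answer"), (13, "Q1", "exit"),
]
seconds_per_item, answer_changes = summarize_paradata(events)
```

Per-item summaries of this kind could then be aggregated by interviewer and compared with interviewer or respondent characteristics, in the spirit of the analysis described above.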
Ms Melike Saraç (Hacettepe Üniversitesi) - Presenting Author
Professor İsmet Koç (Hacettepe University)
Respondent selection methods that follow probabilistic sampling procedures to obtain a representative sample are widely used in household surveys. Selecting one eligible respondent among household members is mostly done to reduce cost (both time and money), to obtain higher cooperation rates and as a precaution in sensitive surveys. Together with households, all women of reproductive age (15-49) who live in selected households are the main sample units of the 2013 Turkey Demographic and Health Survey (TDHS-2013). In other words, no respondent selection process is employed and, as a result, all eligible women in households are interviewed in TDHS-2013. The main research question is therefore how the main characteristics of women (age, years of schooling, working status, etc.) and demographic indicators (number of children, number of pregnancies, total fertility rate, etc.) would differ under various respondent selection techniques (last birthday method, first birthday method, Kish method, full enumeration method, oldest women method, youngest women method, TCB method and arbitrary convenience method) compared with the survey estimates of TDHS-2013. The findings show that although the results of all selection methods are close to each other and to the survey estimates of TDHS-2013, most of the estimates produced by the last birthday method and the Kish method are much closer to the survey estimates than those of the other selection techniques. This implies that although interviewing all eligible women in households may be reasonable in TDHS-2013 in order to produce unbiased estimates for all indicators, even for rare events, it appears possible to reach close estimates within the same confidence intervals by using one of the respondent selection methods, such as last birthday method and Kish