Program at a glance 2021

Bias: Social desirability-, underreport- and response-bias

Session Organiser: Tugba Adali
Time: Friday 9 July, 13:15 - 14:45

The social desirability bias across modes: The case of the Czech experimental survey

Dr Martin Lakomý (Masaryk University) - Presenting Author
Ms Barbora Hubatková (Masaryk University)

Mixed-mode surveys based on the push-to-web model have rapidly become a more common method of data collection. However, some issues – including the mode effect and the differing effect of social desirability across modes – have not been resolved. Furthermore, the sensitivity and social desirability of specific topics vary across cultures, and the generalisation of findings is limited in this area. We present findings from an applied research project implementing the mixed-mode design in the cultural and legal context of the Czech Republic. The analysis is based on survey data collected via a push-to-web design in winter 2021. The study analyses both questions with higher item nonresponse (such as income) and questions with biased answers (such as sexual behaviour). Its main focus is the comparison of answers to several sensitive topics of sociological and demographic research between CAWI and CAPI. We then analyse the relationship between these answers and a social desirability scale for each mode used. The conclusions address the assumption that less compatible modes (i.e. one interviewer-administered mode and one self-administered mode) should not be combined for topics that are sensitive within the specific context.

Puzzling Findings in a List-Experiment for Estimating Anti-Immigrant Sentiment and Social Desirability Bias

Dr Sebastian Rinken (Institute for Advanced Social Studies (IESA), Spanish Research Council (CSIC)) - Presenting Author
Dr Sara Pasadas-del-Amo (Institute for Advanced Social Studies (IESA), Spanish Research Council (CSIC))

Survey respondents may deliberately give incorrect answers when they perceive their true opinions, attitudes, or behaviors to be socially ill-regarded. To reduce this kind of measurement error, commonly labelled “social desirability bias” (SDB) or “socially desirable responding”, survey methodologists have developed various anonymity-maximizing techniques. One such technique is the list experiment, also known as the item count technique (ICT), which consists of randomly dividing the sample into treatment and control arms and comparing the mean scores obtained when asking about the number of items that elicit a specific attitude or assessment on the respondents’ part. The item of interest is included only in the treatment group’s list. Since respondents are asked only for the number of relevant items, not to identify them, the method assumes that they will answer truthfully. Comparison with the percentage obtained by a direct question on the same sensitive item is supposed to reveal the scope of SDB.
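The estimator described above can be sketched in a few lines of Python. This is only a minimal illustration of the standard difference-in-means logic, not the authors' analysis code; the function name `ict_estimate`, the item counts, and the direct-question share are all made up for the example.

```python
from math import sqrt
from statistics import mean, variance


def ict_estimate(control_counts, treatment_counts):
    """Difference-in-means estimate of the sensitive item's prevalence.

    Returns the estimate and its standard error (independent arms assumed).
    """
    diff = mean(treatment_counts) - mean(control_counts)
    se = sqrt(
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )
    return diff, se


# Hypothetical item counts, NOT data from the study.
control = [1, 2, 0, 3, 2, 1, 2, 1]    # number of items chosen from a 4-item list
treatment = [2, 2, 1, 3, 2, 2, 3, 1]  # number of items chosen from a 5-item list

estimate, se = ict_estimate(control, treatment)

# Hypothetical share admitting the attitude when asked directly.
direct_share = 0.18
# A positive gap would suggest underreporting in the direct question.
sdb_gap = estimate - direct_share

print(f"ICT estimate: {estimate:.3f} (SE {se:.3f}), gap vs direct: {sdb_gap:.3f}")
```

A negative `estimate` corresponds to the treatment arm reporting fewer items on average than the control arm, even though its list is longer.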

This paper reports on a list experiment regarding anti-immigrant sentiment (AIS) and related SDB fielded in Spain in the autumn of 2020 (N = 1,965) in the framework of a study on immigration attitudes financed by the European Fund for Regional Cohesion and the Spanish Ministry of Science (ref. CSO2017-87364-R). AIS is well suited for studying SDB since the manifestation of animosity toward immigrants is prone to be associated with racism and xenophobia – attitudes which are shunned as inherently illegitimate in contemporary Western societies. The experiment consisted of asking about four (control) or five (treatment) social groups that may trigger antipathy; a direct question on antipathy toward immigrants was administered to the control arm only. The sample was recruited from a probability-based internet panel and was allocated according to the following strata: sex, age group, and level of education.

Although we had run a pretest with encouraging results, the full-scale experiment failed on two crucial counts. (1) For many sociodemographic profiles, the mean scores obtained for the experiment’s treatment arm are lower than those obtained for the corresponding control arm, despite referring to a longer list. Such negative differences in means provide evidence of the so-called deflation effect (Zigerell, 2011). This finding might be seen to suggest that treatment group respondents reported artificially low scores to prevent even the remotest possibility of association with AIS. However, (2) ICT-based estimates of AIS are lower than those based on the direct question, both in the aggregate (10.8% vs. 17.9%) and for many respondent categories.

The presentation will try to come to grips with the puzzle of why respondents might declare AIS more openly in obtrusive measurement than in the list-experiment.

I do not belong to a group that experiences discrimination. A current salient issue or constant underreport bias?

Mr Marvin Brinkmann (Mannheim University, MZES) - Presenting Author

Discrimination is a salient issue and its research field is growing rapidly. Does this salience also lead to more reports of discrimination experiences in the population? This paper challenges that line of argument and shows that the European Social Survey (ESS) item on discrimination experiences (Billiet 2001) has heavily underestimated actual discrimination experiences for a long time. This is suggested by the fact that only 9-13% of people with a migration background in the ESS report discrimination experiences in the repeated measure, whereas other studies, for example in Germany, report values of up to 80% (Sauer 2018). Evidence comes from four experimental designs in two weighted online surveys of people with a migration background, both administered in Germany in December 2020, with one study including data from a repeated wave in April 2021. While the previous level of the European Social Survey could be replicated without experimental manipulation of the question, thus ruling out a period effect, the manipulated versions can more than double the reporting of experienced discrimination among people with a migration background. The results in the German context are so clear that similar results can be expected in European comparison.

Overclaiming technique - a solution for self-enhancing bias in self-assessment questions? Validity analysis on the basis of the PISA 2012 data.

Dr Marek Muszyński (Institute of Philosophy and Sociology, Polish Academy of Sciences) - Presenting Author

The overclaiming technique (OCT) was proposed (Phillips & Clancy, 1972; Paulhus et al., 2003) as a new method with which socially desirable responding (SDR) and, more generally, self-enhancing biases could be controlled in self-report data. The method attracted significant research attention; however, its practical utility for yielding more valid questionnaire results (scores) has rarely been tested, and studies of its ability to control spurious variance by acting as a suppressor in criterion-related validity studies have brought only mixed results (Kyllonen & Bertling, 2013; Pokropek, 2014; Yang et al., 2019; Yuan et al., 2015). Thus, there are strong calls in the field for more OCT validity studies (Bing et al., 2011; Ludeke & Makransky, 2016; Paulhus, 2011).

Therefore, the presented work assesses OCT's utility for enhancing the criterion-related validity of self-report by accounting for response biases and yielding a suppressor effect on the relation between a self-report scale and a related ("objective") criterion.

In this research, the math familiarity scale from the PISA 2012 dataset (to date the only large-scale international database in which OCT was implemented) was used as the self-report scale, and the PISA math test score was used as the objective criterion. The assumption was that self-report should correlate positively with the criterion, with a zero-order standardised correlation coefficient of at least 0.30-0.40, as evidenced by reviews of self-report validities (e.g. Ackerman et al., 2002; Mabe & West, 1982; Zell & Krizan, 2014). Hence, it was assumed that OCT scores would act as a suppressor (Paulhus, Robins, Trzesniewski & Tracy, 2004; Tzelgov & Henik, 1991) for the relation between math familiarity self-report scores and cognitive (math test) scores, resulting in a stronger relation between the self-report and the math test and an increase in the R² index, indicating increased validity of the self-report once the OCT scores were introduced into the regression equation.
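The suppressor logic can be illustrated with the textbook formulas for a standardised regression with two predictors. The sketch below is not the analysis from the paper; the function name `two_predictor_fit` and the three correlations are hypothetical values chosen to show classical suppression, where the added predictor is unrelated to the criterion but correlated with the self-report.

```python
def two_predictor_fit(r_y1, r_y2, r_12):
    """Standardised betas and R^2 for y regressed on x1 and x2.

    r_y1: correlation of criterion with predictor 1 (self-report)
    r_y2: correlation of criterion with predictor 2 (overclaiming score)
    r_12: correlation between the two predictors
    """
    denom = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Hypothetical correlations, NOT values from PISA 2012.
r_self_test = 0.35  # self-report vs math test score
r_oct_test = 0.00   # overclaiming score vs math test score
r_self_oct = 0.50   # self-report vs overclaiming score

beta_self, beta_oct, r2_both = two_predictor_fit(r_self_test, r_oct_test, r_self_oct)
r2_self_only = r_self_test ** 2

print(f"R^2 self-report only: {r2_self_only:.4f}")
print(f"R^2 with OCT added:   {r2_both:.4f}")
print(f"beta self-report: {beta_self:.4f}, beta OCT: {beta_oct:.4f}")
```

Under these assumed values, adding the overclaiming score raises both the self-report coefficient and R², which is the pattern the abstract describes as a suppressor effect.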

Moreover, the three most popular types of OCT scoring systems were used and compared in the above analyses: a) indices based on IRT scores, b) indices based on signal detection theory (SDT), advocated for OCT scoring by Paulhus et al. (2003; cf. Paulhus & Petrusic, 2010), c) "common sense indices", proposed by Vonkova et al. (2018) as a simplification of the SDT indices.
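As a rough illustration of the SDT-based scoring, the sketch below computes the standard sensitivity (d') and criterion (c) indices from claimed familiarity with real items (hits) and non-existent items (false alarms). It is not the scoring code used in the study: the function name `sdt_indices`, the item counts, and the log-linear correction for extreme rates are assumptions made for the example. The simple difference and mean of the two rates are shown only as generic simplified indices and may not match the exact definitions in Vonkova et al. (2018).

```python
from statistics import NormalDist


def sdt_indices(hits, n_real, false_alarms, n_foils):
    """Return (d_prime, criterion, simple_accuracy, simple_bias).

    Rates use a log-linear correction (add 0.5 to counts, 1 to totals)
    so that rates of exactly 0 or 1 do not produce infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (n_real + 1)
    fa_rate = (false_alarms + 0.5) / (n_foils + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    simple_accuracy = hit_rate - fa_rate
    simple_bias = (hit_rate + fa_rate) / 2
    return d_prime, criterion, simple_accuracy, simple_bias


# Hypothetical respondent: claims to know 12 of 16 real concepts
# and 2 of 4 non-existent ones.
d_prime, criterion, simple_accuracy, simple_bias = sdt_indices(12, 16, 2, 4)
print(f"d' = {d_prime:.3f}, c = {criterion:.3f}")
print(f"simple accuracy = {simple_accuracy:.3f}, simple bias = {simple_bias:.3f}")
```

A higher d' indicates better discrimination between real and non-existent items, while a more negative c indicates a stronger tendency to claim familiarity regardless of whether the item exists.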

The results confirmed that OCT scores can indeed act as a suppressor and enhance the criterion-related validity of self-report. Moreover, important differences between the three scoring systems were found, indicating that the choice between them is not trivial, as it influences the interpretation of results and model specification. Furthermore, interpretation caveats were raised against the SDT OCT scores, corroborating some earlier sceptical voices regarding their use (e.g. Goecke et al., 2020; Paulewicz & Blaut, 2020). The results are discussed in the broader light of OCT validity and scoring.

How Do Survey Interviewers Handle Respondents’ Satisficing Tendencies? An Analysis Based on Audio-Recordings of Face-to-Face Interviews

Ms Silvia Schwanhäuser (Institute for Employment Research (IAB)) - Presenting Author
Ms Bettina Müller (Institute for Employment Research (IAB))

Non-differentiation, extreme responding, and item nonresponse are commonly attributed to respondents’ satisficing behavior. In interviewer-administered surveys, the effect of such response styles on survey outcomes may additionally depend on interviewer behavior (Loosveldt & Beullens 2017). While interviewers’ assistance with the answering process and adherence to standardized interviewing may counteract respondent satisficing (Heerwegh 2008), interviewers may also attune to individual response styles in a way that promotes satisficing, e.g., by simplifying the presentation of response scales. Previous research indicates that differential interviewer behavior is indeed an important contributing factor to measurement error (e.g., Loosveldt & Beullens 2017; Olsen & Bingen 2011). However, this research has so far lacked the direct measures of interviewer behavior necessary to disclose the underlying mechanisms.

We contribute to this research with a more detailed assessment of interviewer influences based on direct measures of interviewer behavior from audio-recordings of face-to-face interviews in the German panel study “Labour Market and Social Security” (PASS). The PASS survey is administered to a sample of welfare recipients, comprising a high proportion of individuals with a migration background and low education. This population may face greater difficulties in answering survey questions and thus be more prone both to satisficing and to interviewer influence than the general population. Interviewers, on the other hand, may be more inclined to deviate from standardized interviewing.

Applying multilevel random intercept models, we find significant interviewer effects on non-differentiation, extreme responding, and item-nonresponse in the PASS study. To discern the sources of these influences, we analyze audio-recordings of interviews. Specifically, we apply behavior coding to compare exceptional interviewers with positive and negative effects on the data quality indicators under study. Ultimately, this research provides practical guidance for interviewer training and monitoring.
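The size of an interviewer effect is often summarised as the share of variance in an indicator that lies between interviewers. The sketch below is a simplified stand-in for the multilevel random intercept models named above, not the authors' model: it uses the one-way ANOVA estimator of the intraclass correlation for a balanced design, and the function name `anova_icc` and the scores are hypothetical.

```python
from statistics import mean


def anova_icc(groups):
    """One-way ANOVA estimate of the intraclass correlation.

    groups: list of lists, one list of respondent scores per interviewer.
    Assumes every interviewer has the same number of respondents.
    """
    k = len(groups[0])  # respondents per interviewer
    n_groups = len(groups)
    grand_mean = mean(v for g in groups for v in g)
    group_means = [mean(g) for g in groups]
    # Between-interviewer and within-interviewer mean squares.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical non-differentiation scores for three interviewers,
# three respondents each. NOT data from PASS.
scores_by_interviewer = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
icc = anova_icc(scores_by_interviewer)
print(f"Interviewer ICC: {icc:.3f}")
```

A full analysis would normally fit a mixed model with respondent and interviewer covariates; this estimator only shows what the variance share measures.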