



Friday 21st July, 09:00 - 10:30 Room: F2 106


Surveying non-native speakers of the survey language(s): Representation, coverage, data quality

Chair: Dr Michael Ochsner (FORS Lausanne)
Coordinator 1: Dr Oliver Lipps (FORS Lausanne)

Session Details

For some decades, Western countries have experienced significant immigration. As a consequence, the population in these countries has become more and more heterogeneous. This poses a number of challenges to survey designers. On the one hand, ethno-national minorities are less likely to be represented in their proper proportion in general population surveys (Deding et al., 2008; Feskens et al., 2006; Myrberg, 2013). This underrepresentation can have a number of reasons: foreigners, especially those from a more distant culture, may be less frequently present in sampling frames and harder to contact (Lipps et al., 2013; Lipps, 2016), may cooperate less (Lipps, 2016), or may be unable to participate because of language problems (Lee et al., 2008; Laganà et al., 2013). On the other hand, some foreigners who do participate might not master the survey language well enough to answer the questions in the intended way, thus introducing measurement error and bias and lowering data quality.
While there is a broad range of literature on translation and equivalence issues between countries, knowledge on language issues within countries is scarce. There is also a lack of methods to tackle representation issues concerning minorities. A similar situation prevails regarding measurement bias: there is a wide literature on methods for ex-post evaluation (and correction) of bias. Even though it would be wiser to reduce bias before fieldwork, knowledge on the effect of insufficient mastery of the survey language, and on ways to address it, is missing. Therefore, we suggest a session on issues arising when surveying non-native speakers of the survey language(s).
This session will be dedicated to data quality issues stemming from respondents who are not native speakers of the survey language. As language issues can lead to representation bias as well as data quality issues, we welcome papers which present methods to analyse and/or improve representation of non-native speakers of the survey language(s) by adding or removing survey language(s), as well as papers that suggest methods or tools to analyse and/or improve the data quality of non-native speakers of the survey language(s). We especially seek papers which analyse effects of changes in the design of repeated cross-sectional or panel surveys, e.g. from introducing or dropping survey languages.

Paper Details

1. Who can be added in which survey domain by offering which additional language(s) for which survey topics in easy and complex surveys?
Dr Oliver Lipps (FORS)
Dr Michael Ochsner (FORS)

Little is known about the representation effects of offering additional survey administration languages. Until now, logistical and financial considerations have been the main drivers of such decisions.
In this potential analysis, we investigate the possible representation of different groups in Swiss general population surveys defined by different survey topics from adding several languages (English, Serbo-Croatian, Portuguese, Albanian), on top of the three Swiss national languages. Topics investigated are religious affiliation, nationality, education level, occupational activity, migration status, and main mode of transport to school/work.

For the potential analysis, we use a part of the Swiss pooled yearly census survey from 2010 to 2014. Results show that the level and heterogeneity of those who master one of the three Swiss national languages depend on the topic (person groups) considered and the language mastery needed to complete a questionnaire. The topic with the highest heterogeneity with respect to a “good” language competence is nationality, with the categories Swiss, foreigners from a neighboring country, foreigners from an English-speaking country, and other foreigners. Groups distinguished by religious affiliation also exhibit a high variation in language competence.
The language which reduces this heterogeneity most efficiently also depends on the topic considered and the language mastery needed. An important result is that the candidates for the ‘best’ language to be added in the “basic” language scenario reduce to two (English and Portuguese), while in the “good” language scenario any of the four additional languages could be the best one to add to reduce heterogeneity across the topic categories considered. Interestingly, additionally providing English would even increase heterogeneity in both language scenarios if education were the topic of interest. The reason is that, in the “good” language scenario, adding English would add native English speakers, who have a higher than average education and who are already well represented without offering English. In the “basic” language scenario it would in addition add those who learnt English at school, for the same reason as in the “good” language scenario.

The main message of this paper is that the decision whether to add a language to the survey language(s) already in use, and if so, which language to add, requires a careful investigation of the (main) survey topic and the degree of language mastery necessary to complete the survey. Some topics may be less sensitive to a potentially decreased heterogeneity from an additional language offered, such as, in our research, the main mode of travel to work or to school. Other topics may be much more sensitive, such as measuring nationality (especially if only a basic language competence is needed) or the educational level (in both language competence scenarios).


2. How many survey languages? Two examples for adding or reducing survey languages to illustrate effects on representation bias
Dr Michael Ochsner (FORS, Lausanne and ETH Zürich, Switzerland)

General population surveys face increasing linguistic and cultural heterogeneity because of globalization and migration. However, there is a lack of knowledge on the effects of this heterogeneity on the representation bias and response rates of surveys. Linguistic and cultural heterogeneity raises many complex issues, such as translation processes and, depending on the mode, multilingual interviewers or a complicated process of assigning interviewers to respondents. Additionally, survey administrators are increasingly under financial pressure, making it difficult to survey a more complex population with less funding.
In this presentation, I will use two examples to study the effects on representation bias or response rates of adding languages to, and removing languages from, a survey.
I will start with an example examining the effect of adding languages. Two surveys were administered in three humanities fields at Swiss universities: English and German literature studies, and art history. The first survey was administered in English and German, thus covering the language of the first two fields, including the language of the majority in Switzerland. The second was administered adding a third and fourth language, namely French and Italian, two Swiss national languages that are at the same time very important scholarly languages in the third subject field. I will examine the representation regarding language region (Swiss and French part of Switzerland) and subject field, and examine the mother tongue as well as the language chosen to fill in the questionnaire. The results show that adding selected languages can reduce representation bias.
The second example examines reducing languages. A general population survey of a Swiss city had to date been administered as a telephone survey in multiple languages. Due to budget constraints and, especially, severe drops in the response rates over time caused by decreasing phone coverage, a single-language web/paper mixed-mode experiment was conducted. The findings suggest that using a more inclusive mixed-mode design can compensate for some representation bias when reducing languages.
The two examples shed light on the advantages and disadvantages of using multiple survey languages. They also reveal some practical implications for deciding how many and which languages to choose when administering a survey: the decision must balance ethical and political issues (inclusion of minorities), methodological effects (change of mode to compensate), representation bias (are minorities large enough to make a difference), language competence in the surveyed population, and financial constraints.


3. Language as a determinant for participation rates in Finnish health examination surveys
Dr Hanna Tolonen (National Institute for Health and Welfare (THL), Helsinki, Finland)
Dr Päivikki Koponen (National Institute for Health and Welfare (THL), Helsinki, Finland)
Dr Katja Borodulin (National Institute for Health and Welfare (THL), Helsinki, Finland)
Dr Satu Männistö (National Institute for Health and Welfare (THL), Helsinki, Finland)
Professor Markku Peltonen (National Institute for Health and Welfare (THL), Helsinki, Finland)
Professor Erkki Vartiainen (National Institute for Health and Welfare (THL), Helsinki, Finland)

In health examination surveys, data is collected through questionnaires, physical measurements and analysis of biological samples. A good understanding of the material is required to understand the invitation, fill in the questionnaires and provide the written informed consent required for the physical measurements and collection of biological samples.

Finland has two official languages, Finnish and Swedish. The mother tongue of each person is registered in the National Population Information System. The majority of people living in Finland (95%) speak at least one of these two languages, leaving 5% of the population speaking other languages. There is a legal obligation to provide survey material in at least both official languages.

Health examination surveys, the FINRISK Study, have been conducted in Finland every five years since 1972. Information about registered mother tongue has been available from the sampling frame, the National Population Information System, since the survey in 1997.

In the FINRISK Study, a random sample of persons aged 25-64 years has been drawn separately for each survey year. The sample size was 9,900 in 1997, 9,952 in 2002, 7,962 in 2007 and 7,921 in 2012. Invitees receive an invitation letter with a questionnaire and a pre-defined appointment time for the health examination. They are asked to fill in the questionnaire at home and return it during the health examination visit.

The proportion of people in the sample having some language other than Finnish or Swedish as their mother tongue has increased over the years, from 1.8% in 1997 to 5.5% in 2012. About 2% of the population had Swedish as their mother tongue in all survey years. Those who do not have Finnish or Swedish as their mother tongue also tend to have lower education than those with Finnish or Swedish as their mother tongue.

When comparing the participation rates between these three language groups, a clear difference was observed. In all years, the participation rate was lowest for those having languages other than Finnish or Swedish as their mother tongue. The participation rate has been declining among those with Finnish as their mother tongue but has remained relatively stable among the two other language groups. In 1997, the participation rate among the Finnish group was 72%, among the Swedish group 68% and among others 50%. By 2012, the participation rate among the Finnish group had declined to 63%, while it was 69% among the Swedish group and 49% among others.
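The arithmetic behind this comparison can be made explicit. The following is a minimal sketch in Python that uses only the participation rates quoted above; the variable and function names are illustrative assumptions and are not part of the FINRISK Study.

```python
# Participation rates (%) by registered mother tongue, as quoted in the abstract.
rates = {
    "Finnish": {1997: 72, 2012: 63},
    "Swedish": {1997: 68, 2012: 69},
    "Other": {1997: 50, 2012: 49},
}


def change_in_points(group_rates, start=1997, end=2012):
    """Return the change in participation rate in percentage points."""
    return group_rates[end] - group_rates[start]


# Change per language group between the first and last survey year.
changes = {group: change_in_points(r) for group, r in rates.items()}

# Gap between the Finnish group and the "Other" group in each year.
gap_1997 = rates["Finnish"][1997] - rates["Other"][1997]
gap_2012 = rates["Finnish"][2012] - rates["Other"][2012]

for group, delta in changes.items():
    print(f"{group}: {delta:+d} percentage points")
print(f"Finnish-Other gap: {gap_1997} points in 1997, {gap_2012} points in 2012")
```

On these figures the gap between the Finnish group and the other-language group narrows mainly because participation in the Finnish group fell, not because participation in the other-language group rose.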

In Finland, the increasing number of people who do not speak either of the official languages as their mother tongue is affecting survey organization. Traditionally it was enough to have survey material in Finnish and Swedish, as required by law, but nowadays there is pressure to also have material in other languages such as English and Russian. This obviously increases the survey cost but at the same time may help to increase the participation rate.


4. Multiple Strategies for Reaching out to Spanish-speaking respondents in an IRS Household Survey
Ms Jennifer McNulty (Westat)
Dr Jocelyn Newsome (Westat)
Dr Kerry Levin (Westat)
Ms Brenda Schafer (IRS)
Mr Pat Langetieg (IRS)
Dr Saurabh Datta (IRS)

In recent years, survey methodologists have sought to increase response from Spanish-speaking respondents. About 16 million people in the United States are Spanish speakers with no or very limited English proficiency. Studies have shown that mail surveys are likely to underrepresent Spanish speakers (Caporaso et al., 2013), particularly when materials are presented only in English. In one study, response rates among Spanish speakers were half those of English speakers (McGovern, 2004). While sending survey materials in both English and Spanish to all respondents has been shown to increase Spanish response (Brick et al., 2012), this approach can be prohibitively expensive.
We will examine efforts made to increase Spanish-language participation in a large annual household survey. The IRS Individual Taxpayer Burden (ITB) survey is an annual multi-mode survey sent to 20,000 individuals in the United States. It measures the time and money taxpayers spend complying with tax law regulations. The IRS ITB Survey is currently being fielded for the sixth consecutive year. Each year, most respondents choose to complete the paper survey. The survey is offered in both English and Spanish, but few respondents complete the survey in Spanish. Although Spanish speakers may choose to complete the survey in English (perhaps with assistance from family or friends), it seems likely that Spanish speakers are underrepresented.
With each fielding of the survey, our researchers have sought to improve Spanish-language response through a variety of methods. Although it was not feasible to send all respondents all materials in both languages, over the years we have incorporated a number of techniques to target Spanish speakers. These include: sending a Spanish version of the IRS prenote in addition to the English version; increasing the number of modes offered in Spanish (from phone-only to phone, web, and mail); allowing web survey respondents to easily toggle between languages; offering a dedicated Spanish-language customer service phone line; incorporating a Spanish-language callout on English-language materials; and providing web instructions in Spanish.
In this paper, we will discuss the impact of these techniques on the number, mode, and timing of Spanish-language completes, as well as the number and type of calls received on our Spanish-language customer service phone line.


5. Language proficiency among respondents and implications for data quality in a face-to-face longitudinal survey
Mr Alexander Wenz (University of Essex)
Dr Tarek Al Baghal (University of Essex)
Dr Alessandra Gaia (University of Essex)

When surveying immigrant populations and members of ethnic minorities, survey researchers have to consider that respondents vary in their level of language proficiency. Large-scale national surveys often provide translated questionnaires to respondents who do not master the survey language well enough and might otherwise not be able to participate. However, not all respondents with low levels of language skills might choose to use translated questionnaires, and survey translations might only be available for a limited number of languages. Respondents completing the survey in a language that they do not master well might have problems in understanding survey questions or in reporting their answer, which might affect the quality of responses they provide.
This paper provides insight into the impact of native language proficiency on survey data quality. At the first wave of Understanding Society: The United Kingdom Household Longitudinal Study (UKHLS), a large sample of respondents were asked about their English-language abilities. Questions included whether English is their first language and any difficulties arising from speaking, reading, and understanding English. The responses to these questions are used to compare data quality amongst those answering the survey in English, using a large number of responses for each respondent. In addition, we have coding of all survey measures on 13 question characteristics including measures of task difficulty and risk of socially desirable reporting.
Using these additional measures of language capability and question characteristics, we initially explore data quality outcomes in a similar way to the recent study by Kleiner et al. (2015). Data quality measures include missing data (through “don’t know” or “refused” answers), the presence of primacy or recency effects, and possible straight-lining of responses in grids. We explore both aggregated data quality measures as well as models of responses within respondents, estimating the impact of question characteristics.
We also add two important extensions relating to language proficiency and measurement of data quality beyond the measures used to identify ability and different question coding. First, UKHLS has a self-completion section, so we are able to explore differential impacts of language ability in aural and visual administration of the survey. Second, we further our understanding by leveraging the longitudinal aspect of UKHLS. We explore the amount of change, an important indicator of data quality in longitudinal studies, and how change differs by respondents’ language ability and question characteristics. We are also able to see if there are any changes in the data quality measures (i.e. DK/REF responses, primacy/recency, straight-lining) across waves by language proficiency.
Initial results suggest that non-native speakers of English, particularly those with difficulty speaking and reading, provide more DK/REF responses. Additionally, non-native speakers are more likely than native English speakers to refuse to complete the self-completion part of the survey.
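
Two of the data quality indicators named in this abstract, the share of “don’t know”/“refused” answers and straight-lining in grids, can be illustrated with a minimal Python sketch. The answer codes (`"DK"`, `"REF"`), the function names and the example data are hypothetical assumptions chosen for illustration; they are not the UKHLS coding scheme or the authors' operationalisation.

```python
# Hypothetical codes marking "don't know" and "refused" answers.
DK_REF_CODES = {"DK", "REF"}


def item_nonresponse_rate(answers):
    """Share of a respondent's answers that are DK or REF (0.0 if no answers)."""
    if not answers:
        return 0.0
    missing = sum(1 for a in answers if a in DK_REF_CODES)
    return missing / len(answers)


def is_straightlined(grid_answers):
    """True if every item in a grid of at least two items received the same
    substantive (non-DK/REF) answer."""
    if len(grid_answers) < 2:
        return False
    first = grid_answers[0]
    if first in DK_REF_CODES:
        return False
    return all(a == first for a in grid_answers)


# Invented example respondent: eight single items and one five-item grid.
single_items = [3, "DK", 4, "REF", 2, 5, 1, 3]
grid = [4, 4, 4, 4, 4]

print(f"DK/REF share: {item_nonresponse_rate(single_items):.2f}")
print(f"Straight-lined grid: {is_straightlined(grid)}")
```

Indicators of this kind could then be compared across groups defined by self-reported language ability. How to treat grids that mix substantive and DK/REF answers is a design choice; this sketch counts only fully substantive, identical grids as straight-lined.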