Thursday 20th July, 16:00 - 17:30 Room: N 101


Matrix design for social surveys

Chair: Professor Christof Wolf (GESIS - Leibniz Institute for the Social Sciences)
Coordinator 1: Professor Dominique Joye (University of Lausanne)

Session Details

General social surveys increasingly face problems of low response rates and rising costs. In this situation, alternatives to the traditional face-to-face mode are gaining interest. For these other modes, e.g. web surveys, the typical length of a general social survey is a challenge: instead of 60 or more minutes, one is probably restricted to about 20 minutes. Here a matrix design for the questionnaire could be a promising solution. In this design the original questionnaire is divided into modules and only some of these are presented to each group of respondents. As a result we obtain a data matrix with many “holes”, i.e. missing data.
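To make the resulting data structure concrete, the following minimal sketch (in Python) assigns each respondent a core module plus a random subset of thematic modules; the module sizes, the number of modules per respondent, and the 1-5 response scale are purely illustrative assumptions:

# Minimal sketch of a matrix (planned-missingness) design; module sizes,
# the number of modules per respondent, and the response scale are
# illustrative assumptions, not taken from the session description.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

core_items = [f"core_{i}" for i in range(1, 4)]            # asked of every respondent
modules = {m: [f"{m}_{i}" for i in range(1, 4)]            # thematic modules A-D
           for m in ["A", "B", "C", "D"]}

rows = []
for _ in range(10):                                        # ten example respondents
    assigned = rng.choice(list(modules), size=2, replace=False)
    answered = core_items + [item for m in assigned for item in modules[m]]
    rows.append({item: int(rng.integers(1, 6)) for item in answered})

data = pd.DataFrame(rows)                                  # unasked items become NaN: the "holes"
print(data)
print("share missing by design:", data.isna().mean().round(2).to_dict())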
We welcome papers that address one or more of the following questions:
How should questions be divided into modules? (How many modules? Thematic or random distribution of items? Should a core module be used or not? To what extent does the specific mix of modules affect response behavior?)
How should modules be distributed over respondents, and how can this be implemented in the field, in particular if a mixed-mode approach is used for data collection?
How must the resulting data be prepared to ease analysis? Should missing data be imputed? How can replicability be ensured? Do we need special methods to analyze the data?
Which problems does a matrix design pose for repeated cross-sectional studies with respect to comparability over time? Similarly, what are the consequences for cross-national comparability?

Paper Details

1. Reduction of Survey Length through Split Questionnaire Design: Consequences for Nonresponse and Measurement Error
Professor Andy Peytchev (University of Michigan)
Dr Emilia Peytcheva (RTI International)
Professor Trivellore Raghunathan (University of Michigan)

Split questionnaire design (often used synonymously with matrix sampling, although typically only a limited number of instrument versions are used) provides an option to ask a full set of questions in a shortened survey. The motivation is often to reduce the respondent burden that leads to nonresponse; in some instances it may be to reduce measurement error. Empirical evidence linking survey length to nonresponse error is limited, and evidence linking survey length to measurement error is virtually nonexistent. This lack of experimental evaluation is likely a key factor behind the hesitation to adopt this approach.

In this study, we use a factorial experimental design that manipulates survey length and the location of the survey items in order to separate nonresponse from measurement error. The survey will be conducted at the beginning of 2017, as part of a study funded by a research grant from the U.S. National Science Foundation. Sample members will be offered a promised incentive and invited by mail to complete an online survey that would take either 20 or 40 minutes. The order of the survey modules is also experimentally manipulated. We will have preliminary results at the time of the ESRA conference.
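A minimal sketch of how such a factorial assignment could be set up is given below; the condition labels, the two module orders, and the simple random allocation are assumptions for illustration only (the abstract specifies just that length, 20 vs. 40 minutes, and the location of items are manipulated):

# Sketch of a 2 x 2 factorial random assignment: survey length x module order.
# Condition labels and equal-probability allocation are illustrative assumptions.
import itertools
import random

random.seed(1)

lengths = ["20min", "40min"]                      # manipulated survey length
module_orders = ["order_ABC", "order_CBA"]        # manipulated item location (hypothetical labels)
conditions = list(itertools.product(lengths, module_orders))

sample_members = [f"id_{i:04d}" for i in range(1, 9)]
assignment = {member: random.choice(conditions) for member in sample_members}

for member, condition in assignment.items():
    print(member, condition)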


2. Splitting the questionnaire: an alternative to matrix design in social surveys
Miss Evangelia Kartsounidou (Aristotle University of Thessaloniki)
Professor Ioannis Andreadis (Aristotle University of Thessaloniki)

One of the main drawbacks web surveys have to face is a low response rate. A significant reason for low response rates is the length of the questionnaire: previous research has shown that lengthy questionnaires lead to lower response rates, mainly due to respondent burden. Hence, creating shorter questionnaires could be a solution to this problem. One method of doing this is splitting the questionnaire into shorter parts and sending each part to a different sub-group of the sample, as matrix design proposes. The standard matrix design dictates sending only one invitation to each respondent, which means that each respondent has the opportunity to answer only the questions displayed in the part of the questionnaire he/she receives. As a result, we obtain a dataset with a large amount of missing data. Relying on the same concept, this paper explores what happens if, instead of stopping the procedure after a single invitation to each respondent, we send follow-up invitations asking respondents to answer a different part of the questionnaire after completing the first part of the survey. We compare the response rates and the quality of the datasets collected under three different conditions: i) the full long questionnaire without any splitting, ii) a matrix design with a single invitation, and iii) a matrix design with an additional invitation for the rest of the questionnaire.
Using the Greek Candidate Survey of 2015 as a case study, we implemented the following experiment. All units of the target population were randomly split into two groups. The respondents of the first group (A) received the extended version of the online questionnaire with 85 pages (most pages include only one question). The respondents of the second group (B) received a short part of the questionnaire (Part 1) comprising 20 pages, while the rest of the questions were sent later, in a subsequent phase, as a separate questionnaire (Part 2), only to those who had completed the first part.
Our findings show that sending an additional invitation for the rest of the questionnaire significantly increases the number of fully completed questionnaires. Thus, we believe that our approach can be very fruitful in situations similar to our case: for the Greek Candidate Survey, the biggest difficulty and most time-consuming task was collecting the email addresses of the candidate MPs, while all other tasks had minimal costs. It therefore makes sense to get the most out of this effort. The result was 302 completed questionnaires (187 in survey A and 115 in survey B) out of 1119 invitations (546 in survey A and 573 in survey B). As expected, the shorter questionnaires gave us higher response rates. One out of two respondents who had completed the first part and received an invitation for the rest of the questionnaire completed it. Consequently, sending an additional invitation reduced the number of missing values by 50%.
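For orientation, the reported figures translate into completion shares as sketched below; note that these are shares of fully completed questionnaires, not the (higher) response rates to the shorter Part 1 that the abstract refers to, and reading the 115 group-B cases as respondents who finished both parts is our assumption:

# Back-of-the-envelope shares computed from the figures reported above.
# Reading the 115 group-B cases as "completed both parts" is an assumption.
invited = {"A (full questionnaire)": 546, "B (split design)": 573}
completed = {"A (full questionnaire)": 187, "B (split design)": 115}

for group in invited:
    print(f"{group}: {completed[group]}/{invited[group]} "
          f"fully completed = {completed[group] / invited[group]:.1%}")

print(f"overall: {sum(completed.values())}/{sum(invited.values())} "
      f"= {sum(completed.values()) / sum(invited.values()):.1%}")

# Per the abstract, roughly one in two Part-1 completers who received the
# follow-up invitation also finished Part 2, i.e. the follow-up cut the
# planned missingness for the second part of group B roughly in half.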


3. How useful is data fusion when employing matrix designs in surveys? Illustrating potential benefits and limitations using the Austrian and European Social Survey
Mr Dimitri Prandner (Johannes Kepler University of Linz (Austria))
Professor Johann Bacher (Johannes Kepler University of Linz (Austria))

As survey research is confronted with the reality of declining response rates for individual surveys as well as a growing number of fielded surveys, the challenges and costs of acquiring high-quality datasets are increasing rapidly. The proposed paper explores whether some of these challenges could be alleviated by employing matrix designs in combination with data fusion based on multiple imputation, using shared variables from core modules to fuse datasets.

In theory, the concept of fielding surveys that not only feature both core and specialized modules but are also designed with the idea of fusing different datasets in mind could help to gain deeper insights into societal developments, tackle questions that cannot be answered with the information contained in a single survey, and, for example, cut down the number of repeated questions needed across surveys.

Yet there is a problem with this proposition: it carries the potential risk of creating spurious relationships between variables originally observed in different datasets. This would be the case if the central assumption of local stochastic independence is violated, meaning that the variables Z shared between the datasets do not explain the module-specific, and thus survey-specific, variables X and Y.

As long as the assumption that P(X,Y|Z) = P(X|Z)·P(Y|Z), or alternatively f(X,Y|Z) = f(X|Z)·f(Y|Z), holds, the datasets lend themselves to data fusion processes that provide new insights without generating spurious relationships.
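As a rough illustration of the fusion step described here, the sketch below stacks two simulated survey files that share Z but observe X and Y separately, and imputes the unobserved block in each file; the normal data-generating model, the variable names and the use of scikit-learn's IterativeImputer are our assumptions, not the authors' actual implementation:

# Sketch: fuse two surveys that share Z but measure X and Y separately, by
# stacking them and imputing the block each survey did not observe. The
# data-generating model and the imputer are illustrative assumptions.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
n = 1000

# A world in which Z drives both X and Y, so local stochastic independence holds.
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.6, size=n)
y = 0.7 * z + rng.normal(scale=0.6, size=n)

survey_1 = pd.DataFrame({"z": z[: n // 2], "x": x[: n // 2], "y": np.nan})  # observes Z and X
survey_2 = pd.DataFrame({"z": z[n // 2:], "x": np.nan, "y": y[n // 2:]})    # observes Z and Y
stacked = pd.concat([survey_1, survey_2], ignore_index=True)

imputer = IterativeImputer(sample_posterior=True, random_state=0)  # stochastic draws, closer to MI
fused = pd.DataFrame(imputer.fit_transform(stacked), columns=stacked.columns)

# Compare the X-Y association in the fused file with the association in the
# (here known) complete data; large discrepancies would signal spurious relationships.
print("complete-data corr(x, y):", round(float(np.corrcoef(x, y)[0, 1]), 3))
print("fused-data corr(x, y):   ", round(float(fused["x"].corr(fused["y"])), 3))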

To test these assumptions, three different applications were developed, all of them based on the Austrian Social Survey 2016 (itself built upon a thematically organised matrix design for its modules) as well as on the Austrian data from the seventh wave of the European Social Survey.

Both surveys are representative of the general Austrian population and were conducted as face-to-face interviews. They also offer several shared modules, as well as modules that share topics but not items. On this basis, a broad range of shared socio-demographic items, together with a shared module based on the Schwartz value scale, was used as common variables (Z) to explore whether information about household structure, political orientations and attitudes towards social justice could be imputed.

The results were largely consistent: some, though not all, of the previously existing correlations could be transferred in all three applications, and no previously non-existent correlations emerged. While this is a strong indication that a well-adjusted model can alleviate concerns about spurious relationships, a central problem emerged once the plausibility checks on the imputed data were completed: a small number of unrealistic cases was found in all datasets, and was especially evident in the application tied to political orientations, which raises concerns.

These findings will be used to further discuss the application of data fusion when employing matrix designs for surveys and to reflect on the risks associated with imputed data.


4. Preparing a mobile survey design for official statistics
Mr Peter Lugtig (Utrecht University)
Mrs Annemiek Luiten (Statistics Netherlands)
Mrs Vera Toepoel (Utrecht University)

Web surveys need to adapt to a mobile world. While some respondents are unlikely to complete a 45-minute web survey on a desktop computer, even fewer are likely to do so when they use their smartphone to complete the survey. In order to prepare for the survey future, we need to shorten our surveys, and matrix sampling, or planned missingness, provides one way forward.
In this presentation we discuss how we will use matrix sampling in the Health Survey conducted by Statistics Netherlands. This survey takes about 45 minutes to complete and fieldwork is continuous. Every month, about 1500 respondents complete the survey, but breakoff rates are relatively high when respondents start the survey on a mobile phone.
We have data from the current full survey and use it as the basis for a simulation study that informs how we can best implement a matrix-sampling design for future health surveys. The actual matrix design will be implemented in August 2017 and will not be the topic of discussion here; rather, we focus on the advantages and disadvantages of several possible matrix-sampling designs.
We specifically study the effects of each matrix design on the power (standard errors) of the estimates. High correlations between items that are missing and items that are observed should yield the most efficient design. However, this may come at the cost of creating a questionnaire that is not easy for respondents to complete. We pay attention to several potential difficulties: questionnaire routing may make specific designs infeasible to implement, while a changing questionnaire may lead to different context effects and alter correlations between items. In short, simply optimising the questionnaire so that missing items can be imputed from other, highly correlated items will not work in practice. This paper presents the results of the simulation study and discusses how the simulation results and practical considerations have informed the actual matrix design chosen for the Health Survey 2017.
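A stripped-down version of this kind of simulation could look as follows; the item correlation, the fifty-fifty module split, and the single regression-based imputation are hypothetical choices, intended only to show how the precision of an estimate can be compared between a full design and a matrix design:

# Sketch: compare the precision of an item mean under full observation versus
# a matrix design in which half the sample skips that item and the missing
# values are imputed from a correlated item. All parameters are illustrative;
# this is not the Statistics Netherlands simulation itself.
import numpy as np

rng = np.random.default_rng(2017)
n, n_reps, rho = 1500, 1000, 0.8           # monthly sample size, replications, item correlation

est_full, est_matrix = [], []
for _ in range(n_reps):
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    # full design: the mean of y is estimated from everyone
    est_full.append(y.mean())

    # matrix design: a random half skips item y; impute it from the correlated item x
    skip = rng.permutation(n) < n // 2
    slope, intercept = np.polyfit(x[~skip], y[~skip], 1)
    y_imp = y.copy()
    y_imp[skip] = intercept + slope * x[skip] + rng.normal(
        scale=np.sqrt(1.0 - rho**2), size=int(skip.sum())
    )  # add residual noise so imputations are draws rather than point predictions
    est_matrix.append(y_imp.mean())

# empirical standard error = spread of the estimate across simulated samples
print("empirical SE, full design:  ", round(float(np.std(est_full, ddof=1)), 4))
print("empirical SE, matrix design:", round(float(np.std(est_matrix, ddof=1)), 4))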