



Wednesday 19th July, 16:00 - 17:30 Room: F2 103


Handling missing data 3

Chair Dr Tarek Mostafa (University College London)
Coordinator 1 Professor George Ploubidis (University College London)
Coordinator 2 Mr Brian Dodgeon (University College London)

Session Details

Selection bias, in the form of incomplete or missing data, is unavoidable in surveys. It results in smaller samples, incomplete histories, lower statistical power, and bias in sample composition if missingness is related to the observed and unobserved characteristics of respondents. It is well known that unbiased estimates cannot be obtained without properly addressing the implications of incompleteness. In this session we focus on item missingness, survey non-response, and attrition over time in longitudinal surveys. We aim to identify best practices when dealing with missing data.

Under Rubin’s framework, three types of missingness exist: missing completely at random (MCAR), where the likelihood of response is unrelated to the respondents’ characteristics; missing at random (MAR), where the likelihood of response is explained by the observed characteristics of respondents; and missing not at random (MNAR), where the likelihood of response is related to both observed and unobserved characteristics of respondents.
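The distinction is easiest to see on simulated data. The minimal sketch below is our own illustration rather than part of the session text: it generates a toy outcome and then deletes values under each of the three mechanisms; the variable names and the logistic missingness models are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000
age = rng.normal(45, 12, n)                      # fully observed covariate
income = 20 + 0.5 * age + rng.normal(0, 5, n)    # variable that will be missing

def with_missing(y, mechanism):
    """Return a copy of y with values set to NaN under the given mechanism."""
    y = y.copy()
    if mechanism == "MCAR":
        # Probability of missingness is unrelated to any characteristic.
        p_miss = np.full(n, 0.3)
    elif mechanism == "MAR":
        # Probability of missingness depends only on the observed covariate.
        p_miss = 1 / (1 + np.exp(-(age - 45) / 10))
    else:  # MNAR
        # Probability of missingness depends on the unobserved value itself.
        p_miss = 1 / (1 + np.exp(-(y - y.mean()) / 5))
    y[rng.random(n) < p_miss] = np.nan
    return y

for mech in ("MCAR", "MAR", "MNAR"):
    y_obs = with_missing(income, mech)
    print(mech, "complete-case mean:", round(np.nanmean(y_obs), 2))
```

Under MCAR the complete-case mean stays close to the full-sample mean, while under MAR and MNAR it drifts, because here the outcome is correlated with the characteristic that drives missingness; this is the bias the principled techniques below are designed to address.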

The objective of our session is to examine the principled techniques commonly used to deal with missing data. These include inverse probability weighting, multiple imputation, and full information maximum likelihood (FIML). All of these techniques rely on the MAR assumption, and therefore their plausibility depends on the ability of the researcher to identify the predictors of response.
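As a concrete illustration of the first of these, the short sketch below (our own example, not from the session) builds inverse probability weights from a response model fitted with scikit-learn; the MAR non-response mechanism and all variable names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2_000
age = rng.normal(45, 12, n)                      # observed for everyone
income = 20 + 0.5 * age + rng.normal(0, 5, n)    # observed only for respondents

# MAR non-response: older sample members are more likely to respond.
p_respond = 1 / (1 + np.exp(-(age - 45) / 10))
responded = rng.random(n) < p_respond

# Step 1: model the probability of response from observed characteristics.
response_model = LogisticRegression().fit(age.reshape(-1, 1), responded)
p_hat = response_model.predict_proba(age.reshape(-1, 1))[:, 1]

# Step 2: weight each respondent by the inverse of their response probability.
w = 1 / p_hat[responded]
print(f"true mean           {income.mean():.2f}")
print(f"complete-case mean  {income[responded].mean():.2f}")
print(f"IPW mean            {np.average(income[responded], weights=w):.2f}")
```

Because the response model uses only observed characteristics, the correction works under MAR; under MNAR no weighting based on observed covariates alone removes the bias, which is why identifying good predictors of response is central to all three techniques.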

Contributors are welcome to contrast these techniques with other procedures such as case-wise deletion, mean replacement, regression imputation, selection models (e.g. Heckman selection models), and others. Moreover, theoretical, empirical, and substantive applications of these techniques will be considered for presentation.

Paper Details

1. Comparative Analysis of Approaches to Multiple Imputation Results Aggregation
Miss Irina Zangieva (National Research University Higher School of Economics)
Miss Anna Suleymanova (National Research University Higher School of Economics)

As a result of multiple imputation, the researcher obtains several complete datasets, analyzes each of them separately with the same method, and finally aggregates the results using specific formulas known as Rubin's rules.
Carrying out the same analysis several times, once on each dataset, and then combining the results is a time-consuming task. The process is partly automated in statistical packages that support multiple imputation, but the researcher often still needs to compute the pooled parameters manually using Rubin's rules. Researchers have therefore repeatedly attempted to simplify the multiple imputation workflow, but so far such attempts have been limited to specific types of analysis. Thus, there is no theoretical or empirical evidence that effective alternatives to Rubin's rules exist across other research situations.
This study is an attempt to compare the effectiveness of two approaches to aggregating the results of multiple imputation. The first is the classic one, Rubin's rules, used in almost all studies in which missing values are imputed. The second possible approach is to reorder the steps of classical multiple imputation so as to simplify the workflow: first aggregate (in this work, by averaging) the values imputed for each missing value, which yields a single complete dataset that can then be analyzed with the planned method.
Obviously, the classical algorithm, theoretically and methodologically well designed and repeatedly tested, is the more reliable route, but the second approach makes working with multiple imputation much faster and easier and, according to our assumptions, may be more effective than Rubin's rules in some research situations. Comparing the efficacy of the two approaches theoretically is quite difficult, so for an initial test of the study's assumptions we use a statistical experiment.
Thus, this research is intended to establish whether there are research situations in which aggregating multiple imputation results by averaging the imputed values and analyzing a single dataset is more effective than pooling the per-dataset analysis results with Rubin's rules. We believe that the effectiveness of a particular approach depends on the research situation, by which we mean the combination of the measurement scale of the variable with missing values, the proportion of missing values in the dataset, and the data analysis method applied after imputation. In this study, three types of scale are considered (nominal, ordinal and interval); missingness rates of 10%, 30% and 50%; and data analysis methods common in sociological research, such as descriptive statistics, measures of association between two variables, and linear regression.
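As a minimal sketch of the contrast examined in this abstract, the toy example below (our own illustration, not the authors' code) imputes a 30% MCAR gap five times with a deliberately simple normal-draw imputation model, then pools the estimated mean once with Rubin's rules and once by averaging the imputed values into a single dataset first; all modelling choices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 500, 5                                   # sample size, number of imputations
y = rng.normal(50, 10, n)
missing = rng.random(n) < 0.3                   # 30% MCAR missingness

estimates, variances, imputed_sets = [], [], []
for _ in range(m):
    y_imp = y.copy()
    # Illustrative imputation step: draw from the observed-data distribution.
    y_imp[missing] = rng.normal(y[~missing].mean(), y[~missing].std(ddof=1),
                                missing.sum())
    imputed_sets.append(y_imp)
    estimates.append(y_imp.mean())              # analysis step: estimate the mean
    variances.append(y_imp.var(ddof=1) / n)     # its estimated sampling variance

# Approach 1: Rubin's rules -- pool the per-dataset estimates and variances.
q_bar = np.mean(estimates)                      # pooled point estimate
u_bar = np.mean(variances)                      # within-imputation variance
b = np.var(estimates, ddof=1)                   # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b             # Rubin's total variance

# Approach 2: average the imputed values first, then analyze one dataset.
y_avg = np.mean(imputed_sets, axis=0)
q_avg, var_avg = y_avg.mean(), y_avg.var(ddof=1) / n

print(f"Rubin's rules pooling: {q_bar:.2f} (SE {np.sqrt(total_var):.3f})")
print(f"Averaged single set:   {q_avg:.2f} (SE {np.sqrt(var_avg):.3f})")
```

The two point estimates typically agree closely, but the averaged single dataset omits the between-imputation component of variance, so its standard error is systematically smaller; whether and when that matters in practice is the kind of research-situation question the abstract sets out to test.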


2. Handling Missing Data from Instrument Routing Errors
Dr Stephanie Zimmer (RTI International)
Dr Marcus Berzofsky (RTI International)

Missing data in a survey is generally modeled as a random mechanism. However, systematic missing data may be caused by errors in the programming of an electronic survey instrument that are not detected until near or after the end of data collection. In this situation, a class of respondents is never directed to questions that should have been presented. Since the missing data mechanism is not random, traditional imputation methods such as hot-deck imputation cannot be applied without care, as there are no donors from the same class. In this paper, we discuss two routing errors in the 2016 Survey of Prison Inmates, a bias analysis, and a proposed imputation method.

We propose using prior iterations of the same survey, with no routing errors associated with the selected analytic variables, to identify appropriate imputation classes and to study the possible bias. Using data from the prior round of the survey, we will determine whether there are classes of respondents who were not mis-routed that resemble those who were mis-routed in the 2016 survey. We also study bias by examining the difference in the estimates with and without the respondents who were mis-routed.

Additionally, one of the routing errors was discovered during data collection and was corrected after approximately 7/8 of data collection had been completed. Thus, for a small number of cases, we have respondents correctly routed in the current iteration of the survey. We will similarly study these respondents and compare them to those that were not mis-routed in the same clusters to determine the potential bias impact.
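The class-based imputation strategy described above can be sketched in a few lines. The example below is our own illustration rather than the authors' method: it uses a prior round, in which every class was asked the item, to supply hot-deck donors for a class that was mis-routed in the current round; the class definition and all variable names are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

# Prior round: the item was asked of everyone, so every class has donors.
prior = pd.DataFrame({
    "imp_class": rng.choice(["A", "B", "C"], size=300),
    "answer": rng.choice([0.0, 1.0], size=300),
})

# Current round: a routing error means class "C" was never asked the item.
current = pd.DataFrame({
    "imp_class": rng.choice(["A", "B", "C"], size=300),
    "answer": rng.choice([0.0, 1.0], size=300),
})
current.loc[current["imp_class"] == "C", "answer"] = np.nan

# Hot-deck imputation: for each mis-routed case, draw a donor value from the
# same imputation class in the prior round.
for cls, group in current.groupby("imp_class"):
    miss_idx = group.index[group["answer"].isna()]
    if len(miss_idx) > 0:
        donors = prior.loc[prior["imp_class"] == cls, "answer"].dropna()
        current.loc[miss_idx, "answer"] = rng.choice(donors.to_numpy(),
                                                     size=len(miss_idx))

print(current["answer"].isna().sum(), "values still missing after imputation")
```

A real application would first use the bias analysis described in the abstract to confirm that the prior-round distribution within each class is a credible stand-in for the mis-routed respondents, and would carry the survey weights through the comparison.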