# ESRA 2013 Sessions

Assessing the Quality of Survey Data 1 | Professor Jörg Blasius |

This session will provide a series of original investigations on data quality in both national and international contexts. The starting premise is that all survey data contain a mixture of substantive and methodologically-induced variation. Most current work focuses primarily on random measurement error, which is usually treated as normally distributed. However, there are many different kinds of systematic measurement error or, more precisely, many different sources of methodologically-induced variation, and all of them may have a strong influence on the "substantive" solutions. Sources of methodologically-induced variation include response sets and response styles, misunderstanding of questions, translation and coding errors, uneven standards among the research institutes involved in data collection (especially in cross-national research), item and unit non-response, and faked interviews. We will consider data to be of high quality when methodologically-induced variation is low, i.e. when differences in responses can be interpreted on the basis of theoretical assumptions in the given area of research. The aim of the session is to discuss different sources of methodologically-induced variation in survey research, how to detect them and the effects they have on substantive findings. | |
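One of the response styles the description mentions can be screened for quite simply. As a minimal illustrative sketch (not material from the session; the respondents, items and threshold below are invented), here is a check for "straightlining", i.e. respondents who give the same answer to every item in a battery:

```python
# Illustrative screening for one response style: "straightlining".
# Hypothetical 1-5 Likert responses to a six-item battery.

def is_straightliner(responses, max_distinct=1):
    """Flag a respondent whose answers use at most `max_distinct` values."""
    return len(set(responses)) <= max_distinct

respondents = {
    "r1": [3, 3, 3, 3, 3, 3],   # answers identically on all items
    "r2": [1, 4, 2, 5, 3, 2],   # differentiated answers
    "r3": [5, 5, 5, 5, 5, 4],   # near-straightliner, not flagged here
}

flags = {rid: is_straightliner(r) for rid, r in respondents.items()}
print(flags)  # only r1 is flagged
```

In practice the threshold (and whether near-straightliners like `r3` count) is a substantive judgment call, which is exactly the kind of detection question the session raises.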

Assessing the Quality of Survey Data 2 | Professor Jörg Blasius |

Description identical to "Assessing the Quality of Survey Data 1" above. | |

Assessing the Quality of Survey Data 3 | Professor Jörg Blasius |

Description identical to "Assessing the Quality of Survey Data 1" above. | |

Hierarchical data, what to do? Comparing multi-level modelling, cluster-robust standard errors, and two-step approaches | Dr Merlin Schaeffer |

Social scientists are frequently interested in the importance of social context for people's actions, attitudes, interests and so on. For example, researchers want to know whether pupils in small classes perform better than those in large ones, whether people living in socio-economically deprived neighbourhoods are less satisfied with their lives, or whether the relationship between socio-economic status and health depends on a country's health care system. Generally speaking, these relationships have a multi-level structure in that outcomes and explanatory factors are situated at two different levels: individual and contextual. To test hypotheses about contextual effects, sociologists and political scientists are usually taught to use random intercept and slope models. Most economists, by contrast, seem to rely on ordinary least squares estimation with cluster-robust standard errors. Finally, some researchers use a two-step approach: they first obtain context-specific estimates of the parameters of interest, which are then modelled as a function of contextual characteristics in a second step. Although they address similar problems, these different approaches have largely developed in isolation from each other, and a thorough comparative discussion is lacking. As a consequence, little is known about their relative advantages and pitfalls: Are they approximately equivalent? Is one more reliable in general? Or are different approaches appropriate in different situations? For example, is one better suited to comparisons of a small number of large and highly distinct clusters such as countries, while another works best when studying a large number of relatively similar smaller clusters such as school classes? We invite submissions that aim to answer these questions by comparing different approaches to analysing hierarchical data. We particularly welcome simulations as well as analytical studies. However, empirical analyses that demonstrate differences between approaches for a particular case are also of interest. | |
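The stakes of the comparison can be seen in a toy simulation. The sketch below (all parameter values are invented, and the simple CR0-style formula is only one of several cluster-robust variants) fits a single-predictor OLS regression to clustered data and contrasts the classical standard error of the slope with a cluster-robust one:

```python
import random, math

# Toy comparison: classical OLS standard error of a slope vs. a simple
# cluster-robust (CR0-style) standard error, on simulated clustered data.
random.seed(42)

G, n_g = 40, 20                       # 40 clusters of 20 observations each
x, y, cluster = [], [], []
for g in range(G):
    zg = random.gauss(0, 1.5)         # cluster-level component of x
    cg = random.gauss(0, 1.0)         # cluster-level error (random intercept)
    for _ in range(n_g):
        xi = zg + random.gauss(0, 0.5)
        x.append(xi)
        y.append(0.5 * xi + cg + random.gauss(0, 0.5))
        cluster.append(g)

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
resid = [(yi - ybar) - b * (xi - xbar) for xi, yi in zip(x, y)]

# Classical OLS standard error of the slope (assumes independent errors)
s2 = sum(e ** 2 for e in resid) / (n - 2)
se_classical = math.sqrt(s2 / sxx)

# Cluster-robust (CR0) standard error: square the summed score per cluster
scores = {}
for xi, e, g in zip(x, resid, cluster):
    scores[g] = scores.get(g, 0.0) + (xi - xbar) * e
se_robust = math.sqrt(sum(s ** 2 for s in scores.values()) / sxx ** 2)

print(se_classical, se_robust)  # the robust SE is markedly larger here
```

Because both the predictor and the errors have cluster-level components, the classical standard error badly understates the slope's uncertainty, which is one reason the choice among the session's three approaches matters.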

Latent variable modeling of survey (measurement) errors and multiple groups | Professor Joop Hox |

Modern survey designs generally use complex sampling featuring cluster sampling, stratification and a diversity of weighting adjustments. These can be handled using design-based inference, or using a model-based approach that includes such features explicitly in the analysis model. In addition to these complexities, comparative and longitudinal surveys need to control for measurement equivalence across groups or over time. Finally, the trend towards multimode data collection adds a relatively new source of survey error. This session focuses on latent variable models for measurement error in surveys that also include other survey error components, with the goal of estimating population parameters of interest adjusted for a number of survey error components. Presentations can be on new models, new correction methods, new estimation techniques, or applications of such methods to existing survey data. An interesting aspect is the application of such models to data sets that can be viewed as problematic from an estimation point of view. One example is the analysis of measurement invariance with a large number of groups, such as countries. A second example is multilevel analysis with a small number of groups or countries. Bayesian estimation methods may be attractive for such problems, and this session welcomes presentations that explore their possibilities. | |
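The basic idea of adjusting an estimate for measurement error can be illustrated with the classical correction for attenuation, a textbook result rather than a method proposed by the session (the correlation, sample size and reliabilities below are all invented):

```python
import random, math

random.seed(7)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

# Simulate true scores with a known correlation, then add measurement error
n, rho = 4000, 0.6
t1 = [random.gauss(0, 1) for _ in range(n)]
t2 = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in t1]

rel1, rel2 = 0.7, 0.8   # assumed reliabilities of the two observed measures
x1 = [a + random.gauss(0, math.sqrt(1 / rel1 - 1)) for a in t1]
x2 = [a + random.gauss(0, math.sqrt(1 / rel2 - 1)) for a in t2]

r_obs = corr(x1, x2)                          # attenuated by measurement error
r_corrected = r_obs / math.sqrt(rel1 * rel2)  # correction for attenuation
```

The observed correlation falls well short of the true 0.6, and dividing by the square root of the product of the reliabilities recovers it; latent variable models generalize this logic to multi-item measurement models and richer error structures.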

Measurement Invariance | Professor Bengt Muthen |

Methodological advances in Latent Class Models for Surveys | Dr Daniel Oberski |

Latent class modeling (LCM) is a very general technique that encompasses many different statistical models as special cases; the common thread is that of latent variable models in which the latent variable is considered discrete. Examples of special cases include latent structure analysis, latent Markov models, mixture models, model-based clustering, diagnostic test evaluation, nonparametric IRT, and latent class factor models. When applied to surveys, latent class models may be useful as a way of relaxing the sometimes stringent assumptions of survey error analysis, for example the assumptions of the linear factor model in the estimation of scale reliability, or of monotone systematic errors in cross-country invariance models. At the same time, survey errors pose complications in latent class models that may be simpler, or may not arise at all, in other types of models. For example, the classical result that measurement error in the dependent variable does not affect regression coefficients does not hold for discrete latent variables. LCM therefore offers both opportunities and challenges for survey researchers, and this session invites presentations on the following topics:

- latent class modeling for the estimation or evaluation of survey errors;
- the effect of survey errors on latent class modeling;
- methods of coping with survey error in latent class models.

Survey errors could include, but are not limited to, measurement error, nonresponse error, sampling error, and cross-country or group comparability. We are particularly interested in methodological innovations on these topics and in substantive applications demonstrating such innovations. | |
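To make the session's subject concrete, here is a minimal EM estimator for the simplest case, a two-class latent class model with three binary items. It is only an illustrative sketch: the population parameters, sample size and starting values are all invented, and real applications would add convergence checks and multiple starts.

```python
import random, math

random.seed(1)

# Simulate 300 respondents answering 3 binary items, drawn from a
# hypothetical 2-class population (all parameter values are invented).
true_p = [[0.8, 0.8, 0.8], [0.2, 0.2, 0.2]]
data = []
for _ in range(300):
    k = 0 if random.random() < 0.5 else 1
    data.append([1 if random.random() < pj else 0 for pj in true_p[k]])

def em_latent_class(data, n_iter=50):
    """Minimal EM for a 2-class latent class model with 3 binary items."""
    pi = [0.6, 0.4]                          # starting class weights
    p = [[0.7, 0.6, 0.7], [0.3, 0.4, 0.3]]   # starting item probabilities
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior class-membership probabilities per respondent
        post, ll = [], 0.0
        for y in data:
            joint = []
            for k in range(2):
                lk = pi[k]
                for j in range(3):
                    lk *= p[k][j] if y[j] else 1 - p[k][j]
                joint.append(lk)
            tot = sum(joint)
            ll += math.log(tot)
            post.append([jk / tot for jk in joint])
        loglik.append(ll)
        # M-step: re-estimate class weights and item probabilities
        for k in range(2):
            nk = sum(r[k] for r in post)
            pi[k] = nk / len(data)
            for j in range(3):
                p[k][j] = sum(r[k] * y[j] for r, y in zip(post, data)) / nk
    return pi, p, loglik

pi, p, loglik = em_latent_class(data)
# The log-likelihood is guaranteed to be non-decreasing across EM iterations.
```

The locally independent binary items here are exactly the assumption that survey errors can violate, which is the tension the session's topics address.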

Survey effects in secondary analysis of pooled data | Mr Cristiano Vezzoni |

The growing availability of secondary data offers the possibility of tackling the same research question with different studies (Eurobarometer, ESS, EVS, EU-SILC, ISSP, etc.) that share similar operationalizations and sample designs. Pooling different studies makes it possible to increase the number of cases and to expand the period of analysis while controlling for data quality. Nonetheless, the pooling procedure introduces potential biases due to survey effects. Survey effects may depend, as is well known, on several factors (Glenn 2005, 43-50): data collection mode (face-to-face, telephone, self-administered); sampling design; interviewer training and supervision, and coding procedures (together constituting the so-called house effect); and question wording, question ordering and, more generally, the topics covered by the questionnaire (questionnaire context effects). In the literature on substantive topics, different strategies for dealing with survey effects have been proposed, but a systematic treatment is lacking, notwithstanding the relevance of the problem. The session welcomes papers that address substantive research questions using pooled survey data from different studies, with a focus on the strategies adopted to detect and deal with survey effects. | |
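One simple strategy of the kind the session asks about can be sketched in a few lines. In this invented example (two hypothetical surveys, with survey "B" subject to a house effect that shifts its outcome values), naively pooling attenuates the x-y association, while centering within study before pooling recovers it:

```python
import random, math

random.seed(3)

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

# Two hypothetical surveys measure the same x-y relationship, but
# survey "B"'s mode shifts all of its y values upward (a "house effect").
data = []
for study, offset in [("A", 0.0), ("B", 3.0)]:
    for _ in range(500):
        xi = random.gauss(0, 1)
        yi = 0.9 * xi + random.gauss(0, 0.5) + offset
        data.append((study, xi, yi))

x = [xi for _, xi, _ in data]
y = [yi for _, _, yi in data]
r_naive = corr(x, y)  # attenuated: the shift adds y-variance unrelated to x

# Strategy: center x and y within each study before pooling
sums = {}
for s, xi, yi in data:
    sx, sy, c = sums.get(s, (0.0, 0.0, 0))
    sums[s] = (sx + xi, sy + yi, c + 1)
means = {s: (sx / c, sy / c) for s, (sx, sy, c) in sums.items()}
xc = [xi - means[s][0] for s, xi, yi in data]
yc = [yi - means[s][1] for s, xi, yi in data]
r_centered = corr(xc, yc)  # recovers the within-study association
```

Within-study centering is equivalent to including study fixed effects (dummies) in a regression; it absorbs additive survey effects but not, for example, mode effects that change the slope itself, which is why the session calls for systematic reflection on such strategies.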

Surveys and compositional data | Dr Germa Coenders |

Statistical compositions consist of positive data arrays with a fixed sum which only convey information on the relative importance of each component. The commonest examples are proportions of a set of components of a total, which can only sum to 1. Compositional indicators are frequent in surveys: among others, we find surveys measuring the composition of household budgets, time-use surveys, and compositional indicators used in social network surveys, usually expressed as percentages of family members, friends, neighbours, co-workers and others in a social network. Statistical analysis of compositional data is a challenging task because compositional data lie in a restricted space and components cannot vary independently of one another ("all other things constant"): the relative importance of one component can only increase if the relative importance of at least one other component decreases. A popular solution is to transform compositional data by means of logarithms of ratios of components prior to applying standard analysis methods, while taking great care in the interpretation of the results. Simpler statistical methods such as ANOVA, linear regression and cluster analysis have a well-documented tradition in compositional data analysis in fields such as geology and biology. Less has been done in survey research regarding, for instance, measurement models or structural equation models for compositional data. Here, the naive analysis of raw proportions is common practice even though it is plagued with statistical problems (inconsistent inferences, heteroskedasticity, non-normality, censoring, perfect collinearity, and unclear interpretation, among others). The session aims to bridge methodological knowledge between the natural and social sciences in order to narrow this gap. | |
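The log-ratio idea mentioned above can be illustrated with the centred log-ratio (clr) transform, a standard device in compositional data analysis (the household budget shares below are invented):

```python
import math

def clr(shares):
    """Centred log-ratio transform of a composition (all parts > 0)."""
    # geometric mean of the parts
    g = math.exp(sum(math.log(s) for s in shares) / len(shares))
    return [math.log(s / g) for s in shares]

# Hypothetical household budget shares: housing, food, transport, other
budget = [0.40, 0.25, 0.15, 0.20]
z = clr(budget)

# clr coordinates are unconstrained in sign but sum to zero by construction,
# so standard multivariate methods can be applied to them, with care in
# interpretation.
print(sum(z))  # ~0 up to floating-point error
```

Note that log-ratio transforms require strictly positive parts, so zero shares (a respondent spending nothing on a category) need special treatment before the transform, one of the practical issues the session is likely to touch on.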

The Trouble with Logit and Probit: Teaching and Presenting Nonlinear Probability Models | Dr Henning Best |

While researchers in the social sciences have used Logit and Probit routinely since the 1990s, some of the difficulties in using various types of nonlinear probability models have received wider attention only in recent years. At least three important methodological problems have been raised in the discussion:

- the general interpretation of the coefficients is not as straightforward as in OLS;
- coefficients cannot easily be compared between subgroups;
- coefficients cannot easily be compared between nested models.

Some of these difficulties stem from what has come to be known as "neglected heterogeneity". There are interesting suggestions on how to cope with neglected heterogeneity mathematically, and on how to interpret the coefficients in a meaningful way. Yet these suggestions still have to trickle down to the teaching of quantitative methods, especially in undergraduate courses on multivariate statistics. Additionally, standards for presenting nonlinear models in publications have yet to be established: is the tabular presentation of coefficients we are all used to from linear models equally appropriate for Logit and Probit? In this session we especially seek presentations on approaches to interpreting and presenting Logit and Probit results, as well as suggestions and experiences for teaching nonlinear models without neglecting these important problems. |
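One interpretation aid often recommended in this discussion is the average marginal effect (AME). The sketch below computes it for a hypothetical logit fit; the coefficients and the sample of x values are invented, and the AME is only one of several interpretation strategies the session might compare:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fitted logit: P(y = 1 | x) = logistic(b0 + b1 * x).
# Both coefficients and the sample of x values are invented.
b0, b1 = -1.0, 0.8
sample_x = [i / 10 for i in range(-20, 21)]

# Unlike an OLS slope, the effect of x on the probability varies with x:
# dP/dx = b1 * p * (1 - p). Averaging it over the sample gives the AME.
effects = []
for xi in sample_x:
    p = logistic(b0 + b1 * xi)
    effects.append(b1 * p * (1 - p))
ame = sum(effects) / len(effects)
```

Reporting the AME (or marginal effects at representative values) instead of raw logit coefficients sidesteps part of the comparison problem across groups and nested models, which is why it features prominently in suggestions for both teaching and tabular presentation.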