Monday 15th July 09:00 – 12:00
Caroline Roberts: The Essentials of Survey Methodology
Asaph Young Chun: Administrative Records for Survey Methodology
Alexandru Cernat: Survey Data Visualisation in R
Matthias Schonlau: Text Mining Applied to Open-ended Questions
Rebecca Kuiper & Herbert Hoijtink: Introduction to Bayesian Statistics
Monday 15th July 13:00 – 16:00
Vera Toepoel & Peter Lugtig: Smartphones: From Surveys to Sensors
Christian Bruch & Matthias Sand: Handling Missing Data in Sample Surveys
Fred Conrad, Michael Schober & Andrew Hupp: Text Message Surveys
Daniel Oberski: Integrating Machine Learning in Surveys
Oliver Lipps: The Mechanics of Longitudinal Data Analysis
The Essentials of Survey Methodology
Caroline Roberts, University of Lausanne, Switzerland
This short course provides an introduction to the design and implementation of quantitative social surveys, and different procedures for maximising the quality of survey data. It will consider the various steps involved in conducting a survey and the challenges that can arise along the way. In particular, it will address the major sources of ‘survey error’ that result from these challenges and their potential effects on the accuracy of the data collected. It will present ways of minimizing the impact of survey errors on data quality and of ensuring the validity of the research findings.
Participants will be introduced to key principles in survey methodology that relate to the quality and cost of survey estimates. A significant component of survey error arises from the design of questionnaires, the mode of questionnaire administration, and the way in which respondents answer questions in different modes. Much can be done to mitigate such errors at relatively low cost. For this reason, the course will focus particularly on these aspects of survey design and implementation, to help survey designers recognize and address threats to measurement quality before collecting data, and make analysts aware of potential problems when using survey data.
The course is suitable for people starting out in survey research, whether they are responsible for conducting surveys, analysing survey data or both.
At the end of the course, students should be able to:
a) Summarise the key sources of error that impact on the accuracy of survey estimates and the procedures typically used to minimise them;
b) Describe ways in which different survey design choices influence the risk of measurement error;
c) Follow best-practice guidelines on how to mitigate measurement error through effective
Caroline Roberts is Assistant Professor in Survey Methodology in the Institute of Social Sciences at the University of Lausanne, Switzerland, and an affiliated survey methodologist at FORS, the Swiss Centre of Expertise in the Social Sciences, where she collaborates closely with the survey teams on the design and implementation of methodological studies. Her research interests relate to the measurement and reduction of different types of survey error, particularly in mixed mode and mixed device web surveys, and to ways of improving the measurement of social attitudes. She is Programme Director of the MA in Public Opinion and Survey Methodology offered by the Universities of Lausanne, Neuchâtel, and Lucerne, for which she teaches courses on survey research methods, questionnaire design and public opinion formation. She has a PhD in Social Psychology from the London School of Economics and Political Science, and has worked in the coordinating teams responsible for the design and implementation of a number of large-scale surveys, at the UK’s Office for National Statistics, for the European Social Survey at City University, London, and the American National Election Studies at Stanford University.
Administrative Records for Survey Methodology
Asaph Young Chun, Statistics Research Institute, Statistics Korea
Administrative records have long been regarded as a way of supplementing and improving the quality and interpretability of surveys and censuses. Using administrative records helps control the rising cost of survey data collection and improve data quality. In this workshop, I will address fundamental issues of understanding administrative data and integrating them with sample surveys and censuses to provide information essential to decision-making. The workshop is intended to provide a review of the foundations and current state of the field in administrative records research and applications by leveraging a framework of “total administrative records error” and “total linked data error”. As such, the workshop will provide an overview of methodological issues involved in the collection and analysis of administrative records, such as data quality, cost, and data linking methodology. The workshop will present best practices and cutting-edge research on the use of administrative records over the survey life cycle, integrating with surveys, censuses, and auxiliary data. Best practices are drawn from both sides of the Atlantic. Real-world examples of linking administrative records to household, economic and health surveys will address how administrative data can be used to improve the survey frame, reduce nonresponse errors, and assess coverage error, among others. Other examples will illustrate how administrative data can be transformed into information that is useful and relevant to evidence-based decision making in key sectors of health, economy, and education so they may answer questions that cannot be answered by relying on traditional surveys alone.
The target audience ranges from users and producers of administrative records to researchers in academia, government, and industry. The workshop will be equally useful to students of data science, big data, and survey methodology. The workshop should be of interest and benefit to practitioners of survey methodology, given the escalating use of administrative records in almost every discipline. I will use “Administrative Records for Survey Methodology” (Forthcoming 2019) as a primary reference for this workshop. I am editor-in-chief of this forthcoming cutting-edge book, published by Wiley. The book, which features leading scholars of administrative data from both sides of the Atlantic, addresses the following topics: theory and fundamentals of administrative data research and application, data quality of administrative data, administrative data processing and linking methodology, statistical use of administrative data in sample surveys and censuses, and real-world applications of administrative data to inform evidence-based policy making.
Dr. Asaph Young Chun is a seasoned research director and senior survey methodologist dedicated to innovations, complex survey methodology, and statistics for 28 years. He worked as Research Chief for Decennial Directorate at the U.S. Census Bureau and Senior Survey Methodologist at NORC at the University of Chicago, leading large-scale administrative records research funded by the U.S. government. He currently works as Director of the Statistical Research Institute – the leading think tank of Statistics Korea, Associate Editor of Journal of Official Statistics, Program Chair of the American Statistical Association Survey Research Methods Section, and Faculty Chair/Director of the PSI Summer Institute for Data Science, Survey Methodology and Interdisciplinary Research. His areas of research and teaching are devoted to administrative records, responsive and adaptive design, nonresponse, and evidence-based decision making. Young has spearheaded administrative data research collaboration since 2010 across the Atlantic, among researchers from academia, government and industry. A forthcoming book, “Administrative Records for Survey Methodology” published by Wiley, resulted from such a synergy that he fostered as an editor-in-chief.
Survey Data Visualisation in R
Alexandru Cernat, University of Manchester, UK
Being able to present data visually is an essential skill that is relevant in all the stages of data analysis, from exploration, to model evaluation and presenting results to a general audience. In this course you will learn the basics of visualization using R. R is open-source statistical software with flexible and extensible visualization capabilities based on the grammar of graphics.
In this course we will cover the basics of visualization, how to do different types of graphics in R and how to consider some of the specific issues relevant to survey research, such as making graphics using complex sample designs. Finally we will discuss exciting extensions such as maps, interactive graphs, and animated figures.
While not mandatory, it is recommended that you have a basic knowledge of R before joining the class to facilitate comprehension. During class you will also be provided with the syntax needed to produce the graphs you will see in class.
Dr. Alexandru Cernat is a lecturer in Social Statistics at the University of Manchester. Previously he was a Research Associate at the National Centre for Research Methods, University of Manchester, where he investigated non-response in longitudinal studies with a special focus on biomarker data. He received a PhD in survey methodology from the University of Essex, working on the topic of mixed mode designs in longitudinal studies. His research interests are in latent variable modelling, longitudinal data, measurement error, missing data and new forms of data.
More details at: www.alexcernat.com
Text Mining Applied to Open-ended Questions
Matthias Schonlau, University of Waterloo, Canada
Text data from open-ended questions in surveys are difficult to analyze and are frequently ignored. Yet open-ended questions are important because they do not constrain respondents’ answer choices. Where open-ended questions are necessary, sometimes multiple human coders hand-code answers into one of several categories. At the same time, computer scientists have made impressive advances in text mining that may allow automation of such coding.
The purpose of this short course is to introduce participants to (a) how text can be converted into numerical n-gram variables and (b) how to run a statistical learning algorithm on such variables. An n-gram is a contiguous sequence of n words in a text. A new Stata program, ngram, will be introduced and made available to participants. The program supports a large number of European languages: Danish, German, English, Spanish, French, Italian, Dutch, Norwegian, Portuguese, Romanian, Russian, and Swedish. Broadly speaking, ngram creates hundreds or thousands of variables, each recording how often the corresponding n-gram occurs in a given text. This is more useful than it sounds.
We will use the ngram variables to categorize open-ended questions using a supervised learning algorithm (e.g. support vector machines). Examples will be given using Stata.
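To make the idea of n-gram variables concrete, here is a minimal sketch in Python using only the standard library. It is not the Stata ngram program and makes no claim about how that program works internally; the function names and the three example answers are invented for illustration, and tokenisation is simply lower-casing and splitting on whitespace.

```python
from collections import Counter


def ngrams(text, n):
    """Return the contiguous word n-grams of a text as space-joined strings."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_matrix(texts, max_n=2):
    """Build one count variable per 1..max_n-gram observed in the texts."""
    counts = [
        Counter(g for n in range(1, max_n + 1) for g in ngrams(t, n))
        for t in texts
    ]
    vocab = sorted(set().union(*counts))  # one column per distinct n-gram
    rows = [[c[g] for g in vocab] for c in counts]  # one row per answer
    return vocab, rows


# Hypothetical open-ended answers, invented for this example.
answers = [
    "too many questions",
    "questions were too long",
    "no problems at all",
]
vocab, rows = ngram_matrix(answers)
```

Each row of `rows` is the vector of n-gram counts for one answer. A supervised learner would take these rows as predictors and the hand-coded categories as the outcome.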
By the end of the course, participants should be able to:
(1) explain the bag-of-words / n-gram approach to text mining,
(2) apply the bag-of-words / n-gram approach in Stata,
(3) apply a supervised learning method to the categorization of open-ended questions in Stata.
Matthias Schonlau is a Professor in the Department of Statistics and Actuarial Science at the University of Waterloo, Canada. Prior to his academic career, he spent 14 years at the RAND Corporation (USA), the Max Planck Institute for Human Development in Berlin (Germany), the German Institute for Economic Analysis (DIW), National Institute of Statistical Sciences (USA), and AT&T Labs Research (USA).
Dr. Schonlau’s current research focuses on applying statistical/machine learning algorithms to open-ended questions. He is a board member of the European Survey Research Association. He is the lead author of the book “Conducting Research Surveys via E-Mail and the Web”. Dr. Schonlau has published more than 60 peer-reviewed articles.
Introduction to Bayesian Statistics
Herbert Hoijtink, Utrecht University, the Netherlands
Rebecca Kuiper, Utrecht University, the Netherlands
This short course will, first of all, give an introduction to Bayesian estimation based on Gelman et al. (2013, Chapter 2.1, 2.2, 2.3, 2.4). Using a binomial model, the density of the data, the prior distribution, and the posterior distribution will be introduced. Two elaborations will be presented: (1) Bayesian estimates of the parameters of interest (in the example binomial model the probability of success in a control and experimental group and their ratio) are often based on a sample from the posterior distribution; and (2) prior knowledge in the form of historical data can be included in the posterior distribution using a so-called power-prior (Rietbergen et al., 2011). If time permits, it will also be shown how, for the example at hand, Bayesian estimation can be executed using JAGS (http://mcmc-jags.sourceforge.net/).
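The binomial example can be sketched without JAGS, because a Beta prior combined with binomial data gives a Beta posterior in closed form. The Python sketch below rests on assumptions that are not from the course material: a uniform Beta(1, 1) prior, invented counts for the two groups, and a power prior in which historical successes and failures are added to the prior with a weight a0 between 0 and 1. The ratio of the two success probabilities is summarised from a sample of posterior draws.

```python
import random


def beta_posterior(successes, n, a=1.0, b=1.0,
                   hist_successes=0, hist_n=0, a0=0.0):
    """Parameters of the Beta posterior for a binomial success probability.

    a, b: parameters of the initial Beta prior (Beta(1, 1) is uniform).
    Historical data enter through a power prior with weight a0 in [0, 1].
    """
    post_a = a + a0 * hist_successes + successes
    post_b = b + a0 * (hist_n - hist_successes) + (n - successes)
    return post_a, post_b


def ratio_summary(ctrl_post, expt_post, draws=20000, seed=1):
    """Median and 95% interval of p_experimental / p_control from posterior draws."""
    rng = random.Random(seed)
    ratios = sorted(
        rng.betavariate(*expt_post) / rng.betavariate(*ctrl_post)
        for _ in range(draws)
    )
    return ratios[draws // 2], ratios[int(0.025 * draws)], ratios[int(0.975 * draws)]


# Hypothetical data: 12 of 40 successes (control), 24 of 40 (experimental).
ctrl = beta_posterior(12, 40)
expt = beta_posterior(24, 40)
median, lower, upper = ratio_summary(ctrl, expt)

# Same control data, with 10 of 20 historical successes down-weighted by a0 = 0.5.
ctrl_with_history = beta_posterior(12, 40, hist_successes=10, hist_n=20, a0=0.5)
```

Setting a0 to 0 ignores the historical data and setting it to 1 pools them fully with the current data. Intermediate values discount them.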
Secondly, an introduction to Bayesian hypothesis evaluation using the Bayes factor will be given based on Hoijtink et al. (2019). Using a simple three-group one-way ANOVA, it will be elaborated: (1) that in addition to the classical null and alternative hypotheses, also, so-called, informative hypotheses may be of interest to researchers (e.g., m1 > m2 > m3, where the m’s denote the population means of three different groups); (2) what the Bayes factor is and how it can be used to evaluate these hypotheses; and (3) what posterior model probabilities are, how they can be interpreted as Bayesian error probabilities, and in what way the Bayesian errors differ from the classical Type I and Type II errors. If time permits, it will also be shown how, for the examples given, Bayesian hypothesis evaluation can be executed using bain (https://informative-hypotheses.sites.uu.nl/software/bain/).
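One way to build intuition for the Bayes factor of an informative hypothesis such as m1 > m2 > m3 is the fit-over-complexity idea. Fit is the proportion of the posterior in agreement with the ordering. Complexity is the proportion of an exchangeable prior in agreement with it, which is 1/6 for three means. The Python sketch below is a simplified illustration and not the bain implementation. It assumes the posterior of each mean can be approximated by an independent normal distribution centred at the sample mean with the standard error as its spread, and the means and standard errors are invented.

```python
import random


def bayes_factor_order(means, ses, draws=50000, seed=1):
    """Approximate BF of H1: m1 > m2 > m3 against the unconstrained hypothesis.

    Assumes independent normal posteriors for the three means (an assumption
    of this sketch). Also returns the posterior model probability of H1
    against its complement, with equal prior model probabilities.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        m1, m2, m3 = (rng.gauss(m, s) for m, s in zip(means, ses))
        hits += m1 > m2 > m3
    fit = hits / draws                     # posterior proportion agreeing with H1
    complexity = 1 / 6                     # prior proportion: 1 of 3! orderings
    bf_1u = fit / complexity               # H1 versus unconstrained
    bf_cu = (1 - fit) / (1 - complexity)   # complement of H1 versus unconstrained
    pmp_1 = bf_1u / (bf_1u + bf_cu)
    return bf_1u, pmp_1


# Hypothetical group means and standard errors.
bf_1u, pmp_1 = bayes_factor_order(means=(5.0, 4.0, 3.0), ses=(0.4, 0.4, 0.4))
```

Because the fit cannot exceed 1, the Bayes factor against the unconstrained hypothesis is bounded by 6 in this example. One minus the posterior model probability is the probability of being wrong when choosing H1 over its complement, conditional on the observed data.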
Rebecca Kuiper is an assistant professor at the Faculty of Social and Behavioral Sciences of Utrecht University in the Netherlands. Based on her publications in journals like Biometrika and Psychological Methods, she received a grant from the Netherlands Organization for Scientific Research to build a research group with which she can continue her research with respect to dynamical modelling, informative hypotheses, and research synthesis.
Herbert Hoijtink is professor in applied Bayesian statistics at the Faculty of Social and Behavioral Sciences of Utrecht University in the Netherlands. For the past six years he has been associate editor of Psychological Methods, one of the leading journals in the area of applied methodology. Using a large grant from the Netherlands Organization for Scientific Research in 2005 he was able to develop the evaluation of informative hypotheses with a group of eight researchers. Currently he is involved as a methodologist in a nationwide consortium of twenty researchers (funded by the Netherlands Organization for Scientific Research in 2012) that is tracking the development of children into adolescents.
Smartphones: From Surveys to Sensors
Peter Lugtig, Utrecht University, the Netherlands
Vera Toepoel, Utrecht University, the Netherlands
In many countries, and for people of most ages, the smartphone forms an integral part of life. In many countries, smartphones are replacing traditional PCs and laptops as the primary device for browsing the Internet and using social media. In the last couple of years, researchers have experimented with smartphones as a method of data collection. This short course focuses on recent studies of how smartphones can be used (1) as a device to administer surveys and (2) to acquire additional behavioral data using sensors. In particular, we will discuss:
- why you should want to do research using smartphones
- how web questionnaires should be adapted to become smartphone-friendly, and whether you should worry about device effects
- ways in which data can be collected on smartphones: browsers and apps
- issues related to willingness and consent to participate in smartphone studies that collect behavioral data
- how such behavioral data can potentially be used to enrich survey data, using GPS locations as an example.
It is helpful if participants in the short course bring a smartphone with them, as well as a laptop. We will not do any data analysis during the short course, but will provide a dataset with GPS location data collected using smartphones. We will use this dataset to illustrate how behavioral data can complement survey data, and outline a research agenda that focuses on combining survey and sensor data during and after data collection.
Vera Toepoel is an assistant professor in survey methodology. Her research interests lie in everything related to survey methodology, and online surveys in particular: from recruiting respondents and designing the survey instrument to correcting for bias. Current topics include data chunking (a.k.a. modular survey design), sensor data (and consent) and mobile survey design. Vera wrote her dissertation on Designing Web Questionnaires at Tilburg University (granted a thesis award from the General Online Research Association). She has worked with and built several online research panels. Vera received a VENI from the Dutch Scientific Organization for a 3-year research project on non-probability samples.
Vera is the current president of RC33 (Logic and Methodology in Sociology) of the International Sociological Association and a member of the coordinating team of the Dutch Platform for Survey Research (www.npso.net). She is a member of the Scientific Quality Assurance Board of the GESIS Online Panel. Vera is the author of the book “Doing Surveys Online” published by Sage (2016), has authored several chapters in handbooks for methodology, and has published numerous journal papers in, among others, Public Opinion Quarterly, Sociological Methods and Research, Survey Research Methods, and Social Science Computer Review.
Peter Lugtig is an associate professor in survey methodology. His research interests lie in the interplay of three areas: 1. Doing survey research on mobile devices 2. Combining sensor and survey data and 3. The statistical estimation of data quality in surveys. Peter finished an MSc in political science at the University of Amsterdam, before moving into research methodology as a field of research. He completed his Ph.D. in 2012 at Utrecht University with a study into nonresponse and measurement errors in panel surveys. He received a Future Leaders Grant in 2012 from the UK Economic and Social Research Council for a 3-year research project into the trade-off between nonresponse and measurement errors in panel surveys. Peter is a member of the consortium board of the Generations and Gender Programme (www.ggp-i.org), member of the methodological advisory board of the Understanding Society study (www.understandingsociety.ac.uk), and member of the coordinating team of the Dutch Platform for Survey Research (www.npso.net).
Handling Missing Data in Sample Surveys
Christian Bruch, GESIS – the Leibniz Institute for the Social Sciences, Germany
Matthias Sand, GESIS – the Leibniz Institute for the Social Sciences, Germany
This course will cover two recurring problems in estimation based on sample surveys: missing data at the unit level and at the item level. It addresses how to deal with them and the impact such methods may have on estimates and their variance. Accordingly, the course is divided into two sections.
First, missing data and their mechanisms at the unit level will be explored, followed by an overview of weighting procedures such as raking, poststratification and linear methods (e.g. the General Regression Estimator). These methods will then be evaluated by measures of accuracy (bias) and precision (variance).
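As an illustration of one of the weighting procedures named above, the following Python sketch implements raking (iterative proportional fitting) on a two-way table. The cell counts and population margins are invented, and the function name and data layout are choices made for this example only. The course itself presents its examples in R.

```python
def rake(cells, row_targets, col_targets, iters=50):
    """Rake a two-way table of sample counts to known population margins.

    cells: dict mapping (row, col) to the sample count in that cell.
    row_targets, col_targets: dicts of known population totals per category.
    Returns the adjusted cell totals after alternating row and column scaling.
    """
    w = dict(cells)
    for _ in range(iters):
        for r, target in row_targets.items():
            total = sum(v for (ri, _), v in w.items() if ri == r)
            for key in w:
                if key[0] == r:
                    w[key] *= target / total  # scale row r to its target
        for c, target in col_targets.items():
            total = sum(v for (_, ci), v in w.items() if ci == c)
            for key in w:
                if key[1] == c:
                    w[key] *= target / total  # scale column c to its target
    return w


# Hypothetical respondent counts by sex and age group.
sample = {("f", "young"): 30, ("f", "old"): 50,
          ("m", "young"): 20, ("m", "old"): 40}
raked = rake(sample,
             row_targets={"f": 510, "m": 490},
             col_targets={"young": 400, "old": 600})

# Weight per respondent in each cell: adjusted total divided by sample count.
weights = {cell: raked[cell] / sample[cell] for cell in sample}
```

Poststratification is the special case in which the population total of every cell is known, so a single scaling step per cell is enough.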
The second part focuses on methods to cope with missingness at the item level. Here, the focus is on imputation: we will discuss various imputation methods, distinguishing between single and multiple imputation and their implications.
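One practical difference between single and multiple imputation lies in variance estimation. With multiple imputation, the analysis is run on each of the m completed data sets and the results are combined with Rubin's rules, so that the uncertainty due to imputation enters the total variance. The Python sketch below shows only this pooling step, under the assumption that the m point estimates and their variances have already been computed. The numbers are invented.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine results from m imputed data sets using Rubin's rules.

    estimates: the point estimate from each completed data set.
    variances: the estimated variance of each of those estimates.
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical results from m = 3 imputed data sets.
pooled, total = pool_rubin([10.0, 11.0, 12.0], [1.0, 1.0, 1.0])
```

A single imputation yields only one estimate, so the between-imputation component cannot be estimated and the reported variance tends to be too small.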
To evaluate the methods in both parts, their impact on estimation will be described using suitable data sets. The methods to compensate for nonresponse will be compared to a complete case analysis, i.e. listwise deletion of cases with missing values. In doing so, the effects of ignoring nonresponse will be shown.
Although the course aims to give a broad overview of various procedures rather than an introduction to implementing such methods for a particular survey, the examples will be presented in R. However, participants are not required to have any prior knowledge of R.
Matthias Sand is a postdoctoral researcher at GESIS, the Leibniz Institute for the Social Sciences. He is currently responsible for the harmonization efforts for sampling and weighting procedures that are undertaken within GESIS’ surveys. Since 2013 he has also been responsible for the upkeep of the GESIS sampling frame for telephone surveys. In his doctoral thesis, he dealt with the improvement of weighting procedures in multiple-frame surveys. He furthermore consults on various questions regarding sampling, weighting, estimation and imputation. His research interests include sampling and weighting procedures with multiple frames.
Christian Bruch is also a postdoctoral researcher at GESIS. His consultancy focuses on complex sample surveys, variance estimation and imputation. Prior to this, he worked at the German Internet Panel and at the Mannheim Centre for European Social Research at the University of Mannheim, where he conducted research on imputation and weighting. His PhD thesis (Trier University, 2016) examines variance estimation under imputation and for complex sampling designs. Furthermore, he worked in a project comparing single and multiple imputation methods for the German Census.
Text Messaging for Conducting Survey Interviews
Fred Conrad, University of Michigan, USA
Andrew Hupp, University of Michigan, USA
Michael Schober, The New School, USA
Text messaging, as a survey mode, is an emerging option for researchers. This short course presents recent findings and emerging practices about inviting participants (whether to complete a text interview or an interview/self-administered questionnaire in another mode), asking survey questions, and collecting answers via text messages. Text messaging has particular qualities that distinguish it from other modes of data collection and that provide particular advantages for respondents and researchers but also new challenges. The short course first focuses on experimental evidence on data quality and the nature of the interaction in text messaging interviews, as well as on efficiency of texting: the number of attempts required to contact sample members, the amount of time required to complete the sample, the possibility of conducting multiple text interviews simultaneously, and the benefits of automated vs. human-administered texting. The course then focuses on practical aspects of implementing text messaging in the survey process, including designing for respondents whose mobile phones are not smartphones or whose network connections are not ideal, whether to allow free-text responses or only single-character responses, and how many questions can realistically be asked via text message. Finally, we discuss regulation and privacy concerns (e.g., compliance with the GDPR and the US Telephone Consumer Protection Act [TCPA]).
1. Familiarity with current evidence and emerging opportunities for using text messaging in social and behavioral research.
2. Understanding how features of the mode promote particular kinds of interactions and responses.
3. Awareness of design considerations and options for administering text message surveys.
Fred Conrad is a Research Professor in the Institute for Social Research at the University of Michigan, where he directs the Program in Survey Methodology. He is also Professor of Psychology at the University of Michigan. His research generally concerns the application of ideas and methods from cognitive science and human-computer interaction to the reduction of survey measurement error, especially in new modes of data collection: text message interviews, video-mediated interviews, and virtual interviews. He co-authored The Science of Web Surveys (2013) with Roger Tourangeau and Mick Couper and co-edited Envisioning the Survey Interview of the Future (2008) with Michael Schober with whom he won AAPOR’s Mitofsky Innovators award in 2013. He has a PhD from the University of Chicago.
Andrew Hupp is a Survey Specialist Senior in the Survey Research Center at the University of Michigan. His research interests include designing and implementing systems for mixed mode data collections, incorporating newer modes, such as text messaging and video-mediation, understanding the impact these modes have on the quality of data collected, web surveys, paradata, as well as geographic information systems. He has co-authored two recent journal articles and a book chapter on the use of text messaging to conduct survey interviews. He has an MSc in Geographic Information Systems from the University of Leeds.
Michael Schober is Professor of Psychology and Vice Provost for Research at The New School in New York City. His academic training is in cognitive psychology (Ph.D., Stanford University, 1990) and cognitive science (Sc.B., Brown University, 1986). His survey methodology research, much of it in collaboration with Frederick G. Conrad, examines interviewer-respondent interaction, respondent comprehension of terms in survey questions, and how existing communication modes not yet widely used for survey data collection (text messaging, video-mediated interviewing, interviewing by virtual animated agents) and more dialogue-like versions of existing modes (web surveys, spoken language systems) might affect data quality. His survey methods research connects with his studies of how people coordinate their actions and understand each other in other kinds of dialogue and in collaborative music-making (jazz, classical chamber music), the mental processes underlying that coordination, and how new technologies mediate coordination. Together with Fred Conrad, he co-edited the volume Envisioning the Survey Interview of the Future (Wiley, 2008), and they were awarded the 2013 Warren J. Mitofsky Innovators Award from the American Association for Public Opinion Research.
Integrating Machine Learning in Surveys
Daniel Oberski, Utrecht University, the Netherlands
Machine learning occurs whenever a computer program uses experience to improve its performance on some task (Mitchell 1997). In practice, machine learning often amounts to statistical estimation – but with more emphasis on prediction and less on inference – and with an impressive array of techniques that can deal with complex functions and large data structures (Goodfellow et al. 2016).
In this short course, we’ll take a look at what some of these techniques are and how they can be useful in the kind of tasks we encounter in survey research and methodology. We will get started on applying common machine learning techniques in R or Python. And we’ll discuss some examples where machine learning can help us deal with problems like nonresponse propensity estimation, predicting “human values” from Facebook “likes”, and (semi)automatic coding of open questions.
Daniel Oberski is an associate professor at the Department of Methodology & Statistics, Utrecht University, The Netherlands. He coordinates the university-wide Special Interest group Machine Learning within the university’s focus area Applied Data Science. He also currently supervises several projects in which machine learning is applied to a diverse range of problems across the sciences, including text mining doctors’ notes, analyzing high-dimensional gene expression data, discovering new materials experiments to run, automatically coding Google Street View images, and semi-automatizing systematic literature reviews.
The Mechanics of Longitudinal Data Analysis
Oliver Lipps, FORS – Swiss Centre of Expertise in the Social Sciences, Switzerland
I will first give a very brief refresher on linear regression and will then introduce panel data and the structure necessary to conduct longitudinal analysis before we study the concept of causality based on the counterfactual approach. Then I will explain the idea of fixed effect (FE) models using a small-N example. We will start by estimating a (pooled) OLS model, then control for a confounding time-constant variable in order to reduce omitted variable bias, and finally estimate an FE model in order to eliminate bias from any omitted time-constant variable. I will explain in each step how we come closer to an unbiased regression coefficient.
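The contrast between pooled OLS and the FE model can be sketched with a tiny invented panel. In the Python example below (the course itself provides Stata syntax), three persons are observed three times. The outcome is constructed as y = a_i + 2x with no noise, where the person-specific constant a_i is lower for persons with higher x. The data and function names are assumptions made for illustration, and the intermediate step of controlling for one observed time-constant variable is omitted.

```python
def ols_slope(x, y):
    """Slope of a simple linear regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_within(ids, values):
    """Subtract each person's own mean (the within transformation)."""
    groups = {}
    for i, v in zip(ids, values):
        groups.setdefault(i, []).append(v)
    means = {i: sum(vs) / len(vs) for i, vs in groups.items()}
    return [v - means[i] for i, v in zip(ids, values)]


# Hypothetical panel: y = a_i + 2 * x with a_1 = 10, a_2 = 5, a_3 = 0.
ids = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 2, 3, 4, 3, 4, 5]
y = [12, 14, 16, 9, 11, 13, 6, 8, 10]

pooled_slope = ols_slope(x, y)  # ignores the person-specific constants
fe_slope = ols_slope(demean_within(ids, x), demean_within(ids, y))
```

In this constructed example the pooled slope is -0.5, so it even has the wrong sign, while the within-person slope recovers the value 2 that was used to build the data. The within transformation removes every time-constant difference between persons, observed or not.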
After introducing the random effect (RE) estimator in the next step, I will then give some (typical) examples of pooled OLS, FE, and RE estimators using data from a large-N panel survey. We will finally discuss the Hausman test as a formal tool to decide whether RE models or FE models should be used.
The core of the course is to understand how the FE model works “mechanically”, i.e. how within-individual transformations can be visualized using a small-N example. For the small-N application examples I will provide the corresponding Stata syntax. In addition, participants will learn more about the pros and cons of different longitudinal linear models, including the FE model, the RE model, and the pooled OLS model. Some familiarity with panel data and regression models is assumed.
I am head of the methodological research programme at FORS and member of the Swiss Household Panel team. In addition, I am lecturer in survey methodology and survey research at the Institute of Sociology at the University of Bern (Switzerland). My research interests focus on unit and item nonresponse in cross-sectional and especially longitudinal designs, effects due to interviewers, incentives, mode, and language proficiency/acculturation issues, and income imputation methods.