Please note that only short courses with at least 10 participants will take place.
Monday 17th July 10:00 – 13:00
Caroline Roberts: The Essentials of Survey Methodology
Tomislav Pavlović: Data Cleaning in R: A Step-by-Step Guide to Creating Tidy Data Bases
Angelo Moretti: Introduction to Applied Small Area Estimation: Methods and Applications in R
Oriol Bosch-Jover: Measuring Citizen’s Digital Behaviours Using Web Trackers and Data Donations
Monday 17th July 14:00 – 17:00
Barbara Felderer: Nonresponse Bias
Jörg Blasius: Assessing the Quality of Survey Data
Alexandru Cernat: Survey Data Visualisation in R
Oliver Lipps: The Mechanics of Longitudinal Data Analysis
Bella Struminskaya & Peter Lugtig: Smart Surveys: Integrating Survey Data and Big Data
Mario Callegaro, Vlad Achimescu & Stephen Wheeler: How Online Survey Paradata Collection and Analysis Can Improve Your Study
The Essentials of Survey Methodology
Caroline Roberts, University of Lausanne, Switzerland
This short course provides an introduction to the design and implementation of quantitative social surveys, and different procedures for maximising the quality of survey data. It will consider the various steps involved in conducting a survey and the challenges that can arise along the way. In particular, it will address the major sources of ‘survey error’ that result from these challenges and their potential effects on the accuracy of the data collected. It will present ways of minimizing the impact of survey errors on data quality and of ensuring the validity of the research findings.
Participants will be introduced to key principles in survey methodology that relate to the quality and cost of survey estimates. A significant component of survey error arises from the design of questionnaires, the mode of questionnaire administration, and the way in which respondents answer questions in different modes. Much can be done to mitigate such errors at relatively low cost. For this reason, the course will focus particularly on these aspects of survey design and implementation, to help survey designers recognize and address threats to measurement quality before collecting data, and make analysts aware of potential problems when using survey data.
The course is suitable for people starting out in survey research, whether they are responsible for conducting surveys, analysing survey data or both.
At the end of the course, students should be able to:
a) Summarise the key sources of error that impact on the accuracy of survey estimates and procedures typically used to minimise them;
b) Describe ways in which different survey design choices influence the risk of measurement error;
c) Follow best-practice guidelines on how to mitigate measurement error through effective questionnaire design.
Dr Caroline Roberts is a senior lecturer in survey methodology and quantitative research methods in the Institute of Social Sciences at the University of Lausanne (UNIL, Switzerland), and an affiliated survey methodologist at FORS – the Swiss Centre of Expertise in the Social Sciences. At UNIL, she teaches courses on survey research methods, questionnaire design, public opinion formation and quantitative methods for the measurement of social attitudes. She has taught a number of summer school and short courses on survey methods, questionnaire design, survey nonresponse, mixed mode surveys, and data literacy. At FORS, she conducts methodological research in collaboration with the teams responsible for carrying out large-scale academic surveys in Switzerland, including the ESS, the ISSP, the EVS, the Swiss Election Studies and the Swiss Household Panel Survey. Her research interests relate to the measurement and reduction of nonresponse bias and other types of survey error – most recently in the context of surveys on smartphones. Caroline is currently Chair of the Methods Advisory Board of the European Social Survey and was President of the European Survey Research Association from 2019-2021 and Conference Chair from 2017-2019.
Data Cleaning in R: A Step-by-Step Guide to Creating Tidy Data Bases
Tomislav Pavlović, Institute of Social Sciences Ivo Pilar, Croatia
In this brief course, participants will learn how to deal with untidy databases in a replicable and transparent manner. Using multiple examples, participants will gain insight into the main principles of data cleaning that can dramatically speed up the process, as well as the consequences of applying them carelessly. After completing the course, participants will be able to deal elegantly with the majority of common complications in raw databases (e.g., the wrong type of input, merged columns, errors in specific rows, caps lock) and prepare their data for further analyses or publication. In addition to seeing the code in action, participants will receive a short “workbook”: an incomplete script that lets them test their knowledge, with tasks ranging from beginner to advanced level. Proposed solutions to these tasks will also be provided, along with all code and course materials.
After finishing this course, participants will be able to:
- Recognize the tools used for data cleaning in R
- Recognize basic data types, structures, and formats in R
- Prepare databases in an easy-to-use and publishable format using these tools
- Split and merge data sets
- Modify variable and value labels
- Export databases in various formats
Dr Tomislav Pavlović is a postdoctoral researcher at the Institute of Social Sciences Ivo Pilar in Croatia. His primary focus is on political psychology. He has worked on multiple international projects (CHIEF, DARE, ICSMP), which gave him the opportunity to refine his skills in data cleaning, analysis, and visualization.
Introduction to Applied Small Area Estimation: Methods and Applications in R
Angelo Moretti, Utrecht University, The Netherlands
Large-scale sample surveys are not designed to produce reliable estimates for small population domains, e.g., geographical areas or population groups. Small area estimation methods, which borrow strength from auxiliary data (e.g., the Census or administrative data), can therefore be used to produce reliable estimates. This course covers basic small area estimation methods based on the direct and model-based estimation approaches, and it is structured in three parts. The first part introduces the small area estimation problem and the use of direct estimators to produce small area estimates. In the second part, we introduce the unit-level approach based on the Battese, Harter and Fuller model, assuming that auxiliary information is available at unit level. The third part covers the area-level approach, based on the Fay-Herriot model. This approach is useful when the auxiliary information is available at area level only. Practical applications and examples in R are presented in each part using some common R packages available to users.
By the end of the course participants will be able to:
- Understand the small area estimation problem
- Understand which techniques are most commonly used (and why)
- Apply and validate two of the most widely used small area estimation methods in official statistics, based on the area-level and unit-level approaches
- Implement the methods in R software.
During the course, participants will be provided with the R programs and datasets needed to reproduce the analyses presented. The course aims to provide useful, applicable tools for researchers and practitioners. It will be a mixture of different activities, i.e., method discussions, practical examples and software applications.
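To make the area-level idea concrete, here is a minimal sketch of the composite estimate under a Fay-Herriot-type model. It is written in Python purely for illustration and is not taken from the course materials, which use R. The function name `fay_herriot_composite`, the single covariate, and the numbers are hypothetical, and the model variance `sigma_u2` is assumed known, whereas in practice it is estimated from the data.

```python
def fay_herriot_composite(direct, psi, x, sigma_u2):
    """Shrink direct area estimates towards a regression-synthetic estimate.

    direct   : direct survey estimates, one per area
    psi      : sampling variances of the direct estimates (treated as known)
    x        : one area-level auxiliary covariate
    sigma_u2 : model (between-area) variance, assumed known in this sketch
    """
    # Weighted least squares of direct estimates on x, weights 1 / (sigma_u2 + psi_i)
    w = [1.0 / (sigma_u2 + p) for p in psi]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, direct)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, direct))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    estimates = []
    for yi, pi, xi in zip(direct, psi, x):
        gamma = sigma_u2 / (sigma_u2 + pi)   # weight given to the direct estimate
        synthetic = intercept + slope * xi   # regression-synthetic estimate
        estimates.append(gamma * yi + (1.0 - gamma) * synthetic)
    return estimates


# Made-up example: four areas with differing sampling variances
example = fay_herriot_composite(
    direct=[3.0, 6.0, 7.0, 9.0],
    psi=[1.0, 2.0, 1.0, 2.0],
    x=[1.0, 2.0, 3.0, 4.0],
    sigma_u2=1.0,
)
print(example)
```

Areas with a large sampling variance get a small weight `gamma`, so their estimates lean more heavily on the auxiliary information; this is the sense in which the method borrows strength.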
Prerequisites:
- Introductory knowledge of linear models is assumed.
- No prior knowledge of small area estimation is required; the course is at beginner level.
- Ideally, participants have a working installation of R and RStudio.
Dr Angelo Moretti is an assistant professor in Statistics at Utrecht University, holding a PhD in Social Statistics from the University of Manchester. He is an elected member of the International Statistical Institute and a fellow of the Royal Statistical Society, the International Association of Survey Statisticians and the Italian Statistical Society. His research interests lie in survey statistics, in particular small area estimation of social indicators under different approaches, data integration and survey calibration. He also works on composite social indicators and data dimensionality reduction problems in small area estimation. In addition, his research focuses on applications in different domains such as wellbeing, poverty and crime.
Measuring Citizen’s Digital Behaviours Using Web Trackers and Data Donations
Oriol Bosch-Jover, London School of Economics, UK
The expansion of the Internet, together with the capabilities of modern connected devices, results in a plethora of data that promise fascinating opportunities to understand individuals’ digital behaviours. This course will teach students how to measure digital behaviours using web trackers and data donations, and how to combine these approaches with online surveys. The course has the following learning objectives:
- Develop an understanding of what web tracking data and data donations are.
- Learn how web tracking data and data donations can be collected and analysed, and how they can be combined with surveys.
- Recognise the challenges and errors that might arise in every step of the process of collecting and analysing both data sources.
- Develop best practices when using this type of data, specifically, strategies to quantify, minimise and report potential errors.
- Evaluate the limits of their own and others’ web tracking and data donation collection strategies.
To help students achieve the above learning outcomes, the course will have two interactive activities:
- Using the unique TRI-POL open access datasets, a cross-national longitudinal survey combined with web tracking data, students will familiarise themselves with a web tracking dataset. Likewise, students will learn how to use computational methods such as Monte Carlo simulations and machine learning to quantify the data quality of digital trace data.
- Students will get hands-on experience with a specific type of data donation: screenshots and video recordings of the Digital Wellbeing / Screen Time features on Android and iOS, which provide information on the time individuals spend on apps and websites on their devices. They will learn how to automate the extraction of information from those screenshots. Specifically, students will learn how to run an R script that sends images to the Google Vision API, extracts the text from the images, and creates a workable structured dataset.
This short course can be combined with the afternoon short course “Smart surveys: Integrating survey data and big data”.
Oriol Bosch-Jover is a fourth-year PhD candidate at the Department of Methodology at the London School of Economics, and Non-Resident Researcher at the Research and Expertise Centre for Survey Methodology (RECSM). As a methodologist, Oriol focuses on understanding how to better collect and analyse attitudinal and behavioural data for the social sciences. He specializes in topics related to web and mobile surveys and the use of digital trace data and sensors to enhance or substitute surveys. His work, published in journals such as Social Science Computer Review or the Journal of the Royal Statistical Society, has explored the measurement quality of survey scales in online surveys using MTMM experiments; the generational divides between participants in terms of survey behaviour and data quality; and the impact on data quality of using novel data types to answer survey questions such as visual and voice data. Oriol is currently focusing on understanding how social scientists can best collect information about citizens’ online behaviours using data donations and web trackers, e.g., apps that can track the URLs and apps that individuals visit. Through a combination of theory and traditional survey and computational methods, his research explores how to quantify and minimize digital trace data errors, while comparing them with those of surveys.
Nonresponse Bias
Barbara Felderer, GESIS – Leibniz Institute for the Social Sciences, Germany
The short course discusses the emergence and analysis of nonresponse bias. The first part of the course discusses the relationship between nonresponse and nonresponse bias. In the second part, frequently used indicators of nonresponse bias will be introduced and their usefulness and limitations discussed. In the practical part of the workshop, indicators for nonresponse bias will be calculated. A synthetic dataset will be provided, but participants are welcome to bring their own datasets to conduct nonresponse bias analysis. The course is suitable for beginners and advanced researchers.
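As a rough illustration of the relationship between nonresponse and nonresponse bias, the sketch below uses the standard deterministic expression: the bias of the respondent mean equals the nonresponse rate multiplied by the difference between respondent and nonrespondent means. It is written in Python with made-up values and a hypothetical function name, and is not drawn from the course materials. In real surveys the nonrespondent values are unobserved, which is why indicators of nonresponse bias are needed.

```python
def nonresponse_bias(respondent_values, nonrespondent_values):
    """Bias of the respondent mean relative to the full-sample mean."""
    n_r = len(respondent_values)
    n_m = len(nonrespondent_values)
    mean_r = sum(respondent_values) / n_r
    mean_m = sum(nonrespondent_values) / n_m
    nonresponse_rate = n_m / (n_r + n_m)
    # Bias = nonresponse rate * (respondent mean - nonrespondent mean)
    return nonresponse_rate * (mean_r - mean_m)


# Made-up synthetic sample in which nonrespondent values are known
respondents = [4.0, 6.0]
nonrespondents = [1.0, 1.0]

bias = nonresponse_bias(respondents, nonrespondents)
full_mean = sum(respondents + nonrespondents) / 4
respondent_mean = sum(respondents) / 2
print(bias, respondent_mean - full_mean)
```

The expression shows that a low response rate does not by itself imply bias: if respondents and nonrespondents have the same mean, the bias is zero whatever the nonresponse rate.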
Dr Barbara Felderer is the head of the survey statistics team at GESIS – Leibniz Institute for the Social Sciences. The first focus of her research is survey methodology, especially nonresponse and nonresponse bias. The second focus is (survey) statistics, currently especially causal machine learning methods and their application to improve survey quality.
Assessing the Quality of Survey Data
Jörg Blasius, University of Bonn, Germany
Survey data is plagued with non-substantive variation arising from myriad sources such as response styles, socially desirable responding, and (partly) faked interviews. Applying principal component analysis, categorical principal component analysis and multiple correspondence analysis, I show various strategies for assessing the quality of the data, i.e., for detecting non-substantive sources of variation.
The workshop focuses on screening procedures that should be carried out prior to assessing substantive relationships. Screening survey data means searching for variation in observed responses that does not correspond with actual differences between respondents. It also means the reverse: isolating identical response patterns that are not due to respondents holding identical viewpoints. This can be a sign of faked and duplicated interviews.
In the workshop I will demonstrate a variety of data screening processes that reveal distinctly different sources of poor data quality. Using well-known data sets such as the ISSP, PISA, PIAAC and the ESS, I will provide examples of how to detect non-substantive variation that is produced by:
- Strong satisficing of respondents, resulting in response styles such as acquiescence, extreme response styles, mid-point responding, straight-lining and stereotyped responses;
- Faked and partly faked interviews by interviewers and employees of survey research organisations;
- Different field work standards in cross-national surveys;
- Misunderstanding of questions due to poor item construction;
- Heterogeneous understanding of questions arising from cultural differences.
Closing discussion (approx. 30 minutes) on strategies for detecting partly faked interviews during fieldwork, differentiating between satisficing by respondents and by interviewers, and identifying fabrications by employees of survey institutes.
Professor Jörg Blasius is Professor of Sociology at the Institute for Political Science and Sociology, University of Bonn, Germany. His research interests are mainly focused on explorative data analysis, especially correspondence analysis and related methods, data collection methods, sociology of lifestyles, and urban sociology. He has given numerous courses on correspondence analysis and related methods. In addition to several courses held at the Summer Schools in Essex, Lugano, Cologne (spring seminar), and St. Gallen, he has given courses in various countries, for instance in Switzerland, Italy, Norway, Portugal, Costa Rica, and the USA.
Survey Data Visualisation in R
Alexandru Cernat, University of Manchester, UK
Being able to present data visually is an essential skill that is relevant at all stages of data analysis, from exploration to model evaluation and presenting results to a general audience. In this course you will learn the basics of visualization using R, an open-source statistical software environment with flexible and expandable visualization capabilities based on the grammar of graphics. We will cover the basics of visualization, how to produce different types of graphics in R, and how to address some of the issues specific to survey research, such as making graphics that account for complex sample designs. Finally, we will discuss exciting extensions such as maps, interactive graphs, and animated figures.
Learning objectives:
- Understand the concept of the grammar of graphics
- Understand how this is implemented in R
- Learn how to visualize observed data as well as summary statistics
- Learn how to visualize statistics corrected for complex survey designs
Activities: The main part of the course will be a lecture, followed by a short practical that involves producing graphs in R and then discussing the solutions as a group.
Dr Alexandru Cernat is an associate professor in Social Statistics at the University of Manchester. Previously he was a Research Associate at the National Centre for Research Methods, where he investigated non-response in longitudinal studies with a special focus on biomarker data. He received a PhD in survey methodology from the University of Essex, working on the topic of mixed mode designs in longitudinal studies. His research focuses on data quality in surveys and new forms of data.
The Mechanics of Longitudinal Data Analysis
Oliver Lipps, FORS – Swiss Centre of Expertise in the Social Sciences, Switzerland
I will first give a brief refresher of linear regression and discuss the assumptions to estimate unbiased regression coefficients. Then I will motivate the concept of causality based on the counterfactual approach and show the effect of control variables on errors, using a small-N example.
In the main part, I will introduce panel data and explain the idea of longitudinal models, whose biggest advantage is that they eliminate bias from omitted time-constant variables. We again use a small-N example to better understand the basic concepts of these models. We will start by estimating a pooled OLS model using the original variables, then show the effect of controlling for a confounding time-constant variable. We will then introduce fixed effects (FE) and first difference (FD) models. I will explain at each step how we come closer to an unbiased regression coefficient. In particular, we will show graphically the (FE and FD) transformation of the original variables before the transformed dependent variable is regressed on the transformed independent variable.
After introducing the random effects (RE) estimator as an intermediate model between OLS and FE, I will give some (typical) examples of pooled OLS, FE, and RE estimators using data from a large-N panel survey.
The core of the course is to understand how the FE and the FD models work “mechanically”, i.e., how within-individual transformations can be visualized using a small-N example. I will use a minimum of formulas and focus on the understanding of these mechanisms. Participants will learn more about the pros and cons of different longitudinal linear models, including the FE model, the FD model, the RE model, and the pooled OLS model. Some familiarity with panel data and regression models is assumed.
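The mechanics described above can be sketched on a toy panel. The example below is a Python illustration with made-up data and hypothetical helper names; it is not the instructor's material. The outcome is constructed as y = alpha_i + 2 * x, where the time-constant person effect alpha_i is correlated with x. Pooled OLS ignores alpha_i, whereas the within (FE) and first-difference (FD) transformations remove it before the regression is run.

```python
# Toy panel: two people observed three times each.
# y = alpha_i + 2 * x, with alpha_i a time-constant person effect.
panel = {
    "A": {"x": [1.0, 2.0, 3.0], "y": [12.0, 14.0, 16.0]},  # alpha = 10
    "B": {"x": [4.0, 5.0, 6.0], "y": [8.0, 10.0, 12.0]},   # alpha = 0
}


def mean(values):
    return sum(values) / len(values)


def ols_slope(xs, ys):
    """Slope of a simple regression of ys on xs, with an intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def slope_through_origin(xs, ys):
    """Slope of a regression of ys on xs without an intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def within_transform(values):
    """FE transformation: subtract the person-specific mean."""
    m = mean(values)
    return [v - m for v in values]


def first_difference(values):
    """FD transformation: subtract the previous observation."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def stack(transform, key):
    """Apply a transformation person by person and pool the results."""
    pooled = []
    for person in panel.values():
        pooled.extend(transform(person[key]))
    return pooled


pooled_slope = ols_slope(stack(list, "x"), stack(list, "y"))
fe_slope = slope_through_origin(stack(within_transform, "x"), stack(within_transform, "y"))
fd_slope = slope_through_origin(stack(first_difference, "x"), stack(first_difference, "y"))
print(pooled_slope, fe_slope, fd_slope)
```

In this constructed example the pooled slope even has the wrong sign, because the person with the larger alpha_i has the smaller x values, while both transformed regressions recover the slope of 2 used to build the data.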
Dr Oliver Lipps is the head of the methodological research programme at FORS – Swiss Centre of Expertise in the Social Sciences and a member of the Swiss Household Panel team. In addition, he is a lecturer in survey methodology and survey research at the Institute of Sociology at the University of Bern (Switzerland). His research interests focus on methods to improve (social science) survey data quality such as nonresponse or measurement issues, quantitative methods of data analysis, and substantive survey research, with a special interest in panel data.
Smart Surveys: Integrating Survey Data and Big Data
Bella Struminskaya, Utrecht University, The Netherlands
Peter Lugtig, Utrecht University, The Netherlands
Traditional surveys are not well-equipped to measure certain concepts of interest such as expenditures, time use or travel behavior due to the high burden placed on participants.
Facts or behaviors that are difficult to measure through self-report can be measured using new technologies: smartphone apps, sensors, and wearables. For example, accelerometers in smartphones and fitness bracelets can objectively measure physical activity, and screen time apps can measure (social) media use. Another possibility is to augment surveys with administrative data or data from digital platforms such as Google, YouTube, Instagram, etc. that participants can provide to researchers through data donation, or consent to link.
Probability-based surveys that are combined with sensor or external data are called smart surveys. One of the key issues in smart surveys is how to design them. Should data be integrated during the study (e.g., giving respondents feedback on their behaviors), or only after the study has been completed? There are also several more practical problems in designing smart surveys (e.g., whether to use interviewers for recruitment, whether to loan devices, how to provide feedback to respondents without it leading to measurement reactivity).
In this short course we review current best practices of augmenting survey data with additional data sources, using examples from our own work in official statistics and/or social science research. After attending this course, participants will:
- Be able to identify potential sources of augmentation for their (survey) data;
- Be able to anticipate methodological and practical issues of combining survey data with additional data;
- Be able to assess data quality of data combined from multiple sources.
The course involves a few practical exercises (in R) and is suitable for people with pre-existing knowledge about survey methodology, and basic/intermediate knowledge of statistics. This short course can be combined with the morning short course “Measuring citizen’s digital behaviours using web trackers and data donations”.
Dr Bella Struminskaya is an assistant professor of Methods and Statistics at Utrecht University and an affiliated researcher at Statistics Netherlands. Her research focuses on the design and implementation of online, mixed-mode and smartphone surveys, and passive data collection. She has published on augmenting surveys with mobile apps and sensors, data quality, nonresponse and measurement error, including panel conditioning, and device effects. Bella is a board member of the German Society for Online Research, where she co-organizes the General Online Research (GOR) conference, and a member of the AAPOR Online Education Committee.
Dr Peter Lugtig is an associate professor of Methods and Statistics at Utrecht University and an affiliated researcher at Statistics Netherlands. He specializes in modern survey methodology, including inference from a mix of survey data and big data, the modelling of survey errors, and the use of sensor technology in smartphones to improve measurement. He has published widely on web and mobile surveys, nonresponse, and combining sensor data with survey data.
How Online Survey Paradata Collection and Analysis Can Improve Your Study
Mario Callegaro, Google, UK
Vlad Achimescu, Google, Switzerland
Stephen Wheeler, YouTube, Switzerland
In online surveys, we have little insight into how respondents fill out a questionnaire, given that it is a self-administered mode. Paradata, more recently also called survey logs in the literature, can shed light on how the respondent is filling out a survey, tell us about a difficult or confusing question, or help identify dubious answers.
In this short course we will show what kinds of paradata can be collected in an online survey and how they have been used in the available literature to improve survey quality.
The class will cover the following topics:
- Paradata at each stage of the online data collection process
- Respondent interactions with the email invitation and the whole questionnaire paradata
- Device and browser paradata
- Questionnaire navigation paradata
- Mouse clicks & movements + answer changes
- Movements across pages, tabs, and prompt messages
- Time spent per question, screen and questionnaire
- Question-by-question completion rates, breakoff rates, and identifying questions leading to abandonment
- Experiment and randomization flags paradata
- Indirect paradata
- Paradata in real time
- Using paradata in the Total Survey Error framework
The short course is agnostic as to online survey platform, but we will be able to answer the most common questions about the most popular web survey platforms such as Qualtrics, SurveyMonkey, and LimeSurvey. Finally, an up-to-date list of papers on paradata usage will be shared with the participants.
Interactive component: Participants are encouraged to bring their laptops, and we will use a Google Sheets dataset to compute some paradata indexes during the class.
Dr Mario Callegaro is Senior Staff UX Researcher in the Google Cloud Platform user experience team. He focuses on helping the team collect high-quality survey data about its cloud platform products. Mario consults on numerous surveys, market research, and user experience projects in terms of survey design, questionnaire design, sampling, weighting, and reporting. Mario holds an M.S. and a Ph.D. in Survey Research and Methodology from the University of Nebraska, Lincoln. He has published numerous papers and book chapters and has presented at international conferences on survey methodology and data collection methods. He published an edited book with Wiley titled Online Panel Research: A Data Quality Perspective. A year later he published with Sage a handbook titled Web Survey Methodology with Katja Lozar Manfreda and Vasja Vehovar, recently available as open access.
Dr Vlad Achimescu is a quantitative user experience researcher in Google Shopping, where he focuses on the measurement of user task success and satisfaction. He holds a PhD in Sociology from the University of Mannheim, Germany, and an MSc in Statistics from KU Leuven, Belgium. He has published papers and presented at international conferences on topics such as survey methodology, online misinformation detection and social exclusion.
Stephen Wheeler is a user-experience researcher and practitioner of both qualitative and quantitative methods, including survey research, and has worked in the field of UX for three decades. He has an MA in Natural Sciences from the University of Oxford, and an M.Sc. in Human Factors in Human-Computer Interaction from the University of London.