All time references are in CEST
Smart surveys: Measurement, data processing and data integration 1
Dr Peter Lugtig (Utrecht University)
Professor Barry Schouten (Statistics Netherlands)
Tuesday 18 July, 11:00 - 12:30
It is well known that surveys have trouble measuring certain topics that are of great interest to social and behavioral scientists.
In recent years, several approaches have been proposed to extend or integrate surveys with innovative data collection methods that aim to address some of the inherent shortcomings of self-reports collected through surveys. One approach is to start with a survey and then, within the survey, link or collect additional data. Smartphone apps and wearable devices in particular offer a promising way to collect data through cameras, microphones, or motion and location sensors that can be integrated within an app. Another approach is to integrate survey data with external sensory, factual or behavioral data after data collection. This can, for example, be done by linking self-reported income from surveys to register data from governmental records, or by asking respondents to donate data such as their Google history or WhatsApp call history. Here, data are collected separately in surveys and through other means, and are only compared and integrated during data analysis.
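The post-collection linkage approach described above can be sketched in a few lines. This is a hypothetical illustration, not part of any paper in this session: the person identifier and income columns are invented, and real linkage would involve consent, pseudonymised keys, and far messier matching.

```python
import pandas as pd

# Hypothetical example: survey self-reports and register records
# joined on a shared (pseudonymised) person ID after data collection.
survey = pd.DataFrame({
    "person_id": [1, 2, 3],
    "self_reported_income": [32000, 48000, 27000],
})
register = pd.DataFrame({
    "person_id": [1, 2, 3],
    "register_income": [33500, 47200, 27000],
})

linked = survey.merge(register, on="person_id", how="inner")
# Discrepancies between the two sources can then be analysed directly,
# e.g. to study measurement error in self-reported income.
linked["difference"] = linked["register_income"] - linked["self_reported_income"]
print(linked)
```

In practice an inner join also makes the linkage rate visible: respondents without a register match simply drop out of `linked`, which is itself a quantity of methodological interest.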
We invite abstracts for a session focusing on measurement, data processing and data integration in smart surveys. Papers can focus on, but are not limited to, one or more of the following themes:
- Examples of measurement in smart surveys using smartphone apps, where data are integrated during data collection.
- Examples of measurement in smart surveys, where survey data and external data are integrated after data collection.
- Assessment of data quality using data from multiple sources (surveys and other data sources).
- Methods to integrate or fuse survey data with other data sources with the goal of improving measurement.
- The effects of data integration on timeliness, costs and/or precision of survey estimates.
- The role of the respondent.
Mr Oriol J. Bosch (The London School of Economics) - Presenting Author
Dr Melanie Revilla (IBEI)
Professor Patrick Sturgis (The London School of Economics)
Professor Jouni Kuha (The London School of Economics)
Measuring what people do online is crucial across all areas of social science research. Although self-reports are still the main instrument for measuring online behaviours, there is evidence casting doubt on their validity. Consequently, researchers are increasingly relying on digital trace data to measure online phenomena, assuming that this will lead to higher-quality statistics. Recent evidence, nonetheless, suggests that digital trace data are also affected by measurement errors, calling their gold-standard status into question. It is therefore essential to understand the size of the measurement errors in digital trace data, and when it might be best to use each data source.
To this end, we adapt the Generalised MultiTrait-MultiMethod (GMTMM) model created by Oberski et al. (2017) to simultaneously estimate the measurement errors in survey and digital trace data. The GMTMM allows both survey and digital trace data to contain random and systematic measurement errors, while accommodating the specific characteristics of digital trace data (i.e., zero-inflation).
To simultaneously assess the measurement quality of both sources of data, we use survey and digital trace data linked at the individual level (N = 1,200), collected using a metered online opt-in panel in Spain. Using these data, we estimated three separate GMTMM models focusing on the measurement quality of survey and digital trace data for three different types of online behaviour: news media exposure, online communication, and entertainment. Specifically, for each type of behaviour, we measured three simple concepts (e.g., time spent reading articles about politics and current affairs) with both survey self-reports and digital traces. For each simple concept, we present the reliability and method effects of each data source.
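To see why zero-inflation matters when comparing the two methods, consider a small simulation. This is a hypothetical sketch, not the authors' GMTMM model: the distributions, error sizes, and the 30% tracking-failure rate are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1200  # same order of magnitude as the linked sample above

# Invented latent true score for one "simple concept"
# (e.g. minutes spent reading news articles).
true_minutes = rng.gamma(shape=2.0, scale=10.0, size=n)

# Survey self-report: systematic over-reporting plus sizeable random error.
survey = true_minutes + 5.0 + rng.normal(0.0, 8.0, size=n)

# Digital trace: precise when tracking works, but a structural zero when
# the meter misses a device or browser (assumed here for 30% of cases).
tracked = rng.random(n) > 0.3
trace = np.where(tracked, true_minutes + rng.normal(0.0, 2.0, size=n), 0.0)

# Correlations with the (normally unobserved) true score.
r_survey = np.corrcoef(survey, true_minutes)[0, 1]
r_trace_all = np.corrcoef(trace, true_minutes)[0, 1]
r_trace_tracked = np.corrcoef(trace[tracked], true_minutes[tracked])[0, 1]
print(r_survey, r_trace_all, r_trace_tracked)
```

Under these assumptions the trace is far more reliable than the self-report among tracked cases, yet the structural zeros drag its overall correlation with the true score below that of the survey, which is exactly the kind of pattern a zero-inflation-aware model is needed to disentangle.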
The results provide much-needed evidence about the size of the errors in digital trace data, as well as about when the use of self-reports might be justified.
Dr Bence Ságvári (Center for Social Sciences, Corvinus University of Budapest) - Presenting Author
Dr Bence Kollányi (Center for Social Sciences)
The study investigates the mobile device use and online behaviour of 8- to 15-year-old children in Hungary using a mixed-methods design. A unique feature of the study is that it combines automated software data collection, used to assess app usage patterns, with survey data collected from the participating children and their parents. The study, conducted in 2022, involved 100 households with school-age children from all over Hungary.
The questionnaire for parents included questions about the parents' educational and professional background, digital literacy and attitudes towards technology. The survey also included questions about children's internet and mobile phone use, including social media use, and questions about parental control and attitudes. The questionnaire for children contained questions on device use, digital literacy, digital education and social media use. Both questionnaires contained a number of questions that could be compared with the data collected from the smartphones and tablets of the participating children.
The app recorded the number of screen views, the names of the apps installed on each device and the exact time each app was used. Personal data and other information from the installed apps was not collected. The duration of data collection averaged 30 days per device. We collected 943,892 data points for 1,421 different apps from 75 devices. (No data was collected from 25 devices due to technical and other issues.)
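Turning such timestamped app-use records into quantities comparable with self-reports typically means aggregating event durations per app and device. The sketch below is a hypothetical illustration: the log schema (one row per usage event with start and end times), the app names, and the device IDs are all invented, not the study's actual data format.

```python
import pandas as pd

# Hypothetical log schema: one row per app-usage event.
events = pd.DataFrame({
    "device_id": [75, 75, 75, 12],
    "app": ["YouTube", "WhatsApp", "YouTube", "TikTok"],
    "start": pd.to_datetime(["2022-03-01 14:00", "2022-03-01 14:20",
                             "2022-03-01 15:00", "2022-03-01 16:00"]),
    "end": pd.to_datetime(["2022-03-01 14:15", "2022-03-01 14:25",
                           "2022-03-01 15:10", "2022-03-01 16:30"]),
})

events["minutes"] = (events["end"] - events["start"]).dt.total_seconds() / 60

# Total usage time per app per device: the kind of quantity that can be
# compared against children's and parents' self-reported screen time.
usage = events.groupby(["device_id", "app"])["minutes"].sum()
print(usage)
```

The same grouping logic scales from this toy table to the study's hundreds of thousands of data points; only the input size changes.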
The study contributes to the literature on digital trace collection in two ways. First, it describes a method for collecting data on mobile app use and discusses the methodological challenges of software-based data collection. Second, by combining and comparing automatically collected device usage data with survey data, we can assess both the reliability of self-reported usage behaviour and the usage patterns that emerge from the automatically collected data.
Ms Camilla Salvatore (Utrecht University) - Presenting Author
Ms Silvia Biffignandi (CESS)
Ms Annamaria Bianchi (University of Bergamo)
Probability sample surveys have long been considered the gold standard for inference, but they face growing difficulties, mainly declining response rates and the associated rising costs.
At the same time, an acceleration of technological advances has occurred, with the use of mobile phones and online social networks, specifically social media (SM), leading to the availability of vast amounts of new data. This is coupled with the development of new tools by computational social scientists to collect, process, and analyse digital trace data.
This article provides an overview of the roles of social media in survey research (as a substitute, as a supplement, and as a means to improve survey estimates) and in the production of smart statistics. We then introduce a general modular framework for producing smart statistics that takes advantage of both data sources. This modular framework can be used and adapted by researchers in different contexts. We demonstrate its applicability through a case study. Finally, we highlight important questions for future research.