ESRA 2023 Preliminary Glance Program

All time references are in CEST

Data donation and linking digital trace data 1

Session Organisers: Ms Laura Boeschoten (Universiteit Utrecht)
Mr Johannes Breuer (GESIS)
Mr Zoltán Kmetty (Centre for Social Sciences)
Mrs Júlia Koltai (Centre for Social Sciences)
Mr Adam Stefkovics (Harvard University)
Ms Bella Struminskaya (Universiteit Utrecht)
Time: Tuesday 18 July, 14:00 - 15:30
Room: U6-23

Digital traces left on platforms such as Facebook, Instagram, Google, and WhatsApp, as well as other online traces left by citizens, are promising sources of information for scientific research in various fields. Although there are multiple ways to access digital trace data, a new approach built on partnership with citizens has emerged in recent years. Donated data can be obtained by installing web and app trackers on participants’ devices or through data download packages from digital platforms. Unlike survey self-reports, which may suffer from measurement error due to recall or social desirability bias, digital traces can provide reliable behavioral data free from those error sources. When digital traces are combined with self-reports, the validity and reliability of the measures derived from them can be investigated. Linking several digital trace data sources can provide more insight into the phenomenon under study but also brings challenges.
While research in this field is growing, we still know little about how best to optimize data donation approaches, about the patterns and determinants of participation, about ways to preserve participants’ privacy, and about how to link digital trace data with survey responses.
We invite contributions that provide new theoretical or empirical insights into any phase or aspect of the donation of digital trace data. Contributions may cover, but are not limited to, the following topics:
· Data donation methods and methods of data extraction
· Willingness to donate digital trace data, best practices for recruitment
· Sampling and nonparticipation errors, missing data
· Validity of digital trace data
· Privacy issues, ethical issues, anonymization
· Issues of linking digital data with survey data
· Challenges in analyzing combined data
· Substantive contributions which combine digital trace and survey data

Keywords: linkage, donation, digital trace, social media

Opportunities and challenges of real-time data linkage designs - A case study using the Spotify API

Mr Benedikt Rohr (Computational Communication Science JGU Mainz) - Presenting Author
Mrs Alicia Ernst (Computational Communication Science JGU Mainz)
Mr Felix Valentin Dietrich (Media Effects & Media Psychology JGU Mainz)
Professor Michael Scharkow (Computational Communication Science JGU Mainz)

To overcome the unique limitations of self-reported or passive measurements of behavior, social scientists increasingly adopt research designs linking digital trace and survey data (Stier et al., 2020). In most previous applications, the linkage procedure connecting both data sources occurs ex post, i.e. after a survey wave is completed and/or tracking data have been donated. For many questions in communication research, however, a real-time data linkage design seems highly attractive.
Following previous studies using event-based experience sampling (Masur, 2019), we discuss a linkage design for studying music streaming use that combines real-time API access and online surveys, with linkage happening both ex post and ex ante. We use Spotify’s Implicit Grant Flow to collect listening session information (via explicit but unobtrusive data donations), which is immediately used to anchor survey questions about listening experiences, e.g., “On February 13th, from 18:12 to 20:05, you listened to…”. This may reduce the recall bias usually evident in self-reports. Finally, the survey and trace data are enriched with song-level metadata obtained via the Spotify API. Our study thus combines linkage design traditions from survey methodology (Stier et al., 2020), which focuses on linking trace and survey data, and from communication research, which combines media use and media content data (de Vreese et al., 2017). It extends comparisons of self-report and tracking data to new domains, e.g., entertainment research and perceptions of algorithmic curation, and allows established entertainment theories to be tested as within-person phenomena.
We discuss general and specific challenges inherent in our approach that affect data reliability and validity: insufficient API documentation, technical restrictions, bugs, missing data, and linkage errors. We also discuss survey design and computational workarounds that balance usability with GDPR-compliant data protection.
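The anchoring step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each play has already been turned into a (start, end) pair of datetimes (how these are derived from Spotify API responses is left open here), and the 30-minute gap that separates sessions, as well as the helper names `group_sessions`, `ordinal`, and `anchor_text`, are hypothetical choices.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: a pause longer than this starts a new listening session.
SESSION_GAP = timedelta(minutes=30)

def group_sessions(plays, gap=SESSION_GAP):
    """Merge (start, end) play intervals into (start, end) listening sessions."""
    sessions = []
    for start, end in sorted(plays):
        if sessions and start - sessions[-1][1] <= gap:
            # Short pause: extend the current session.
            sessions[-1][1] = max(sessions[-1][1], end)
        else:
            sessions.append([start, end])
    return [tuple(session) for session in sessions]

def ordinal(n):
    """Return 1st, 2nd, 3rd, 4th, ..., 11th, 12th, 13th, 21st, ..."""
    if 11 <= n % 100 <= 13:
        return f"{n}th"
    return f"{n}" + {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")

def anchor_text(session):
    """Turn a session into the opening of an anchored survey question."""
    start, end = session
    return (f"On {start:%B} {ordinal(start.day)}, "
            f"from {start:%H:%M} to {end:%H:%M}, you listened to...")

# Hypothetical plays of one participant.
plays = [
    (datetime(2023, 2, 13, 18, 12), datetime(2023, 2, 13, 18, 15)),
    (datetime(2023, 2, 13, 18, 15), datetime(2023, 2, 13, 19, 58)),
    (datetime(2023, 2, 13, 20, 1), datetime(2023, 2, 13, 20, 5)),
    (datetime(2023, 2, 14, 9, 0), datetime(2023, 2, 14, 9, 4)),
]
sessions = group_sessions(plays)
print(anchor_text(sessions[0]))  # prints: On February 13th, from 18:12 to 20:05, you listened to...
```

The gap threshold is the main design choice: a shorter gap splits listening into more, shorter episodes to ask about, a longer one yields fewer but broader anchors.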
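One common workaround of this kind, shown here as an assumption rather than as the procedure used in this study, is keyed pseudonymization: the raw platform user ID is replaced by an HMAC-SHA256 digest, so survey responses and trace records can still be joined on the same key while the raw ID is never stored. The key handling, the example IDs, and the function name `pseudonymize` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical project key; in practice it would be generated once and kept
# separately from the data (here it is regenerated on every run).
PROJECT_KEY = secrets.token_bytes(32)

def pseudonymize(user_id: str, key: bytes = PROJECT_KEY) -> str:
    """Map a platform user ID to a stable keyed hash (64 hex characters)."""
    return hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()

# Both data sources store only the pseudonym, which still works as a join key.
survey = {pseudonymize("spotify-user-123"): {"enjoyment": 4}}
sessions = {pseudonymize("spotify-user-123"): ["2023-02-13 18:12-20:05"]}
linked = {pid: (survey[pid], sessions[pid]) for pid in survey.keys() & sessions.keys()}
```

Unlike a plain hash, a keyed hash cannot be recomputed from guessed user IDs without the key.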

Data donations, are they worth the effort? The accuracy and validity of smartphone usage measures computed with self-reports and data donations

Mr Oriol J. Bosch (The London School of Economics) - Presenting Author
Mr Marc Asensio (University of Lausanne)
Dr Caroline Roberts (University of Lausanne)

Studying the relationship between smartphone usage and other aspects of people’s lives requires accurate data. Although self-reports are the main instrument for measuring smartphone usage, there is evidence that casts doubt on their validity. Recently, approaches that directly observe what participants do online, such as web trackers, have gained popularity. However, recent evidence shows that these approaches are also affected by errors and that their implementation is inaccessible to most researchers.

Consequently, increasing interest is being devoted to data donations, which involve asking participants to share data that their devices and services already collect about them, such as the time they spend using their phone. This approach has the advantage of relying neither on participants’ memory nor on tracking apps. However, compliance rates are still low, potentially introducing nonresponse bias. Data donations must therefore yield measurement quality gains large enough to make them a valid alternative to self-reports. In this study, we focus on the gains from collecting information already stored on participants’ smartphones about their daily screen time, number of pickups, and specific app usage, as reported by the Digital Wellbeing / Screen Time tools.

To study this, we conducted a within- and between-subject survey experiment in an online panel (N = 872). At the beginning of the survey, participants self-reported their usage. At the end, participants were randomly assigned to share this information in one of three ways: uploading several screenshots of the tools, uploading video recordings, or manually checking the tool and reporting the information it displays.

For each data donation approach, we present the absolute difference between the measurements obtained from self-reports and from data donations. We also compare the convergent and predictive validity of self-reports and data donations. Additionally, we discuss potential errors affecting the data donation estimates.

A Platform for Digital Data Donation

Dr Laura Boeschoten (Utrecht University) - Presenting Author
Dr Theo Araujo (University of Amsterdam)
Dr Niek de Schipper (University of Amsterdam)
Dr Bella Struminskaya (University of Amsterdam)
Dr Heleen Janssen (University of Amsterdam)
Dr Kasper Welbers (Vrije Universiteit Amsterdam)

Digital traces left by citizens during the natural course of modern life hold enormous potential for social-scientific discoveries, because they can measure aspects of our social life that are difficult or impossible to measure by more traditional means.
As of May 2018, the EU General Data Protection Regulation obliges any entity, public or private, that processes personal data of citizens of the European Union to provide those data to the data subject (the person to whom the data pertain) upon request, in digital format. Most major private data processing entities, including social media platforms, internet service providers, search engines, photo storage providers, e-mail providers, banks, energy providers, and online shops, comply with this right of access by providing data subjects with so-called ‘Data Download Packages’ (DDPs).
We have introduced a workflow and corresponding software that allow digital traces in DDPs to be collected and analyzed while preserving research participants’ rights to privacy and data protection.
However, preparing a data donation study requires expertise in various domains: IT and programming to configure the study, but also privacy preservation, ethics, and appropriate methodology.
To guide and assist researchers through this challenging process, we are developing an online platform allowing researchers to configure, host and monitor their data donation studies. During this presentation, I discuss the key functionalities of this platform such as data extraction, data storage and progress monitoring, and how they align with the GDPR and ethical requirements.

ChatDashboard - A Framework to collect, link, and process donated WhatsApp Chatlog Data

Mr Julian Kohne (GESIS - Leibniz Institute for Social Sciences; Ulm University) - Presenting Author
Professor Christian Montag (Ulm University)

In this presentation, we introduce the ChatDashboard framework, an infrastructure for collecting, processing, and linking donated WhatsApp chatlog data from consenting research participants. The framework consists of the ChatDashboard R Shiny web app for uploading, reviewing, and securely donating WhatsApp chatlogs; the WhatsR R package, which serves as a backend for parsing and preprocessing donated chatlogs; and an automated testing script for checking the setup of the framework. With ChatDashboard, researchers can set up their own data donation pipelines to collect transparently donated WhatsApp chatlog data from their participants and link them to survey responses. It thus enables social scientists to retrospectively collect highly granular data on interpersonal interaction and communication without having to build their own tools. We briefly discuss the advantages and challenges of working with donated WhatsApp chatlogs and explain in detail how these considerations guided the design of the ChatDashboard framework. In addition, we provide a detailed explanation of how researchers can set up their own data donation pipelines and discuss several important concerns regarding ethics, informed consent, anonymization, and research data management.
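The parsing and anonymization step can be illustrated with a short Python sketch. It is not the WhatsR API (WhatsR is an R package); it assumes one Android-style export line format, `DD/MM/YYYY, HH:MM - Sender: Message`, although real exports vary by operating system, locale, and app version. Sender names are replaced by hypothetical `PersonN` aliases and only word counts are kept, as one example of data minimization.

```python
import re
from datetime import datetime

# Assumed export format (real exports differ by OS, locale, and app version):
# "13/02/2023, 18:12 - Alice: Are you coming tonight?"
HEADER = re.compile(r"^(\d{2}/\d{2}/\d{4}), (\d{2}:\d{2}) - ")
MESSAGE = re.compile(HEADER.pattern + r"([^:]+): (.*)$")

def parse_chat(lines):
    """Parse exported lines into minimized records with pseudonymous senders."""
    messages, aliases = [], {}
    for line in lines:
        match = MESSAGE.match(line)
        if match:
            date, time, sender, text = match.groups()
            # Hypothetical aliasing scheme: Person1, Person2, ... in order of appearance.
            alias = aliases.setdefault(sender, f"Person{len(aliases) + 1}")
            messages.append({
                "timestamp": datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M"),
                "sender": alias,
                "n_words": len(text.split()),  # keep counts, not content
            })
        elif HEADER.match(line):
            continue  # system notice without a sender, e.g. encryption info
        elif messages:
            # Line without a header: continuation of a multi-line message.
            messages[-1]["n_words"] += len(line.split())
    return messages
```

In a donation pipeline, a step like this would run before upload, so that names and message content never leave the participant's device.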

Volatility of Digital Data Donation Packages

Mr Carriere Thijs (Utrecht University)
Dr Niek de Schipper (University of Amsterdam) - Presenting Author
Dr Laura Boeschoten (Utrecht University)
Dr Theo Araujo (University of Amsterdam)

A crucial element of digital data donation is that research participants share with researchers the digital traces they leave behind during the natural course of their lives. The General Data Protection Regulation (GDPR) mandates that each digital platform provide its users with a digital, transportable copy of their own personal data. This transportable copy is often referred to as a Data Download Package (DDP).

Researchers conducting a data donation study often find that the DDP of interest contains more data than their research purpose requires, and that these data are often highly sensitive. This challenge is generally addressed by a local extraction procedure: on the participant’s device, only those features of the DDP that are of interest to the researcher are extracted.
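A minimal sketch of local extraction, assuming a ZIP-format DDP that contains a JSON list of records; the file name `watch_history.json` and the whitelisted field `time` are hypothetical, not taken from any specific platform. Only the whitelisted fields leave the function, so titles, URLs, and other files in the package are dropped on the device.

```python
import io
import json
import zipfile

# Hypothetical target: one JSON file holding a list of records, of which
# only the timestamp is needed for the research question.
TARGET_FILE = "watch_history.json"
KEEP_FIELDS = ("time",)

def extract_locally(ddp_bytes: bytes) -> list:
    """Read one file from a ZIP-format DDP and keep only whitelisted fields."""
    with zipfile.ZipFile(io.BytesIO(ddp_bytes)) as archive:
        with archive.open(TARGET_FILE) as handle:
            records = json.load(handle)
    # Everything not whitelisted (titles, URLs, other files) is discarded here.
    return [{field: r[field] for field in KEEP_FIELDS if field in r} for r in records]

# Build a small in-memory DDP for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(TARGET_FILE, json.dumps(
        [{"time": "2023-02-13T18:12:00Z", "title": "Some video", "url": "https://example.org"}]))
    archive.writestr("messages.json", json.dumps([{"text": "private"}]))
print(extract_locally(buffer.getvalue()))
```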

Local extraction requires a script that can handle the particular format of the DDP of the platform of interest. However, various studies have made clear that platforms use very diverse tactics to structure their DDPs, that platforms change these structures over time, and that structures can even differ from one platform user to another.
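One way to cope with such variation is to encode known alternatives explicitly and to log what could not be extracted instead of failing. In the sketch below, the candidate file paths and time keys are invented placeholders, and the DDP is represented as an already-parsed mapping from file path to list of records.

```python
# Hypothetical alternatives observed across platform versions or users.
CANDIDATE_PATHS = ("watch_history.json", "activity/watch-history.json")
CANDIDATE_TIME_KEYS = ("time", "timestamp", "date")

def find_path(files, candidates=CANDIDATE_PATHS):
    """Return the first file path that ends with a known candidate, else None."""
    for candidate in candidates:
        for path in files:
            if path.endswith(candidate):
                return path
    return None

def extract_times(files):
    """Extract timestamps from {path: [records]} and log what could not be read."""
    path = find_path(files)
    if path is None:
        return [], ["no known history file found"]
    times, problems = [], []
    for i, record in enumerate(files[path]):
        key = next((k for k in CANDIDATE_TIME_KEYS if k in record), None)
        if key is None:
            problems.append(f"record {i}: no known time key")
        else:
            times.append(record[key])
    return times, problems
```

The returned problem log makes extraction error countable per participant rather than silently producing missing data.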

To gain more insight into the volatility of DDPs, we regularly collected the DDPs of nine major digital platforms over a period of six months, monitoring changes and extensions in their formats. In this presentation, we discuss the most important lessons learned from this monitoring procedure. In addition, we illustrate how insight into the volatility of a DDP can help researchers write more robust extraction scripts, resulting in lower levels of extraction error when performing a data donation study.
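Changes between DDP snapshots can be detected by comparing their structure rather than their content. The sketch below, an assumption and not the authors' monitoring procedure, collects the set of key paths in a parsed JSON file (list items share one path) and reports which paths disappeared or appeared between two hypothetical snapshots.

```python
def key_paths(obj, prefix=""):
    """Collect the set of key paths in a parsed JSON value; list items share one path."""
    paths = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}/{key}"
            paths.add(path)
            paths |= key_paths(value, path)
    elif isinstance(obj, list):
        for item in obj:
            paths |= key_paths(item, prefix + "[]")
    return paths

def compare_snapshots(old, new):
    """Report key paths that were removed or added between two snapshots."""
    before, after = key_paths(old), key_paths(new)
    return {"removed": sorted(before - after), "added": sorted(after - before)}

# Hypothetical snapshots of the same file, collected months apart.
old = {"history": [{"time": "t", "title": "x"}]}
new = {"history": [{"timestamp": "t", "title": "x", "url": "u"}]}
print(compare_snapshots(old, new))
```

A removed path that an extraction script relies on signals that the script needs a new fallback.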