ESRA 2019 Draft Programme at a Glance

Linking Surveys and Social Media Data – Challenges, Applications and Solutions 3

Session Organisers Professor Alexia Katsanidou (GESIS - Leibniz Institute for the Social Sciences)
Dr Johannes Breuer (GESIS - Leibniz Institute for the Social Sciences)
Dr Katharina Kinder-Kurlanda (GESIS - Leibniz Institute for the Social Sciences)
Dr Sebastian Stier (GESIS - Leibniz Institute for the Social Sciences)
Time Wednesday 17th July, 16:30 - 17:30
Room D21

When it comes to measuring phenomena that are of interest to social scientists, such as attitudes, beliefs, values or behavior, both surveys and data from social media platforms have their own advantages and disadvantages. For example, while survey data may be biased by social desirability or faulty memory, data from social media often lack important contextual information and do not capture relevant outcome variables. A promising way of dealing with the limitations of surveys and social media data is to link them. Such linking can help to answer interesting substantive research questions as well as methodological questions about the quality of the data (e.g., regarding the reliability of self-reports or the precision of inferring attributes from social media data).
The process of linking survey and social media data, however, is by no means trivial and comes with its own set of practical as well as ethical challenges. These relate to a variety of issues, including data access, informed consent, limitations imposed by terms of service of social media companies, data privacy, and data archiving and sharing. While there is some pioneering research that has linked data from surveys and social media to answer substantive or methodological questions, this approach is still not widely used, and an exchange of expertise is necessary to improve practices and create standards in this area. We invite contributions for this session that present suggestions for dealing with the various practical and ethical challenges of linking survey and social media data (ideally based on examples). Contributions can be empirical, methodological or conceptual. Relevant topics include but are not limited to:
• Examples of substantive or methodological questions that can be answered by combining surveys and social media data
• Improvement of measurements of human attitudes, beliefs, values, and behavior through the combination of surveys and social media data
• Incentives and informed consent for studies that link surveys and social media data
• Bias in the sampling process and potential solutions
• Data sharing issues of linked survey and social media data

Keywords: social media, data linking, ethics, data sharing

Acquiring personal Facebook data – Is it still possible?

Dr Zoltan Kmetty (Eötvös Loránd University)
Ms Anna Vancsó (Corvinus University, Budapest)
Mr Daniel Váry (Eötvös Loránd University)
Mr Adam Stefkovics (Eötvös Loránd University) - Presenting Author

Facebook (FB) is the biggest social media site in the world, and it is growing continuously. As a result, an enormous amount of data is generated all the time. There are plenty of studies dealing with FB data, but with a few rare exceptions, they rely on public data. Facebook changes its data privacy policy from time to time. A major revision in 2015 made the collection of personal data more difficult, and after the recent Cambridge Analytica scandal, access to FB data became even harder. The FB Graph API is now the only channel that can be used to collect public personal data of users. Besides this data acquisition strategy, there are two other ways to collect social media data: scraping specified public sites through the Graph API, and purchasing data from companies that collect and store publicly available FB data. If we would like to dig deeper, we need to collect data from FB users directly. To solve this problem, we have started an experimental pilot study. We ask our participants to save their Facebook activity data from their general account settings page and give it to us. We also conduct a small survey with the respondents to extend the FB activity data with additional information. After the initial processing of the social media data, it will be merged with survey data collected from the same respondents.
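The merging step mentioned at the end of this abstract could look roughly like the sketch below. It is an illustration only, not the authors' procedure: the field names (respondent_id, age, activity_type), the helper link_records, and the toy records are all hypothetical, and it assumes that each donated activity record already carries the same pseudonymous identifier as the survey response.

```python
from collections import Counter

def link_records(survey_rows, activity_rows, key="respondent_id"):
    """Attach a per-respondent activity count to each survey row.

    Hypothetical helper: `key` is an assumed shared pseudonymous ID.
    Survey respondents without donated activity data are kept and flagged.
    """
    counts = Counter(row[key] for row in activity_rows)
    linked = []
    for row in survey_rows:
        merged = dict(row)  # copy so the survey input is not modified
        merged["n_activities"] = counts.get(row[key], 0)
        merged["has_activity_data"] = row[key] in counts
        linked.append(merged)
    return linked

# Toy records, invented for illustration only
survey = [
    {"respondent_id": "r1", "age": 34},
    {"respondent_id": "r2", "age": 51},
]
activity = [
    {"respondent_id": "r1", "activity_type": "post"},
    {"respondent_id": "r1", "activity_type": "comment"},
]

for row in link_records(survey, activity):
    print(row)
```

Keeping respondents who did not provide activity data, and flagging them, makes it possible to compare them with those who did, which bears on the consent and sampling questions raised in the session description.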

At the ESRA conference we would like to present the first results of our research, along with the difficulties of the implemented approach and the ethical and GDPR dimensions of this type of data collection.

New Data Sources in Social Science Research: Things to Know Before Working with Reddit Data

Dr Ashley Amaya (RTI International) - Presenting Author
Dr Ruben Bach (Universitat Mannheim)
Dr Florian Keusch (Universitat Mannheim)
Dr Frauke Kreuter (University of Maryland)

Social media is becoming more popular as a source of data for social science researchers. These data are plentiful and offer the potential to answer new research questions at smaller geographies and for rarer subpopulations. When deciding whether to use data from social media, it is useful to learn as much as possible about the data and its source. Social media data have properties quite different from what many social scientists are used to working with, so the assumptions often used to plan and manage a project may no longer hold. For example, they may be too large to process on a single machine; they come in file formats with which many researchers are unfamiliar; and they require a level of data transformation and processing that has rarely been required when using more traditional data sources (e.g., survey data). Unfortunately, this type of information is often not obvious ahead of time, as much of this knowledge is gained through word-of-mouth and experience. In this paper, we attempt to document several challenges and opportunities of working with Reddit, the self-proclaimed “front page of the internet” and a popular social media site. Specifically, we provide descriptive information about the Reddit site and its users, tips for using organic data from Reddit for social science research, some ideas for conducting a survey on Reddit, and lessons learned in merging survey responses with Reddit posts. While this paper is specific to Reddit, researchers may also view it as a list of the type of information one may seek to assemble prior to conducting a project that uses any type of social media data source.

Tracking Presidential Approval with Twitter: A Critical Comparison of Cross-Sectional and Longitudinal Analyses

Ms Robyn Ferg (University of Michigan) - Presenting Author
Dr Johann Gagnon-Bartsch (University of Michigan)
Dr Fred Conrad (University of Michigan)

Relationships found between data extracted from social media and public opinion polls have led to optimism about supplementing traditional surveys with new sources of data. However, not enough attention has been paid to investigating whether these relationships might be spurious. Replicating previous analyses, we calculate the correlation between sentiment of tweets containing the word "Trump" and President Trump's daily presidential approval rating through mid-2018. We develop a framework to interpret the strength of this correlation. Using the idea of a placebo analysis, we perform the same analysis as we did with the "Trump" tweets, but with placebo words unrelated to presidential approval. This creates a reference distribution to which we can compare our observed correlation between "Trump" tweets and presidential approval. The reference distribution suggests that the correlation we find between "Trump" tweets and presidential approval is not especially strong and likely spurious. For the goal of supplementing traditional surveys with data extracted from social media, this is a cautionary rather than optimistic result.
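The placebo idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic random walks, the helper names (pearson, placebo_share, random_walk) are hypothetical, and comparing absolute Pearson correlations is an assumed choice of statistic.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def placebo_share(target_series, placebo_series, approval):
    """Compare the target word's correlation with a placebo reference distribution.

    Returns the observed |r|, the list of placebo |r| values, and the share
    of placebo words whose |r| is at least as large as the observed one.
    """
    observed = abs(pearson(target_series, approval))
    reference = [abs(pearson(s, approval)) for s in placebo_series]
    share = sum(r >= observed for r in reference) / len(reference)
    return observed, reference, share

# Synthetic data only: independent random walks stand in for daily series.
rng = random.Random(0)

def random_walk(n):
    level, out = 0.0, []
    for _ in range(n):
        level += rng.gauss(0, 1)
        out.append(level)
    return out

days = 200
approval = random_walk(days)                          # stand-in for daily approval
target = random_walk(days)                            # stand-in for target-word sentiment
placebos = [random_walk(days) for _ in range(100)]    # stand-ins for placebo words

observed, reference, share = placebo_share(target, placebos, approval)
print(f"observed |r| = {observed:.2f}")
print(f"share of placebo words at least as strong = {share:.2f}")
```

Because independent trending series often correlate strongly by chance, a large share of placebo words matching or exceeding the observed correlation indicates that the observed value is not unusual.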

After failing to find an especially strong correlation when tracking "Trump" tweets, we provide evidence that a political signal is present in Twitter data. To do this, we follow politically active Twitter users over time. After classifying each politically active user as a Democrat or a Republican, we demonstrate that there is a political signal in both the tweeting frequency and the sentiment of these users, with a clear change in sentiment immediately following the 2016 presidential election. We follow these users through mid-2018 and find relationships between the sentiment of their tweets and presidential approval.