
Friday 17th July, 13:00 - 14:30 Room: O-201

When do social media data align with survey responses and administrative data?

Convenor Professor Michael Schober (New School for Social Research)
Coordinator Professor Frederick Conrad (University of Michigan)

Session Details

Demonstrations that analyses of social media content can align with measurement from sample surveys or from administrative data (like unemployment insurance claims) have raised the question of whether survey research can be supplemented or even replaced with less costly and burdensome data mining of already-existing or “found” social media content. But just how trustworthy such measurement can be—say, to replace official statistics—is unknown. New conversations between survey methodologists and data scientists are needed to understand the potential points of alignment and non-alignment, given different starting assumptions and analytic traditions on, for example, the extent to which adequate social measurement requires representative samples drawn from frames that fully cover the population.

What is needed is a set of principles and hypotheses for understanding when and why alignment between social media analyses and survey responses or administrative data should and should not be found. Empirically, demonstrations that social media data can predict survey responses do not always replicate. Much more needs to be understood about the effects of the many potentially relevant factors: the range of survey topics and domains, different methods for mining social media content, different algorithms for converting that content into quantifiable data, and different techniques for measuring alignment.

This panel will present empirical work that advances the conversation about (a) when analyses of social media content might provide estimates accurate enough to be used as reliable social measures or published as official statistics—and when they might not, (b) how self-report in surveys and analyses of social media content might complement and supplement each other, and (c) what should inform decisions about which methods to use for which purposes.

Paper Details

1. A “collective-vs-self” hypothesis for when Twitter and survey data tell the same story
Dr Michael Schober (New School for Social Research)
Dr Frederick Conrad (University of Michigan)
Dr Josh Pasek (University of Michigan)

We investigate a “collective-vs-self” hypothesis for when a collection of tweets, which is inherently non-representative, will yield conclusions similar to those of survey data from a representative sample. Our hypothesis is that alignment between estimates based on these two sources of information will be greater when the survey questions to which the tweets are compared concern groups larger than individuals or their households. Evidence in support of this hypothesis comes from alignment between data from the US Survey of Consumer Attitudes and Behavior (SCA) and sentiment analyses of tweets containing the word “jobs” from 2008-2014.
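To make the kind of measurement concrete, here is a minimal lexicon-based sentiment scorer of the sort that could be applied to tweets containing the word “jobs.” The word lists and function names are illustrative assumptions for this sketch, not the lexicon or pipeline used in the study.

```python
# Illustrative word lists -- NOT the lexicon used in the actual study.
POSITIVE = {"hiring", "gain", "gains", "growth", "hope", "recovery"}
NEGATIVE = {"loss", "losses", "layoff", "layoffs", "cut", "cuts", "fired"}

def tweet_sentiment(text):
    """Score one tweet as (#positive - #negative) / #tokens, in [-1, 1]."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    pos = sum(tok in POSITIVE for tok in tokens)
    neg = sum(tok in NEGATIVE for tok in tokens)
    return (pos - neg) / len(tokens)

def period_sentiment(tweets):
    """Mean sentiment over one period's collection of tweets,
    yielding one point of the time series compared to the SCA."""
    return sum(tweet_sentiment(t) for t in tweets) / len(tweets)
```

Aggregating such per-tweet scores into a per-period series is what allows the sentiment stream to be compared against survey estimates for the same periods.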

2. Using Twitter Data to Calibrate Retrospective Assessments in Surveys
Dr Josh Pasek (University of Michigan)
Ms Elizabeth Hou (University of Michigan)
Dr Michael Schober (New School for Social Research)

This study explores the use of Twitter data to determine how respondents appear to construct retrospective evaluations of the economy. Specifically, we use cross-correlations to link self-reported measures of economic performance in different time periods, as assessed by the Survey of Consumer Attitudes, with sentiment in tweets containing the word “jobs” from 2008-2014. The results imply that individuals think about a range of time points when making retrospective economic judgments, what we term a “temporal latitude of acceptance.”
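The cross-correlation approach described above can be sketched as follows: correlate the survey series with the sentiment series shifted by each candidate lag, and inspect where the correlation peaks. This is a generic illustration under assumed inputs (two aligned numeric time series), not the authors' actual estimation code.

```python
import numpy as np

def lagged_correlations(survey, sentiment, max_lag):
    """Pearson correlation between a survey series and a tweet-sentiment
    series at each lag in [-max_lag, max_lag]. A positive lag means the
    sentiment series leads the survey series by that many periods."""
    survey = np.asarray(survey, dtype=float)
    sentiment = np.asarray(sentiment, dtype=float)
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            s, t = survey[lag:], sentiment[:-lag]
        elif lag < 0:
            s, t = survey[:lag], sentiment[-lag:]
        else:
            s, t = survey, sentiment
        results[lag] = float(np.corrcoef(s, t)[0, 1])
    return results
```

A broad, flat peak across several lags, rather than a single sharp spike, is the pattern consistent with respondents drawing on a range of time points, i.e., a temporal latitude of acceptance.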

3. Using Social Media to Measure Labor Market Flows
Professor Margaret Levenstein (University of Michigan)
Professor Matthew Shapiro (University of Michigan)
Professor Michael Cafarella (University of Michigan)

Social media enable new approaches to measuring economic activity and analyzing economic behavior at high frequency and in real time. This paper uses Twitter data to create indexes of job loss, job search, and job posting. Signals are derived by counting job-related phrases in tweets. The indexes are constructed from the principal components of these signals. The University of Michigan Social Media Job Loss Index tracks initial claims for unemployment insurance, predicts 15 to 20 percent of the variance of the prediction error of the consensus forecast for initial claims, and provides an additional signal, beyond the official statistics, regarding the true state of job loss.
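The phrase-counts-to-index step can be sketched in broad strokes: standardize each phrase-count signal, then take the leading principal component as the index. This is a minimal illustration of the general technique under assumed inputs (a periods-by-phrases count matrix), not the actual University of Michigan index methodology.

```python
import numpy as np

def social_media_index(counts):
    """counts: (T periods x K phrases) matrix of job-related phrase
    frequencies in tweets. Returns a length-T index: the projection of
    the standardized signals onto their first principal component."""
    X = np.asarray(counts, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each signal
    # first principal component via SVD of the standardized matrix
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt[0]                            # project onto leading component
```

The principal-component step pools the common movement across many noisy phrase counts into a single series, which can then be compared against initial unemployment insurance claims. Note that the sign of a principal component is arbitrary, so the index may need to be flipped to match the direction of the target series.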

4. When #THC meets #TSE on @twitter: A Discussion of Surveys and Twitter for Examining Attitudes toward Marijuana Legalization
Dr Yuli Patrick Hsieh (RTI International)
Mr Joe Murphy (RTI International)

In this presentation, we dissect the total survey error (TSE) framework and its applicability to Twitter, focusing on measured attitudes toward marijuana legalization. We examine attitudes toward legalization shared on Twitter using incrementally refined search query specifications, with an emphasis on the potential error sources related to each iteration. We relate the relative strengths and weaknesses of these queries to the TSE notions of coverage and measurement error. We conclude with reflections on the viability of TSE for delimiting the errors of social media data and thoughts on the role of social media in survey measurement moving forward.