All time references are in CEST
Applications, Potentials, and Challenges when Using Google Trends in Combination or as Substitute for Surveys 2
Session Organisers: Professor Florian Keusch (University of Mannheim)
Ms Johanna Mehltretter (University of Mannheim)
Dr Christoph Sajons (University of Mannheim)
Time: Wednesday 19 July, 14:00 - 15:00
Aggregated Internet search data from Google Trends are increasingly used as a supplement or alternative to survey data. Proponents of Google Trends argue that anonymous search queries of Internet users are a good reflection of true interests, behaviors, and attitudes, particularly for sensitive topics, where surveys suffer from measurement error due to social desirability. In addition, Google Trends allows researchers to study changes in topic salience, attitudes, and behaviors across time and geographic areas at much finer granularity than is possible in surveys. On the downside, using Google Trends data comes with several problems. First, not everybody uses the Google search function, potentially leading to selection bias. Second, Google Trends provides search volumes based only on a sample of all search queries, which raises questions of reliability. Third, it is often unclear how validly the selected search terms measure the constructs of interest.
In this session, we aim to bring together empirical evidence on the state-of-the-art use of Google Trends data in combination with or as an alternative to self-reports from surveys. Submissions can be methodological in orientation or can be substantive applications that demonstrate the usefulness and assess the quality of Google Trends data. Potential topics for submissions include, but are not limited to:
- Validation of Google Trends data
- Comparison of different approaches to select appropriate keywords
- Approaches to overcome reliability issues of Google Trends data
- Triangulation through joint use of Google Trends with surveys
- Analysis strategies for Google Trends data
- Best practices for transparent documentation when working with Google Trends
- Social science applications of the use of Google Trends data to measure specific attitudes, behavior, and topic salience
Ms Anne-Sophie Oehrlein (GESIS - Leibniz Institute for the Social Sciences) - Presenting Author
Dr Tobias Gummer (GESIS - Leibniz Institute for the Social Sciences)
Google Trends makes aggregated search engine data available, which enables researchers to investigate trends in search term usage on Google Search. The data obtained via Google Trends are relative search volumes for a selected search term across a predefined period of time and location. Depending on when a data retrieval query is issued and which period is specified, data are drawn from different samples: a real-time sample and a non-realtime sample. The real-time sample offers data of high granularity, whereas the non-realtime sample is a sample of search engine data starting from 2004. Previous research has questioned the reliability of non-realtime data and suggested that combining multiple samples (i.e., re-sampling) may help to mitigate these reliability issues. However, search terms appear to differ in how well such procedures perform. It remains an open question how to implement re-sampling in practice when aiming to reduce differences between non-realtime and real-time data, and specifically how many samples to combine for specific search terms. With the present study, we address the issue of reliability of Google Trends data by investigating two research questions: (i) How does combining multiple samples reduce differences between non-realtime and real-time data? (ii) Does the performance of re-sampling differ between search terms?
To answer these research questions, we will collect real-time data for a one-week period. We will then collect daily re-samples of non-realtime data for the same search terms and periods for at least two months (i.e., 60 samples per search term). In our analyses, we will investigate how re-sampling reduces differences from the real-time sample, conditional on search terms.
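The re-sampling idea described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes that re-samples are combined by a simple point-by-point average and that the difference from the real-time series is summarised by the mean absolute difference. The function names and the simulated series (a made-up reference week plus Gaussian noise) are hypothetical and stand in for actual Google Trends retrievals.

```python
import random
from statistics import mean


def combine_samples(samples):
    """Average several re-samples point by point (one value per time point)."""
    return [mean(values) for values in zip(*samples)]


def mean_abs_diff(series, reference):
    """Mean absolute difference between a series and a reference series."""
    return mean(abs(a - b) for a, b in zip(series, reference))


def resampling_curve(samples, reference):
    """Difference from the reference when averaging the first k re-samples, k = 1..n."""
    return [
        mean_abs_diff(combine_samples(samples[:k]), reference)
        for k in range(1, len(samples) + 1)
    ]


# Simulated illustration only: a hypothetical real-time series for one week
# (relative search volumes on a 0-100 scale) and 60 noisy daily re-samples.
# The Gaussian noise model is an assumption, not a property of Google Trends.
rng = random.Random(42)
reference = [40, 55, 62, 70, 100, 85, 60]
samples = [[value + rng.gauss(0, 8) for value in reference] for _ in range(60)]

curve = resampling_curve(samples, reference)
for k in (1, 5, 20, 60):
    print(f"{k:2d} samples combined: mean absolute difference = {curve[k - 1]:.2f}")
```

Running the same function separately for each search term would give one curve per term, which is one way to compare how quickly re-sampling stabilises across terms.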
Miss Johanna Mehltretter (University of Mannheim) - Presenting Author
Professor Florian Keusch (University of Mannheim)
Dr Christoph Sajons (University of Mannheim)
Researchers increasingly use aggregated Internet search data, in particular from Google Trends, as a supplement or alternative to survey data. These data are assumed to be less prone to recall bias or social desirability bias for sensitive topics, can be accessed almost in real time, and allow researchers to study changes in interests, attitudes, and behaviors across time and geographic areas at much finer granularity than in traditional surveys. However, using this kind of data comes with important challenges with respect to construct validity, sample stability, and representativeness that may severely restrict the meaningfulness of the obtained results. In this paper, we describe and assess the state of the art of research with Google Trends data in the social sciences. We first identify and discuss the most important issues for valid and reliable measurement of topic salience, attitudes, and behaviors. Next, we conduct a systematic literature review of 365 studies using Google Trends data in the social sciences to (1) illustrate habits and trends over the past decade and (2) assess whether researchers take the identified challenges into account. The results show that the large majority of studies fail to assess the validity of their Google Trends measure, do not consider whether the retrieved data are consistent across samples, and do not acknowledge the lack of representativeness of their data. We conclude by stating a set of guidelines that will help researchers reduce these problems and properly work with Google Trends data.
Dr Trent Buskirk (Bowling Green State University) - Presenting Author
Mr John Jardine (Bowling Green State University)
Mr Youzhi Yu (Bowling Green State University)
Rising survey costs and declining response rates, combined with increases in the number and types of access methods for alternative big data sources, have given survey researchers a wider palette of information to draw upon for creating estimates of public opinion. Administrative data, open-web data, and digital trace data like Google Search Trends have emerged as possible alternative data sources, in part because of their low cost and ease of acquisition. In this research we utilize over a dozen open-web administrative and survey data sources that were publicly available through the CovidCast API from the Delphi Group at Carnegie Mellon University, along with Google Trends data that we gathered over a contemporaneous 50-plus week time window. The Google Trends data captured incidence rates for keywords within 8 COVID-19 categories: COVID disease, social distancing, testing, symptoms, masking, sanitizing, working, and general virus. To facilitate analyses of this full corpus of gathered data, we built an R-shiny application that allows users to easily explore relationships between several outcomes simultaneously over time. Specifically, in this paper we use our tool to explore relationships over a 50-plus week time window at the height of the pandemic between Google Health Trends data gathered for the 8 COVID-19 categories and hundreds of COVID-19 outcomes measured across the data sets compiled from both survey and open-web administrative data, such as doctor visit incidence, symptom reporting, and social distancing indices, among others. Our analyses will focus on three common measures of association between the Google Health Trends data and other data sources: correlation, comovement, and change points. We will present both the findings of these specific analyses of Google Health Trends data and the R-shiny app that will allow researchers to explore