ESRA

Opportunities and Limitations of Web Scraping in Social Research

Coordinator 1: Mr Aleksei Rotmistrov (National Research University Higher School of Economics)
Coordinator 2: Miss Svetlana Zhuchkova (National Research University Higher School of Economics)

Session Details

Owing to its development and deep penetration into many areas of life, the Internet has become a source of massive amounts of information, including social information. Users leave “footprints” of their communication and other activity on the pages of social networks, online communities, and various thematic websites. Web scraping, a method actively used in computer science to collect such data automatically, is gradually making its way into the social sciences. Web-scraped data open up new opportunities for social research because of the large volume and non-reactive nature of the information obtained. However, such data have some severe limitations: they are heterogeneous and unstructured, most of the extracted features are categorical variables (which narrows the range of applicable analytical methods), the proportion of missing data among the objects studied increases, and so on. In addition, using data from web pages changes a study’s design in an unusual way: the data-driven paradigm, in which data become the basis for future theory and which is not typical of the social sciences, comes to the fore. The natural question is: what is the real potential of this data collection method in social research? Can web scraping serve as a full-fledged analog of, or a substitute for, standard survey data? If so, how can the identified constraints be overcome? If not, what are the limits of this approach? Session participants are invited to address these questions in their presentations and to share their own experience of using web scraping and web data.