Program at a glance 2021

Collecting and linking non-survey data

Session Organiser: Dr Hayk Gyuzalyan (Highgate Consultancy)
Time: Friday 23 July, 15:00 - 16:30

The papers in this session look at various aspects of collecting and linking non-survey data. Several papers look at collecting objective measurements as part of the survey data collection process, using special devices to measure physical activity, travel distances and times, and respondents' eye movements. Other papers present the results of linking survey responses to administrative data. Together, the papers explore the prospects of using non-survey data to enhance the data collected from respondents.

Keywords: linking administrative data, collecting non-survey data, oculographic, accelerometer, GPS

Spatially Linking Objective Air Quality Data and a Micro-Level Panel Survey Shows: Selective Mobility Contributes to Immigrants’ Higher Exposure to Environmental Pollution

Dr Felix Bader (TU Kaiserslautern) - Presenting Author
Professor Henning Best (TU Kaiserslautern)
Mr Ingmar Ehler (TU Kaiserslautern)
Dr Tobias Ruttenauer (Oxford University)

The debate on environmental justice reached Germany decades ago, but limited data availability has hindered a detailed account of the processes leading to unequal exposure to environmental pollution. This study provides the first longitudinal analysis of the mechanism of selective mobility leading to environmental inequality in Germany based on objective pollution data.

We link geo-referenced longitudinal household-level data from the German Socio-Economic Panel (GSOEP) for 2008-2016 with 2x2 km estimates of annual air pollution by the German Environment Agency, including nitrogen dioxide, particulate matter, and sulphur dioxide. We show that first- and second-generation immigrants are exposed to higher levels of air pollution around their place of residence. On the one hand, this might be due to selective siting of pollution sources like factories or roads close to places where minorities reside. On the other hand, environmental inequality might result from selective residential sorting: households experience different improvements in air quality when relocating. Using fixed effects models, we find that, on average, relocating households move to places with cleaner air, but this improvement for non-immigrants is about twice as large as for immigrants.

Selective mobility contributes to environmental inequality in Germany, thereby aggravating the disadvantage of immigrants. Interestingly, we find a similar pattern and mechanism for lower income groups, but effect magnitudes are much weaker. This indicates that lower income is not the driving force behind immigrants’ environmental disadvantage.
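
The spatial linkage described in this abstract (assigning each geo-referenced household-year to a 2x2 km pollution grid cell) can be illustrated with a minimal sketch. This is not the authors' procedure: the projected coordinates in metres, the grid origin at (0, 0), the field names, and all values are hypothetical and chosen only to show the idea.

```python
import math

CELL_SIZE_M = 2000  # 2 x 2 km grid cells, as mentioned in the abstract


def cell_index(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (column, row) index of the grid cell containing a point.

    Assumes projected coordinates in metres and a grid origin at (0, 0).
    """
    return (math.floor(easting_m / cell_size_m),
            math.floor(northing_m / cell_size_m))


def link_pollution(households, pollution_grid):
    """Attach the pollution value of the matching (year, cell) to each record.

    Records without a matching cell get None.
    """
    linked = []
    for hh in households:
        key = (hh["year"], *cell_index(hh["easting_m"], hh["northing_m"]))
        linked.append({**hh, "no2": pollution_grid.get(key)})
    return linked


# Made-up annual NO2 values keyed by (year, column, row); not real data.
pollution_grid = {
    (2008, 0, 0): 31.0,
    (2008, 1, 0): 24.5,
    (2009, 1, 0): 23.0,
}

# Made-up household-year records; household 1 relocates between 2008 and 2009.
households = [
    {"hh_id": 1, "year": 2008, "easting_m": 1500.0, "northing_m": 300.0},
    {"hh_id": 1, "year": 2009, "easting_m": 2600.0, "northing_m": 900.0},
    {"hh_id": 2, "year": 2008, "easting_m": 3999.0, "northing_m": 1999.0},
]

linked = link_pollution(households, pollution_grid)
for record in linked:
    print(record["hh_id"], record["year"], record["no2"])
```

With linked records of this kind, the change in exposure for a relocating household is simply the difference between its values before and after the move.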

Pupil diameter dynamics as an indicator of the respondent's cognitive load in CASI and P&PSI modes

Professor Inna Deviatko (National Research University Higher School of Economics)
Mr Mikhail Bogdanov (National Research University Higher School of Economics)
Mr Daniil Lebedev (National Research University Higher School of Economics) - Presenting Author

In recent years, interest has grown in methods for measuring cognitive load and subjectively perceived mental effort associated with problem solving and interpersonal communication. Alongside this, social science researchers have become increasingly interested in multimodal assessment of the cognitive load of interviewers and respondents, using objective and subjective indicators (including paradata and webcam data), in order to optimize its impact on the quality of survey data. At the same time, the potential of relatively new neurophysiological approaches to measuring cognitive load remains underestimated. These include modern wearable devices for oculography (eye tracking and pupillometry), which are unobtrusive and do not disrupt the natural course of respondents’ and interviewers’ activity. Such devices allow the dynamics of measured parameters (above all, pupil size) to be linked accurately in time to the specific question format, the mode and phase of survey completion, the presence of an external influence localized in time, and so on. Existing quantitative studies of the cognitive load arising during survey completion, and of its possible impact on the quality of survey data, have focused mainly on computer-assisted (CAPI) or paper-based (PAPI) interviewing. The cognitive load that arises when respondents self-complete computerized (CASI) and paper (P&PSI) questionnaires has remained understudied.
The article presents the results of a methodological experiment assessing the cognitive load of respondents. It used a modified version of a multimodal approach previously applied to compare the cognitive load of interviewers filling out paper and computerized versions of a questionnaire. We expanded the range of methods used to assess cognitive load by using a wearable oculographic device (eye tracker) to measure the dynamics of pupil size associated with the completion of distinct survey items (questions). The results of the experiment confirmed the hypothesis that the two modes of survey completion are approximately equivalent in terms of cognitive load for young respondents with a high level of functional computer literacy. They also allowed an initial assessment of the technical and metrological capabilities and limitations of using pupil dynamics indicators, measured with a wearable oculographic device, to study respondents’ cognitive load.

Making Time Count: A Machine Learning Approach to Predict Time Use from Sensor Data on Physical Activity

Dr Seyit Hocuk (Tilburg University)
Dr Talip Kilic (World Bank) - Presenting Author
Dr Pradeep Kumar (Tilburg University)
Dr Joris Mulder (Tilburg University)
Dr Alberto Zezza (World Bank)

Devising policies for addressing gender disparities in both productive and reproductive labor relies on the accurate measurement of men’s and women’s time use. Yet, individual-level data and statistics on time use exhibit serious weaknesses. The shortcomings in the availability, comparability and quality of time use data can in part be traced back to underinvestment in methodological research for the development of scalable and accurate survey methods.

Relatedly, collecting objective measures of physical activity has become more popular and affordable thanks to recent advances in accelerometer technology. Both research- and consumer-grade accelerometers are used extensively in studies on physical activity, and these sensors have been shown to provide accurate insights on physical activity. Research has further revealed that artificial intelligence can be more useful in predicting human activities from body-worn sensors than traditional methods. Yet, applications in low- and middle-income countries remain scant and these methods are not yet used in large-scale household surveys.

Our paper investigates whether time use information can be imputed from unstructured physical activity sensor output collected in free-living conditions. We do so by leveraging machine learning and unique survey data collected in rural Malawi, which include both self-reported information on time use and accelerometer data for the same individuals. Since collecting accurate time use data is both cost- and supervision-intensive, it is operationally relevant for survey practitioners to know whether time use can be predicted from physical activity tracking data that can also inform a range of downstream analyses related to labor productivity, poverty, food security and nutrition.

Our analysis uses data collected by ActiGraph accelerometers worn by 415 adults (aged 15+) from 215 agricultural households in two districts in Malawi. For each adult, the physical activity tracking data were successfully collected at 1-minute intervals for 14 days, during all waking hours. In addition, a 24-hour recall time use module was administered twice to each adult, one week apart. The time use module aimed to identify the primary activity for each 15-minute interval of the preceding 24-hour period. Furthermore, the weight and height of each subject were measured, and information on labor outcomes, agricultural production, consumption and expenditures was collected.

We train a supervised machine learning model with the time use data for a randomly selected 80 percent of the sample. The model uses a range of predictors, including raw accelerometer data (for the two days with time use information) and a rich set of individual- and household-level attributes. We then derive predicted measures of time allocation to a range of activities for the remaining 20 percent of the sample and document the heterogeneity in predictive performance by age and gender. The paper reveals both strengths and weaknesses in recall-based time use data and provides recommendations for future research in this area. The findings are broadly relevant for advancing the agenda on the introduction of sensor-based, objective measurements of physical activity in large-scale socio-economic surveys.
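
The general workflow in this abstract (aggregating 1-minute accelerometer counts to the 15-minute diary intervals, holding out 20 percent of individuals, and predicting the reported activity) can be sketched as follows. The abstract does not name the model used, so a deliberately simple nearest-centroid classifier stands in for it here; the data are synthetic, and all function names, labels and thresholds are hypothetical.

```python
import random
from collections import defaultdict


def to_slots(minute_counts, slot_len=15):
    """Sum 1-minute activity counts into consecutive 15-minute slots."""
    return [sum(minute_counts[i:i + slot_len])
            for i in range(0, len(minute_counts), slot_len)]


def split_people(person_ids, train_share=0.8, seed=0):
    """Randomly assign whole persons (not single slots) to train or test."""
    ids = sorted(person_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_share * len(ids))
    return set(ids[:cut]), set(ids[cut:])


def fit_centroids(rows):
    """Return the mean slot count per activity label from (count, label) rows."""
    totals, n = defaultdict(float), defaultdict(int)
    for slot_count, label in rows:
        totals[label] += slot_count
        n[label] += 1
    return {label: totals[label] / n[label] for label in totals}


def predict(centroids, slot_count):
    """Predict the label whose centroid is closest to the observed count."""
    return min(centroids, key=lambda label: abs(centroids[label] - slot_count))


# Synthetic data: 10 people, each with 15 low-activity minutes ("resting")
# followed by 15 high-activity minutes ("farming"). Not real survey data.
rng = random.Random(1)
data = {}
for pid in range(10):
    minutes = ([rng.randint(0, 20) for _ in range(15)]
               + [rng.randint(200, 400) for _ in range(15)])
    data[pid] = list(zip(to_slots(minutes), ["resting", "farming"]))

train_ids, test_ids = split_people(data)
train_rows = [row for pid in train_ids for row in data[pid]]
test_rows = [row for pid in test_ids for row in data[pid]]

centroids = fit_centroids(train_rows)
accuracy = sum(predict(centroids, c) == y for c, y in test_rows) / len(test_rows)
print(f"train persons: {len(train_ids)}, test persons: {len(test_ids)}, "
      f"accuracy: {accuracy:.2f}")
```

Splitting by person rather than by individual time slot keeps all observations of a held-out individual out of training, which is one plausible reading of "80 percent of the sample".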

Using a mixed-methods approach to compare travel diaries and schematic questioning of travel behavior

Ms Lisa Bönisch (Institute for Transport Studies, Karlsruhe Institute of Technology) - Presenting Author
Mr Sascha von Behren (Institute for Transport Studies, Karlsruhe Institute of Technology)
Mr Bastian Chlond (Institute for Transport Studies, Karlsruhe Institute of Technology)
Mr Peter Vortisch (Institute for Transport Studies, Karlsruhe Institute of Technology)

Longitudinal travel diaries are cost-intensive and time-consuming, and they suffer from low response rates. This results from the high burden on participants, who have to report every single trip over a specified, randomly assigned period. Nevertheless, travel diaries are an established survey method in transportation research. At the same time, there is high demand for a more efficient method of capturing individuals’ mobility. GPS tracking offers one possibility but fails to capture detailed information such as trip purpose and socio-demographic characteristics. In contrast, the schematic questioning of travel behavior provides an effective alternative. The so-called ‘travel skeleton’ approach is adapted from a one-week travel diary: it asks for typical frequencies of activities and mode choice, both referring to a typical week. It has been developed as a quasi-longitudinal approach and allows for a more general analysis of individual travel patterns. Participants provide a self-assessment of their travel behavior in a single, compressed survey, which effectively reduces the effort of survey participation. To better understand the applicability of this alternative approach, a comparison of both methods regarding the measurability of travel behavior is needed.
In this study, we present a mixed-methods approach comprising three phases. Our analysis is based on surveys with 62 university students between 2016 and 2019. Further analyses also include data from 44 additional students surveyed in December 2020. All participants filled in both a travel diary over the course of one week and a ‘travel skeleton’. We additionally conducted in-depth interviews with 16 randomly selected participants. This enhances our understanding of discrepancies between the travel behavior reported in the two survey concepts and helps to capture underlying motives for travel. In the analyses, we focus on the typical frequency of use of different means of transport and of activities (school and leisure) by comparing the trips reported in the one-week diary with the participants’ own assessment of their travel.
Regarding frequently used means of transport, e.g. the bicycle, we found that 32% of the participants reported a higher frequency in the ‘travel skeleton’ than in the trip diary. Conversely, less frequently used means of transport tend to be underestimated. For example, the car as a driver (21%) or public transport (35%) were reported more often in the trip diary than the participants indicated as a ‘typical’ frequency in the ‘travel skeleton’. In terms of activities, we found that the assessed number of school days per week is mostly in line with the number of trips reported in the trip diary (46%). In contrast, the individual assessment of the typical frequency of leisure activities within a week differs strongly from the information in the trip diary for 82% of the participants.
In summary, both methods show different strengths in the survey of travel behavior. Using the interview data will help to identify appropriate use cases for each survey concept and to investigate the randomness of trips reported in diaries.

The Accuracy of Self-Reported Dwelling Valuation

Professor Aviad Tur-Sinai (The Max Stern Yezreel Valley College) - Presenting Author
Dr Larisa Fleishman (Israeli Central Bureau of Statistics)
Dr Dmitri Romanov (Israeli Central Bureau of Statistics)

Owners’ valuations of dwelling prices are central to the construction of price indices and to households’ economic behavior. We analyze the variation of the self-reported valuation bias over the distribution of dwelling sale prices, using a dataset of observations from a Household Expenditure Survey merged with the national sample of housing sale transactions by census tract.
This study augments the research literature on this topic in several ways. We focus on investigating the accuracy of subjective valuations across a distribution of dwelling values. The study is based on a unique database that covers more than a decade and combines dwelling valuations culled from a large national survey with data on sale transactions at the level of census tract. We augment the analysis with demographic and socioeconomic indicators of the population of the census tract and the spatial environment of the dwelling. Given information about the change in transaction prices in a census tract over time, we investigate the correlation between the dwelling valuations reported in the survey and dwelling prices in the tract within a one-year window before and after the date of participation in the survey.
We find that self-reported estimates of dwelling values are, on average, 20% higher than the mean market prices of houses in the corresponding census tracts. Estimates reported by people who occupy dwellings in the lowest eight deciles of the price distribution are upward-biased, whereas those who live in the most expensive dwellings more typically understate the value of their homes. The self-reported valuation bias is systematically associated with owners’ traits and with dwelling and neighborhood characteristics. Misspecification might be another potential explanation for that bias. The frequency of dwelling sales in the respondent’s tract was found to have an effect on the self-reported valuation bias.
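
As a rough illustration of how a valuation bias can be examined across the price distribution, the sketch below computes a relative bias, (self-reported value minus tract mean price) divided by tract mean price, and averages it within price deciles. The exact bias definition and decile construction used in the paper are not given in the abstract, so both are assumptions here; the records are made up and only mimic the direction of the reported pattern.

```python
def relative_bias(reported, market_mean):
    """Relative valuation bias: positive values mean over-valuation."""
    return (reported - market_mean) / market_mean


def mean_bias_by_decile(records):
    """Average relative bias within deciles of the tract mean price.

    Deciles are assigned by rank (1 = cheapest tenth), an assumed convention.
    """
    ordered = sorted(records, key=lambda r: r["market_mean"])
    n = len(ordered)
    groups = {}
    for rank, r in enumerate(ordered):
        decile = rank * 10 // n + 1
        groups.setdefault(decile, []).append(
            relative_bias(r["reported"], r["market_mean"]))
    return {d: sum(v) / len(v) for d, v in groups.items()}


# Made-up records (arbitrary currency units): over-valuation in the lower
# eight deciles, under-valuation in the top two. Not the study's data.
records = [{"market_mean": m, "reported": m * 13 // 10}
           for m in range(100, 900, 100)]
records += [{"market_mean": m, "reported": m * 9 // 10} for m in (900, 1000)]

bias = mean_bias_by_decile(records)
for decile in sorted(bias):
    print(decile, round(bias[decile], 2))
```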