All time references are in CEST
Data collection with wearable devices 2
Dr Alexander Wenz (University of Mannheim)
Professor Christopher Antoun (University of Maryland)
Wednesday 19 July, 16:00 - 17:30
Wearable devices, such as smart watches and activity trackers, are increasingly used for data collection in the social, behavioral, and health sciences. Equipped with a wide range of sensors, these devices allow researchers to measure physical activity, sleep behavior, and cardiovascular health, among many other things. While most previous sensor-based studies were implemented on small-scale samples of volunteers, recent studies have started to extend this approach to larger samples of the general population.
Despite the proliferation of wearable devices as tools for data collection, the potential sources of error in such data are not yet fully understood. In this session, we welcome contributions that examine and improve the quality of data collection with wearable devices, for example:
• Recruitment; nonparticipation; non-adherence
• Measurement error
• Weighting; imputation
• Fieldwork; implementation issues
• Errors when processing and interpreting sensor data
• Consent; privacy
Keywords: Sensors, Activity trackers, Accelerometers, Data Quality
Professor Arie Kapteyn (Center for Economic and Social Research, University of Southern California) - Presenting Author
Mr Htay-Wah Saw (Center for Economic and Social Research, University of Southern California & Michigan Program in Survey and Data Science (MPSDS), University of Michigan-Ann Arbor)
Mr Bas Weerman (Center for Economic and Social Research, University of Southern California)
Publicly available pollution data are mostly regional-level data, such as those collected by EPA’s monitoring stations. Such data are likely to miss substantial differences in individuals’ exposure to pollution, whether inside the home, at work, or elsewhere. To address this lack of granularity, we have asked some 500 respondents of the Understanding America Study (UAS), balanced across education, race and ethnicity, and household income, to wear an air quality monitor (Atmotube) (https://atmotube.com/atmotube-pro) continuously for one year. In addition, we have conducted monthly surveys of respondents’ home characteristics (heating and cooling types, cooking stoves, proximity to busy roads) and of their whereabouts in 30-minute episodes during the previous 24 hours (home, work, motor vehicle, other). The air quality monitor collects pollution and weather data at 1-minute intervals and is Bluetooth-enabled, so that it communicates with a smartphone app.
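Because the monitor records at 1-minute intervals while the whereabouts survey uses 30-minute episodes, the two streams have to be aligned before they can be analysed together. The following is a minimal sketch, assuming readings arrive as (timestamp, value) pairs for a single pollutant and that the survey yields one location label per 30-minute episode; the function names `episode_start` and `episode_means` and this data layout are hypothetical and do not reflect the Atmotube app's actual export format:

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def episode_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 30-minute episode."""
    return ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)


def episode_means(readings, locations):
    """Average 1-minute readings per 30-minute episode and attach the
    location the respondent reported for that episode.

    readings:  iterable of (datetime, value) tuples at 1-minute resolution
    locations: dict mapping episode start -> reported location label
               (hypothetical layout of the whereabouts survey)
    Returns a dict: episode start -> (location, mean value).
    Episodes without a reported location are labelled "unknown".
    """
    buckets = defaultdict(list)
    for ts, value in readings:
        buckets[episode_start(ts)].append(value)
    return {
        start: (locations.get(start, "unknown"), mean(values))
        for start, values in sorted(buckets.items())
    }
```

For example, 60 one-minute readings from 08:00 to 08:59 collapse into two episodes, 08:00 and 08:30, each paired with the location reported for it. Averaging within episodes, rather than interpolating the survey to minutes, keeps the analysis at the resolution of the coarser source.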
By merging in EPA’s regional-level data, we are able to disentangle the effects of local air quality from those of micro-climates, such as inside one’s home, at work, or when traveling in a motor vehicle. Early pilot data based on about 150 respondents show strong effects of dwelling characteristics (e.g., whether the home is close to a busy street) and of socio-economic status. In the presentation, we will show descriptive results on how air quality varies by location and socio-economic characteristics. Furthermore, to gain insight into individual exposure to air pollution, we will decompose individual pollution exposure into its components: regional air quality and variation due to individuals’ locations during the day.
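One way to picture this decomposition is a simple additive split: each personal reading equals the regional (EPA) value for the same hour plus a local deviation, and averaging that deviation by reported location shows how much each micro-environment adds to or subtracts from regional air quality. This is a minimal sketch under those assumptions; the additive model, the hourly matching rule, and the name `decompose_exposure` are illustrative and not necessarily the authors' specification:

```python
from collections import defaultdict
from statistics import mean


def decompose_exposure(personal, regional):
    """Split personal exposure into a regional and a local component
    (illustrative additive model: personal = regional + deviation).

    personal: list of (hour, location, value) tuples from the wearable monitor
    regional: dict mapping hour -> regional monitor value for that hour
    Returns (regional_share, local_by_location): the mean regional value over
    the respondent's readings, and the mean deviation (personal - regional)
    for each reported location.
    """
    regional_values = []
    deviations = defaultdict(list)
    for hour, location, value in personal:
        background = regional[hour]  # assumes every hour has a regional value
        regional_values.append(background)
        deviations[location].append(value - background)
    local_by_location = {loc: mean(d) for loc, d in deviations.items()}
    return mean(regional_values), local_by_location
```

With this split, the regional share plus the reading-weighted average of the location deviations reproduces the respondent's mean personal exposure exactly, so the two components account for the whole of it.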
Dr Katharina Meitinger (Utrecht University) - Presenting Author
Dr Vera Toepoel (CBS)
Dr Ellen de Hollander (RIVM)
Dr Wanda Wendel-Vos (RIVM)
Dr Marjolein Duijvestijn (RIVM)
Professor Tommi Vasankari (UKK-instituutti)
Physical activity (PA) and sedentary behavior are important lifestyle factors that affect public health. To inform public health policy, many countries monitor the prevalence of PA, and changes in it over time, at the national level. They typically rely on subjective self-reports collected with questionnaires such as the European Health Interview Survey, the Eurobarometer, the Global Physical Activity Questionnaire, and the International Physical Activity Questionnaire. However, previous research has shown that self-report surveys have poorer measurement properties than accelerometers (van Nassau et al., 2015). Accelerometers have proven successful in cohort and intervention studies but have rarely been included in population monitoring instruments because of their cost. This presentation evaluates for which socio-demographic subgroups it is worthwhile to use accelerometers in addition to survey data. We focus on respondents’ willingness to wear the accelerometer, and on which subgroups differ most between survey and accelerometer data and in which direction (over- vs. underestimation of PA).
Data for this study come from a representative sample of the Dutch population, which was asked to complete an online questionnaire and to wear an accelerometer for seven days. Fieldwork ran from 18 March 2019 to 31 October 2019. Respondents received a UKK RM42 accelerometer. The sample comprises 1,018 Dutch respondents. Our results show that age is a significant predictor of willingness to wear an accelerometer. In addition, a higher BMI and belonging to the 51-65 age group increase the odds of overestimating PA. At the same time, underestimation of PA is less likely for certain age groups, for respondents with a medium level of education, and for those with a chronic disease.
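To study over- and underestimation, each respondent's self-reported PA has to be compared with the accelerometer measure. This is a minimal sketch, assuming both measures are expressed as weekly minutes of PA and that differences within a tolerance band count as agreement; the 30-minute tolerance and the function names `classify_report` and `rates_by_group` are hypothetical choices for illustration, not the study's definitions:

```python
from collections import Counter


def classify_report(self_report_min: float, device_min: float,
                    tolerance: float = 30.0) -> str:
    """Label a self-report relative to the accelerometer measure.

    The tolerance band (hypothetical, in minutes) absorbs small differences.
    """
    diff = self_report_min - device_min
    if diff > tolerance:
        return "over"
    if diff < -tolerance:
        return "under"
    return "agree"


def rates_by_group(respondents, tolerance: float = 30.0):
    """Share of over-, under- and agreeing reports per subgroup.

    respondents: iterable of (group, self_report_min, device_min) tuples.
    Returns {group: {"over": share, "under": share, "agree": share}}.
    """
    counts = {}
    for group, reported, measured in respondents:
        label = classify_report(reported, measured, tolerance)
        counts.setdefault(group, Counter())[label] += 1
    return {
        group: {label: c[label] / sum(c.values()) for label in ("over", "under", "agree")}
        for group, c in counts.items()
    }
```

Descriptive rates like these only show raw differences between subgroups. To obtain odds adjusted for BMI, age group, education, and chronic disease, the per-respondent labels could serve as the outcome of, for example, a logistic or multinomial logistic regression.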
Dr Seyit Höcük (Centerdata - Tilburg University)
Dr Talip Kilic (The World Bank)
Mr Pradeep Kumar (Centerdata - Tilburg University)
Mr Joris Mulder (Centerdata - Tilburg University) - Presenting Author
Dr Alberto Zezza (The World Bank)
Obtaining data on people’s time use can be time-consuming and costly. We investigate whether time use can instead be predicted by applying machine learning to sensor data. If so, costly and difficult-to-obtain time-use surveys could be replaced by cheaper, more accurate, and objective wearable devices to study gender disparities in rural communities in developing countries.
We use data collected in rural communities of Malawi. The time-use data come from a recall-based questionnaire, and the accumulated answers are grouped into 25 activity categories. Sensor-based data are collected with a tri-axial accelerometer. Using machine learning (ML) techniques, we build a supervised classification model on the combined time-use and sensor data to predict the activity performed in each minute. The reported activities serve as the dependent variable, and the collected sensor data together with basic participant background information serve as the independent variables.
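Building a minute-level training set means summarising the raw tri-axial signal for each minute and joining it with the reported activity and the participant's background variables. This is a minimal sketch, assuming raw samples arrive as (minute index, x, y, z) tuples; the features used here (mean and standard deviation of the vector magnitude) and the function name `minute_features` are illustrative assumptions, not the authors' feature set:

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def minute_features(samples, activity_by_minute, background):
    """Build one labelled feature row per minute (illustrative feature set).

    samples:            iterable of (minute, x, y, z) accelerometer samples
    activity_by_minute: dict minute -> reported activity (dependent variable)
    background:         dict of participant-level covariates, e.g. {"female": 1}
    Minutes without a reported activity are skipped.
    """
    magnitudes = defaultdict(list)
    for minute, x, y, z in samples:
        # Euclidean norm of the three axes: an orientation-independent intensity.
        magnitudes[minute].append(math.sqrt(x * x + y * y + z * z))
    rows = []
    for minute in sorted(magnitudes):
        if minute not in activity_by_minute:
            continue
        vm = magnitudes[minute]
        features = {"vm_mean": mean(vm), "vm_sd": pstdev(vm), **background}
        rows.append((features, activity_by_minute[minute]))
    return rows
```

Rows of this shape could then be passed to tree-based classifiers such as scikit-learn's `RandomForestClassifier` or `GradientBoostingClassifier` after converting the feature dictionaries to a numeric matrix. Including the background covariates in every row is what allows a model to use them alongside the sensor features.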
We train models with two ML classifiers: Gradient Boosting and Random Forest. Using sensor-based and background data, our models predict the performed time-use activity with decent accuracy. Our best-performing model achieves a macro-average F1-score of 76%. Because we used twelve balanced classes, a random guess would yield an F1-score of only 8.33% (1/12). Adding background information as independent variables consistently improves model performance.
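The macro-average F1-score computes an F1-score for each class and takes their unweighted mean, so every activity counts equally regardless of how often it occurs. With k balanced classes, uniform random guessing has an expected precision and recall of about 1/k per class, and therefore an F1-score of about 1/k, which for k = 12 gives roughly 8.33%. This is a minimal plain-Python sketch of the metric, similar in spirit to scikit-learn's `f1_score(..., average="macro")`, plus a hypothetical `random_baseline` helper that checks the 1/k figure by simulation:

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1-scores.

    Classes are the union of labels in y_true and y_pred; a class with no
    true positives gets F1 = 0. Returns (macro_f1, per_class_f1).
    """
    classes = sorted(set(y_true) | set(y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    return sum(per_class.values()) / len(classes), per_class


def random_baseline(k: int = 12, n_per_class: int = 1000, seed: int = 0) -> float:
    """Macro F1 of uniform random guessing on k balanced classes (about 1/k)."""
    rng = random.Random(seed)
    y_true = [c for c in range(k) for _ in range(n_per_class)]
    y_pred = [rng.randrange(k) for _ in y_true]
    return macro_f1(y_true, y_pred)[0]
```

The per-class dictionary returned alongside the macro score is also the natural place to look for the activities that are hardest to predict, since those are the classes with the lowest individual F1-scores.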
We show that time-use activities can be predicted from physical activity sensor data with high accuracy. This holds despite possible limitations imposed by the quality of the recall-based data and the level of aggregation of the sensor data. We also identified which activities are harder to predict and classify, which will help improve future data collection efforts and build better machine learning models.