All time references are in CEST
Integrating Data from Different Survey Projects
Dr Piotr Cichocki (Adam Mickiewicz University)
Dr Piotr Jabkowski (Adam Mickiewicz University)
Dr Marta Kołczyńska (Institute of Political Studies of the Polish Academy of Sciences)
Tuesday 18 July, 11:00 - 12:30
Combining data from different sources constitutes a promising research avenue for secondary data users, as it allows for filling in the coverage gaps in individual cross-national survey projects. Such cross-project data harmonisations face severe methodological challenges, as the underlying survey projects may differ significantly in measurement instruments (i.e., question-wording and response scale formats), modes of data collection, sampling designs, and fieldwork procedures or quality assurance protocols. Secondary users of survey data need standardised and easily applicable procedures to facilitate analyses unbiased by the methodological differences among the underlying survey projects.
This session is dedicated to the advances in data integration procedures, including methodological innovations and applications. Welcome contributions include, but are not limited to, papers that:
A. propose methods for accounting for cross-project differences in data quality and comparability,
B. propose methods for accounting for cross-project differences in the measurement instruments,
C. propose measures of the Total Survey Error or its various aspects,
D. combine data from different sources in order to study a substantive research problem.
Dr Angelo Moretti (Utrecht University) - Presenting Author
Dr Alejandra Arias-Salazar (Freie Universität Berlin)
Dr Natalia Rojas-Perilla (United Arab Emirates University)
Multidimensional poverty is a leading topic in national and international agendas: “End poverty in all its forms everywhere” is part of the Sustainable Development Agenda (UN General Assembly, 2015). In relation to the Sustainable Development Goals, there is a need for information disaggregated by geographical level and by relevant characteristics of the population (e.g., sex, age, ethnicity). The United Nations Economic Commission for Latin America and the Caribbean is developing a regionally comparable Multidimensional Deprivation Index (MDI) for 18 Latin American countries, which is based on the Alkire and Foster (2007) approach. In summary, it considers 5 dimensions and 8 indicators. Our aim is to produce reliable estimates of the MDI and its components (indicators and dimensions) for the adult population of Colombia at different administrative division levels based on the Census data for 2018.
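As an illustration of the counting logic behind an Alkire and Foster type index, the following is a minimal sketch. The indicator matrix, the weights and the poverty cutoff `k` are hypothetical and do not reproduce the MDI specification; the function name `alkire_foster` is likewise an assumption made for this example.

```python
import numpy as np

def alkire_foster(deprived, weights, k):
    """Return headcount H, intensity A and adjusted headcount M0 = H * A.

    deprived: (n_persons, n_indicators) array of 0/1 deprivation flags.
    weights:  indicator weights (hypothetical; rescaled here to sum to 1).
    k:        share of weighted deprivations needed to be identified as poor.
    """
    deprived = np.asarray(deprived, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    scores = deprived @ weights          # weighted deprivation score per person
    poor = scores >= k                   # identification step
    H = float(poor.mean())               # share of people identified as poor
    A = float(scores[poor].mean()) if poor.any() else 0.0  # average score among the poor
    return H, A, H * A

# Toy data: 4 persons, 4 equally weighted indicators, cutoff k = 0.5.
flags = [[1, 1, 0, 0],
         [1, 0, 0, 0],
         [1, 1, 1, 1],
         [0, 0, 0, 0]]
H, A, M0 = alkire_foster(flags, weights=[1, 1, 1, 1], k=0.5)
```

In this toy example two of the four persons reach the cutoff, so the index combines a headcount of one half with their average deprivation score.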
Unfortunately, two of the indicators are based on variables that are not collected by the National Population and Housing Census, i.e., education and employment. In this work, we propose and evaluate three strategies to generate the education and employment variables synthetically for the Census microdata, and then produce the MDI estimates at small area level, taking into account intrinsic correlation structures and sources of uncertainty. In particular, the first strategy combines statistical matching and fractional hot-deck imputation techniques to mass-impute the missing variables into the Census, based on the variables common to the Census and a reference survey, i.e., the Great Integrated Household Survey (the reference period is 2018). The second strategy relies on prediction models, in this case the unit-level Bernoulli logit mixed model, to generate the missing indicators (Gutierrez et al., 2022). The third and final strategy is built on a spatial microsimulation approach (Lovelace, 2015). In this research, we conduct a large-scale simulation study based on the Great Integrated Household Survey to evaluate both the quality of the final small area estimates of the MDI and the relationships between deprivation and socio-demographic variables, at unit and area level, assuming that policy makers are interested in producing small area estimates and analyses based on the unit-level data. Furthermore, we discuss the issue of estimating the uncertainty arising from the approaches, i.e., variance and mean squared error, which is a crucial topic in Official Statistics.
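The idea of imputing a variable from a survey into a census through common variables can be sketched as follows. This is a simple random hot-deck within matching classes, not the fractional hot-deck used in the paper, and all column names and the function name `hot_deck_impute` are hypothetical.

```python
import numpy as np
import pandas as pd

def _keys(df, match_vars):
    # One tuple of matching-variable values per row.
    return list(zip(*(df[v] for v in match_vars)))

def hot_deck_impute(recipients, donors, match_vars, target, seed=0):
    """Copy `target` from a randomly drawn donor in the same matching class."""
    rng = np.random.default_rng(seed)
    pools = {}
    for key, value in zip(_keys(donors, match_vars), donors[target]):
        pools.setdefault(key, []).append(value)
    imputed = []
    for key in _keys(recipients, match_vars):
        pool = pools.get(key)
        # Recipients without any donor in their class stay missing.
        imputed.append(rng.choice(pool) if pool else None)
    out = recipients.copy()
    out[target] = imputed
    return out

# Hypothetical survey (donors) and census (recipients) records.
survey = pd.DataFrame({"sex": ["F", "F", "M", "M"],
                       "age_group": ["young", "young", "old", "old"],
                       "employed": [1, 1, 0, 0]})
census = pd.DataFrame({"sex": ["F", "M", "F"],
                       "age_group": ["young", "old", "old"]})
census_imputed = hot_deck_impute(census, survey, ["sex", "age_group"], "employed")
```

A single random draw per recipient understates imputation uncertainty; repeating the draw with different seeds is one simple way to see how much the downstream estimates vary.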
Ms Marion Thiele (Federal Institute for Vocational Training and Education (BIBB))
Ms Myriam Baum (Federal Institute for Vocational Training and Education (BIBB)) - Presenting Author
Dr Dominik Becker (Federal Institute for Vocational Training and Education (BIBB))
Professor Harald Pfeifer (Federal Institute for Vocational Training and Education (BIBB))
Participation in further vocational training (FT) aims to ensure employability. This matters especially in times of crisis, such as recessions or pandemics, because these heavily affect the labour and training markets (e.g., through rising unemployment and decreased human capital investment). Yet research on the interrelation between the business cycle (BC) and FT is limited, among other reasons because of the lack of a comprehensive database. Existing research mainly focuses on the supply side and is restricted to the financial crisis of 2008 or the Covid pandemic. Therefore, our project examines how the BC affects individual FT decisions, and whether reduced firm investments are substituted by individual investments, in three subprojects: 1) an overview of how the BC affects individual FT; 2) an in-depth analysis of how technological change moderates this relation; and 3) an in-depth analysis of how individuals' risk preferences affect this relation.
We base our analysis on two data sets: for 1) we use the Microcensus (2005-2020); for 2) and 3) we use data from SC6 of the National Educational Panel Study (2007-2020 + Covid survey). We enrich these data with administrative data on BC indicators (e.g., unemployment rates, GDP) and firm-level data on technological change (e.g., Mannheim Innovation Panel). The BC indicators are matched three months prior to the respondents' FT or, in the case of no FT, 15 months prior to the interview. The administrative data are linked using regional, year and sectoral information; to link the firm-level data we additionally use firm-size information. For the analysis we will use panel regressions, which address problems such as unobserved heterogeneity and reverse causality. We will show how we generated these unique data sets, which help in understanding individual investments in FT in times of economic crisis, and discuss the difficulties encountered in doing so.
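The lagged matching rule described above could be sketched as follows. The sketch assumes monthly indicators keyed by region only; the column names (`ft_date`, `interview_date`, `region`, `month`, `unemployment`) and the function name are hypothetical, and the sectoral and firm-size keys are omitted.

```python
import pandas as pd

def attach_bc_indicators(persons, indicators):
    """Attach business-cycle indicators at the lagged reference month.

    Reference month: 3 months before the training date, or 15 months
    before the interview date for respondents without training.
    """
    out = persons.copy()
    has_ft = out["ft_date"].notna()
    ref = out["ft_date"] - pd.DateOffset(months=3)
    ref = ref.where(has_ft, out["interview_date"] - pd.DateOffset(months=15))
    out["ref_month"] = ref.dt.strftime("%Y-%m")
    lookup = indicators.rename(columns={"month": "ref_month"})
    # validate="m:1" raises if an indicator key is duplicated in the lookup.
    return out.merge(lookup, how="left", on=["region", "ref_month"], validate="m:1")

# Hypothetical respondents: one with training, one without.
persons = pd.DataFrame({
    "region": ["A", "B"],
    "ft_date": pd.to_datetime(["2020-06-15", None]),
    "interview_date": pd.to_datetime(["2020-12-01", "2020-12-01"]),
})
indicators = pd.DataFrame({
    "region": ["A", "B", "A"],
    "month": ["2020-03", "2019-09", "2019-09"],
    "unemployment": [5.1, 6.4, 4.0],
})
linked = attach_bc_indicators(persons, indicators)
```

A left join keeps respondents whose reference month has no indicator, so unmatched cases show up as missing values rather than being dropped silently.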
Dr Hai-Anh Dang (World Bank)
Dr Talip Kilic (World Bank) - Presenting Author
Dr Vladimir Hlasny (United Nations Economic and Social Commission for Western Asia)
Dr Calogero Carletto (World Bank)
Dr Kseniya Abanokova (World Bank)
Household consumption survey data that underlie poverty estimates in low-income countries are often unavailable, unreliable or incomparable. Survey-to-survey imputation has been increasingly employed to address these data gaps. Dang, Kilic, Abanokova and Carletto (2022) develop poverty imputation models using household surveys conducted in Ethiopia, Malawi, Nigeria, Tanzania, and Vietnam. They find that adding household utility expenditures to a basic imputation model with household demographic and employment attributes produces accurate poverty predictions vis-à-vis observed estimates. Hence, using an imputation model that is estimated with a baseline survey with full consumption data and that is applied to a follow-up lighter survey that solely collects information on poverty predictors could be a promising approach to reliably fill poverty data gaps at low cost. Yet, Kilic and Sohnesen (2019) document that applying an imputation model to follow-up surveys that vary in terms of length/complexity vis-à-vis the baseline can generate substantial differences in predicted poverty rates. Against this background, this paper reports on a randomized survey experiment that was implemented in Tanzania in 2022, featuring three treatment arms: (T1) a standard questionnaire that provides observed consumption and poverty estimates and permits the estimation of all imputation models presented in Dang et al. 2021; (T2) a light questionnaire that permits the estimation of a selection of models that present the most modest scenario of data collection; and (T3) an augmented light questionnaire that permits the estimation of an expanded set of models vis-à-vis T2 but that is shorter and less complex than T1. 
This design will allow us to assess, with respect to observed household consumption expenditure and poverty estimates (based on T1), the accuracy of imputed poverty estimates based on alternative models and varying scope of target survey questionnaires (as obtained through T2 and T3).
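The general logic of survey-to-survey imputation can be sketched as follows. This is a deliberately simplified linear model with normally distributed errors, not the specification of the models evaluated in the experiment; the function name, the predictors and the poverty line are hypothetical.

```python
import numpy as np

def impute_poverty_rate(X_base, logc_base, X_target, poverty_line,
                        n_sim=200, seed=0):
    """Fit log consumption on predictors in the baseline survey, then
    simulate consumption in the target survey and average the poverty rate."""
    rng = np.random.default_rng(seed)
    X_base = np.asarray(X_base, dtype=float)
    X_target = np.asarray(X_target, dtype=float)
    logc_base = np.asarray(logc_base, dtype=float)
    Xb = np.column_stack([np.ones(len(X_base)), X_base])      # add intercept
    Xt = np.column_stack([np.ones(len(X_target)), X_target])
    beta, *_ = np.linalg.lstsq(Xb, logc_base, rcond=None)
    resid = logc_base - Xb @ beta
    sigma = resid.std(ddof=Xb.shape[1])   # residual standard deviation
    rates = []
    for _ in range(n_sim):
        # Add a simulated error so the imputed distribution is not too narrow.
        logc_sim = Xt @ beta + rng.normal(0.0, sigma, size=len(Xt))
        rates.append(np.mean(logc_sim < np.log(poverty_line)))
    return float(np.mean(rates))

# Toy data: one predictor, consumption observed only in the baseline survey.
rate = impute_poverty_rate(X_base=[[0], [1], [2], [3]],
                           logc_base=[1.0, 2.0, 3.0, 4.0],
                           X_target=[[0], [0], [3], [3]],
                           poverty_line=np.exp(2.5))
```

Comparing such an imputed rate with the rate observed in a full-consumption arm is the kind of accuracy check the experimental design makes possible.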
Dr Janete Saldanha Bach (GESIS – Leibniz Institute for the Social Sciences) - Presenting Author
Dr Claus-Peter Klas (GESIS – Leibniz Institute for the Social Sciences)
Research entities are increasingly interconnected at various levels, and these connections can be expressed in relationship maps such as knowledge graphs. Interrelations between study units, instruments, questions, response scales, and variables are modelled within and across studies using Data Documentation Initiative - Lifecycle (DDI-LC). As the variable is one of the most relevant entities for enhancing data reuse, we provide a framework design that improves the semantics of the descriptions of relations between variables, in order to build a Social Sciences knowledge graph. These explicit relations between variables enable comparability across waves and facilitate data harmonization. Because the current descriptions do not represent the complexity of the ties between variables, we provide a brief textual identification of the relation type, supported by a controlled vocabulary (CV), together with an extended description of the relationship. Documenting these relations will enrich data reuse by supporting search and browse functionality. This framework will be published as a controlled vocabulary for variable relations via the CESSDA vocabulary manager. Using the proposed controlled vocabulary creates a semantically rich, common Social Sciences research knowledge graph across institutes, in line with the FAIR principles. As the next step, we will extend the descriptions of relations to all possible entities within DDI.
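A typed relation between two variables, constrained by a controlled vocabulary, might be represented as in the sketch below. The relation terms, variable identifiers and class name are hypothetical placeholders and are not taken from the proposed CV or from DDI-LC.

```python
from collections import defaultdict

# Hypothetical relation types standing in for controlled-vocabulary terms.
RELATION_TYPES = {"same_as", "harmonised_from", "recoded_from"}

class VariableGraph:
    """Directed graph of variables with typed, annotated relations."""

    def __init__(self):
        self._edges = defaultdict(list)

    def add_relation(self, source, relation, target, note=""):
        # Only terms from the controlled vocabulary are accepted.
        if relation not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {relation}")
        self._edges[source].append((relation, target, note))

    def related(self, source, relation=None):
        """Targets linked from `source`, optionally filtered by relation type."""
        return [t for r, t, _ in self._edges.get(source, [])
                if relation is None or r == relation]

graph = VariableGraph()
graph.add_relation("wave2:trust5", "recoded_from", "wave1:trust11",
                   note="11-point scale collapsed to 5 points")
graph.add_relation("wave2:trust5", "same_as", "wave3:trust5")
recoded_sources = graph.related("wave2:trust5", relation="recoded_from")
```

Restricting relation labels to a fixed vocabulary is what lets a search interface filter by relation type across studies.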
Professor Maria Paola Faggiano (Sapienza - University of Rome)
Dr Michela Cavagnuolo (Sapienza - University of Rome) - Presenting Author
Dr Viviana Capozza (Sapienza - University of Rome)
The Italian general elections held on 25 September 2022 took place in a global historical context fraught with emergencies, including the continuing management of the Covid-19 pandemic, the difficult economic recovery, and the escalation of the Russian-Ukrainian conflict. All these issues were given specific treatment (as well as political connotations) both in the national public debate and in the electoral campaign, undoubtedly contributing to the election results. Indeed, while voters expressed their views on the priorities that should shape the political agenda of the country, the various political forces articulated their programmes according to the different governance approaches that have historically marked their politically oriented action (left-right axis).
Starting from the hypothesis that the demands of voters reflect their positions in social and cultural spaces (in addition to their political-ideological orientation), and that the goal of political parties is to meet and represent such demands, this paper investigates these two sides in order to detect whether and how they meet and to identify the specific forms this matching takes. The survey research includes a joint analysis of two distinct databases. Regarding the political demand expressed by citizens, a web survey was conducted on the voting intentions of the Italian electorate during the four weeks of the electoral campaign (29 August - 23 September 2022: 698 respondents). On the political supply side, the entire propaganda apparatus (3,777 posts) of the official Facebook pages of the ten main political parties (that is, the four coalitions supported by 93.1% of Italian voters) was collected. Given the explicit research aim, the further methodological challenge consists in making the two databases compatible and comparable, following a mixed qualitative-quantitative analysis approach.