



Thursday 18th July 2013, 09:00 - 10:30, Room: No. 16

Data Archiving

Chair: Mr Sebastian Kočar (Slovene Social Science Data Archives)

Paper Details

1. Leveraging data in African countries: Curating Government microdata for research

Ms Lynn Woolfrey (DataFirst, University of Cape Town)

Survey data is collected in African countries by government departments, international donor organisations, universities and other research institutions. However, only a small percentage of this data is preserved for long-term reuse by researchers and government policy analysts. The bulk of African survey research is undertaken by governments and the data is held by statistics offices, but data curation in these institutions is hampered by limited resources. Consequently, the majority of statistics offices neither follow international best practice in data curation nor make their microdata available to researchers. In South Africa, official microdata has been curated by the government survey data archive, SADA, since 1996. In 2001 another South African data service was established at the University of Cape Town. This data service, DataFirst, preserves South African survey microdata and has provided researchers with online data access since 2006. The service also supports data analysis and provides data quality feedback to data producers. Until recently, the only long-term data sharing option available to government data producers in other African countries was off-site archiving and sharing by foreign organisations. Since 2008 DataFirst has worked on an OECD-funded project to change this and to develop data curation infrastructure and skills at African national statistics offices (NSOs). Lessons learned from the South African service are taken to other countries in the region. Official data producers in project countries work to become better data stewards, so that quality national data is available and used to refine policies and advance research in the region.


2. Cooperation of an archive and an NSI to add value to detailed non-anonymised microdata: the Slovenian good practice

Mr Sebastian Kočar (Slovene Social Science Data Archives)

The Statistical Office (SORS) and the Social Science Data Archives (ADP) are both partners in the DwB project, which promotes closer cooperation between European archives, NSIs and research communities. To improve conditions in the research environment and promote the use of official statistics microdata in Slovenia, SORS and ADP decided to extend their cooperation with additional activities.
SORS collects and distributes a significant amount of data, and researchers' need for additional support has been recognized. Easy-to-use microdata, quality metadata and a detailed overview of the available data sets are needed. The files a researcher receives from SORS are mostly limited to ASCII delimited microdata files and questionnaires. ADP adds value by (1) preparing microdata in a format that can later be converted to any desired statistical software format, (2) adding variable and value labels and missing values, (3) providing additional logical data checks, (4) preparing metadata using the DDI standard and organizing all the documentation a researcher might need in one place, available in the safe room or through remote access, and (5) programming a tool to browse microdata and metadata. Public use files (PUF) will also be prepared by both organizations and distributed by ADP. Additionally, a list of available SORS microdata will soon be published on both the SORS and ADP webpages.
The Slovenian cooperation is an example of good practice and should be implemented, in a country-specific form, in other European countries as well.
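
As an illustration only, steps (1) to (3) above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ADP workflow: the codebook contents, the variable names (sex, age), the semicolon delimiter and the helper names load_labelled and check_range are all hypothetical.

```python
import csv
import io

# Hypothetical codebook: variable labels, value labels and missing-value codes.
CODEBOOK = {
    "sex": {"label": "Sex of respondent",
            "values": {"1": "Male", "2": "Female"},
            "missing": {"9"}},
    "age": {"label": "Age in years", "values": {}, "missing": {"999"}},
}


def load_labelled(text, codebook, delimiter=";"):
    """Parse delimited text; map missing codes to None and apply value labels."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row = {}
        for name, raw in record.items():
            spec = codebook.get(name, {})
            if raw in spec.get("missing", set()):
                row[name] = None  # declared missing value
            else:
                # Use the value label if one exists, else keep the raw value.
                row[name] = spec.get("values", {}).get(raw, raw)
        rows.append(row)
    return rows


def check_range(rows, name, low, high):
    """Logical check: indices of rows whose numeric value lies outside [low, high]."""
    return [i for i, r in enumerate(rows)
            if r[name] is not None and not low <= int(r[name]) <= high]


# Made-up sample file content, for illustration only.
sample = "sex;age\n1;34\n2;999\n9;150\n"
rows = load_labelled(sample, CODEBOOK)
suspicious = check_range(rows, "age", 0, 120)
```

Keeping the labels and missing-value codes in a separate codebook, rather than in the data file, is what would allow the same microdata to be exported later to different statistical software formats.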



3. Data archive of surveys on 'language problem' in Ukraine: flexible structure of meta-data, automated search of similar questions and linking survey data to text of laws and publications

Mr Eugen Bolshov (Kyiv International Institute of Sociology)
Mr Igor Reshetnv (Kyiv International Institute of Sociology)

The status of the Russian and Ukrainian languages is one of the most debated political issues in Ukraine. Unfortunately, political analysts, academics and journalists rarely have access to data from surveys on language topics. We have created a data archive (http://russian-language.org.ua) that contains data from surveys on language issues conducted from 1993 to 2012. Users can download the data sets or run statistical analyses online. The survey data sets are also linked to the corresponding laws and publications through common metadata. Linking similar questions across hundreds of surveys is well known to be a daunting and time-consuming task. To address this, we have developed an automated algorithm that scans the data archive and estimates similarity measures between thousands of questions. The algorithm uses the texts of the questions and of their answer alternatives, the number of alternatives per question, the metadata of the surveys the questions belong to, and so on. Some organizations (especially in the public health domain) are interested in creating their own topical data archives, so we modified our software to let users easily define metadata structures for their own archives.
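
The abstract does not specify how the similarity measures are computed. The following is a minimal sketch of one possible approach, assuming word-overlap (Jaccard) similarity on question and alternative texts combined with the ratio of alternative counts; the function names and the weights are hypothetical, and survey-level metadata is left out for brevity.

```python
import re


def tokens(text):
    """Lowercase word tokens of a text, as a set."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def question_similarity(q1, q2, w_text=0.6, w_alt=0.3, w_count=0.1):
    """Weighted similarity in [0, 1] between two survey questions.

    Each question is a dict with 'text' (str) and 'alternatives' (list of str).
    The weights are illustrative assumptions, not values from the archive.
    """
    text_sim = jaccard(tokens(q1["text"]), tokens(q2["text"]))
    alt_sim = jaccard(tokens(" ".join(q1["alternatives"])),
                      tokens(" ".join(q2["alternatives"])))
    n1, n2 = len(q1["alternatives"]), len(q2["alternatives"])
    count_sim = 1.0 if max(n1, n2) == 0 else min(n1, n2) / max(n1, n2)
    return w_text * text_sim + w_alt * alt_sim + w_count * count_sim


# Made-up example questions, for illustration only.
home_a = {"text": "What language do you speak at home?",
          "alternatives": ["Ukrainian", "Russian", "Both"]}
home_b = {"text": "Which language do you usually speak at home?",
          "alternatives": ["Ukrainian", "Russian", "Both", "Other"]}
age = {"text": "How old are you?", "alternatives": []}

related = question_similarity(home_a, home_b)
unrelated = question_similarity(home_a, age)
```

With these inputs the two home-language questions score well above the unrelated pair, so ranking candidate pairs by this score would bring likely matches to the top for manual review.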