
ESRA 2025 Preliminary Program

All time references are in CEST

Measurement and coding of job-related information: Occupation, industry, and skill

Session Organisers Dr Malte Schierholz (LMU Munich)
Ms Olga Kononykhina (LMU Munich / Munich Center for Machine Learning (MCML))
Dr Calvin Ge (TNO)
Time Wednesday 16 July, 09:00 - 10:30
Room Ruppert 111

Occupation coding refers to coding a respondent’s text answer (or the interviewer’s transcription of the text answer) about the respondent’s job into one of many hundreds of occupation codes. Relatedly, many surveys gather data about the person’s industry or their various skills in similar ways. We welcome any papers on how to best measure jobs and job-related information, including, but not limited to:
- measurement of occupations, industries, and skill (e.g., mode, question design, …)
- handling of different occupational and industry classifications (e.g., ISIC, NACE, NAICS, ISCO, ESCO and national classifications)
- problems of coding (e.g., costs, data quality, …)
- techniques for coding (e.g., automatic coding, computer-assisted coding, manual coding, interview coding)
- computer algorithms for coding (e.g., machine learning, LLMs, rule-based, …)
- cross-national and longitudinal issues
- measurement of derived variables (e.g., ISEI, ESeC, SIOPS, job-exposure matrices, …)
- other methodological aspects related to the measurement and coding of job-related information

Keywords: measurement, coding, occupation, industry, skill, long-list questions

Papers

Is Green Really Green? Challenges in Measuring Green Competencies

Professor Marcin Kocór (Jagiellonian University) - Presenting Author
Professor Barbara Worek (Jagiellonian University)

In the face of implementing the Green Deal policy, labor market research increasingly focuses on green skills. However, their definition and measurement pose significant challenges. In the latest edition of the Human Capital Study survey, the need arose not only to operationalize the measurement of these skills but also to highlight mismatches in this area.
During the presentation, difficulties related to defining green skills and designing questionnaire items for employers and employees will be discussed. The research results will also be presented, allowing for an assessment of the validity of the proposed approach. The conclusions from the presentation may contribute to a better understanding and more effective measurement of green competencies in the context of the labor market and skills measurement.


Can Large Language Models Advance Occupational Coding? Evidence and Methodological Insights

Mrs Olga Kononykhina (LMU Munich / Munich Center for Machine Learning (MCML)) - Presenting Author
Dr Malte Schierholz (LMU Munich)

Occupational coding is a critical funnel between open-ended job descriptions and the statistical frameworks that shape employment research and policies. Automatic coding tools—whether rule-based or machine learning (ML)—have streamlined the process and demonstrated promising results. Yet ML approaches typically require extensive, high-quality training data that exceed what a typical national survey can provide and are often subject to data protection constraints.

This study asks whether mainstream large language models (LLMs) can serve as a viable alternative, largely bypassing the need for exhaustive training data and requiring only some coding skills and API access. We created embeddings for standardized German (KldB) job descriptions, then used respondents’ own words (e.g., “doctor”) from a representative German survey to generate job embeddings. Cosine similarity was applied to find the five most likely occupational codes for each response.
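A minimal sketch of this embedding-and-retrieval step, assuming an OpenAI-style embedding endpoint: the model name, the helper functions, and the two example KldB entries are illustrative, since the abstract does not specify which LLM, API, or classification entries were used.

# Sketch of embedding-based occupation coding; model, client, and data are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment

def embed(texts: list[str]) -> np.ndarray:
    """Return L2-normalised embedding vectors for a list of texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vecs = np.array([d.embedding for d in resp.data])
    return vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

# 1) Embed the standardized KldB job descriptions once (illustrative entries only).
kldb_codes = ["81402", "84304"]
kldb_texts = ["Arzt/Ärztin", "Lehrer/Lehrerin"]
code_matrix = embed(kldb_texts)

# 2) Embed a respondent's own words and rank codes by cosine similarity.
def top_codes(answer: str, k: int = 5) -> list[str]:
    vec = embed([answer])[0]
    sims = code_matrix @ vec          # cosine similarity, since vectors are normalised
    ranked = np.argsort(sims)[::-1][:k]
    return [kldb_codes[i] for i in ranked]

print(top_codes("doctor"))

Normalising the vectors up front lets a single matrix-vector product return the cosine similarities for all candidate codes at once.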

To assess performance, we compared LLM-based suggestions with those from a German ML occupational coding tool (OccuCoDe), using professional manual coding as our benchmark. Results show that in 55% of the cases, both the LLM and OccuCoDe included the correct code among their top five suggestions. However, there was limited overlap: in 60% of the cases, the two tools shared at most two of their five recommended codes. While OccuCoDe more frequently placed the correct code as the first suggestion, the LLM embeddings suggested the correct occupation in 45% of cases where OccuCoDe did not provide any result. Additionally, LLM performance was sensitive to minor changes in job descriptions (e.g., capitalisation or gendered job titles) and sometimes showed “embedding drift,” raising reproducibility concerns.
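For illustration only, the metrics reported above (top-five inclusion against manual coding, and overlap between the two tools' suggestion lists) could be computed along these lines; the data structures and example codes are hypothetical and not the authors' code.

# Hedged sketch of the evaluation metrics; inputs are hypothetical examples.
def top5_hit_rate(suggestions: list[list[str]], truth: list[str]) -> float:
    """Share of cases where the manually coded value is among the top five suggestions."""
    hits = sum(t in s[:5] for s, t in zip(suggestions, truth))
    return hits / len(truth)

def overlap(a: list[str], b: list[str]) -> int:
    """Number of codes shared by two top-five suggestion lists."""
    return len(set(a[:5]) & set(b[:5]))

llm_top5 = [["81402", "81302", "81404", "84304", "81403"]]       # hypothetical
occucode_top5 = [["81402", "81301", "82102", "81404", "81102"]]  # hypothetical
manual = ["81402"]
print(top5_hit_rate(llm_top5, manual))         # 1.0 in this toy case
print(overlap(llm_top5[0], occucode_top5[0]))  # 2 shared codes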

Our findings highlight LLMs’ promise as a complement to, or substitute for, other occupational coding tools in contexts with limited training data, while underscoring critical limitations that must be addressed before fully entrusting them with classifying the work we do.


An Evaluation of the Look-Up Approach to Occupation Coding: Evidence from the Next Steps Study in the UK

Dr Sebastian Kocar (Institute for Social Science Research, University of Queensland) - Presenting Author
Dr Daniela Peycheva (Institute of Epidemiology and Health Care, University College London)
Dr Matt Brown (Centre for Longitudinal Studies, University College London)
Professor Joseph W. Sakshaug (Institute for Employment Research (IAB), and Professor of Statistics, Ludwig Maximilian University of Munich)
Dr Claire Bhaumik (Ipsos UK)
Professor Lisa Calderwood (Centre for Longitudinal Studies, University College London)

Occupation coding is a critical component of social research, providing essential insights into socio-economic status. Traditionally, occupation data have been collected through interviewer administration using open-ended questions and manual office coding, a method regarded as the "gold standard." However, this approach is resource-intensive and may be less feasible for self-completion surveys. As web surveys become increasingly prevalent, it is vital to explore new approaches to occupation coding.
One new approach is the look-up self-coding approach, in which participants (or interviewers) enter keywords that describe their job and select an appropriate code from a presented list. In this paper, we evaluate the potential of this approach and compare it with traditional office coding. We use data from an experiment conducted in the 9th wave of the Next Steps longitudinal study in the UK, a web-first mixed-mode survey in which participants were asked both to self-code their occupation and to provide an open-ended job description, which was then manually coded by two independent office coders.
The study uses two indicators of feasibility and data quality: the look-up coding rate and the agreement between look-up and office coding. We will explore the impact of respondent characteristics and look-up input metrics on these indicators and will use these findings to propose potential improvements to the look-up approach.
The findings will be of significant value to survey practitioners wishing to collect information about occupation.