Potentials and constraints of weighting to improve survey quality
|Convenor||Dr Stephanie Steinmetz (University of Amsterdam)|
|Coordinator 1||Professor Kea Tijdens (University of Amsterdam)|
High refusal rates in probability samples and the rising use of non-probability access panels threaten the quality and comparability of surveys from different modes of data collection.
Although its efficiency is often disappointing, weighting remains a possible approach for achieving (conditional) representativity across different modes.
The paper evaluates the possibility of balancing distributions across different modes (F2F, telephone, internet) of the “German Longitudinal Election Study” (GLES) through “Propensity Score Matching”. The potential and limits of different matching algorithms (e.g. “Genetic Matching”) in reducing discrepancies on auxiliary and key variables of interest will be shown and evaluated.
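To illustrate the general idea behind propensity-score adjustment across modes (this is a minimal sketch with invented data and parameters, not the GLES analysis), one can model the probability of responding via the web given an auxiliary variable and reweight the web cases by the inverse odds so that their distribution aligns with the F2F distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated auxiliary variable (e.g. age): web respondents skew younger.
n = 2000
age = rng.normal(50, 15, n)
p_web = 1 / (1 + np.exp(0.05 * (age - 45)))  # younger -> more likely web
mode = rng.random(n) < p_web                  # True = web, False = F2F

# Fit a logistic propensity model P(web | age) by Newton-Raphson.
X = np.column_stack([np.ones(n), age])
beta = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ beta))
    grad = X.T @ (mode - p)
    hess = X.T @ (X * (p * (1 - p))[:, None])
    beta += np.linalg.solve(hess, grad)

# Inverse-odds weights pull the web cases toward the F2F distribution.
p_hat = 1 / (1 + np.exp(-X[mode] @ beta))
w = (1 - p_hat) / p_hat

mean_f2f = age[~mode].mean()
mean_web_raw = age[mode].mean()
mean_web_wtd = np.average(age[mode], weights=w)
```

After weighting, the mean age of the web cases should lie much closer to the F2F mean than the raw web mean does; matching algorithms such as genetic matching pursue the same balance goal by pairing cases rather than reweighting them.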
With the rise of the Internet, more and more data are collected via volunteer web panels, which are not representative of the general population. Post-survey adjustment techniques are often used to improve data quality; however, which methods work best and which variables need to be taken into account differ per survey. To assess such methods for the volunteer Dutch Leisure Panel, different weighting methods are evaluated and the results are compared to the outcomes of a probability-based survey. Finally, the effectiveness of weighting for volunteer panels, and whether or when probability-based panels should be preferred given cost-efficiency, will be discussed.
Survey statisticians have introduced the correlation between the weighting variable and the target variable as an important consideration when selecting weighting variables: whenever this correlation is zero, the mean of the target variable will not change, while the standard error may increase. For this reason it is often proposed to exclude such variables from the weighting model. This paper seeks to demonstrate that such a selection criterion is troublesome, as it artificially, or even fraudulently, increases the apparent power of the sample or analysis.
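The claim can be illustrated with a small simulation (invented data, not from the paper): weighting by a variable that is uncorrelated with the target leaves the estimated mean essentially unchanged, while the Kish effective sample size, and hence the standard error, deteriorates:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 100_000
y = rng.normal(0, 1, n)   # target variable
z = rng.normal(0, 1, n)   # weighting variable, independent of y
w = np.exp(0.5 * z)       # weights driven purely by z

mean_unw = y.mean()
mean_w = np.average(y, weights=w)

# Kish effective sample size: (sum w)^2 / sum(w^2).
# Variable weights shrink it below n even when corr(w, y) = 0,
# so the standard error of the weighted mean is larger.
n_eff = w.sum() ** 2 / (w @ w)
```

Here the weighted and unweighted means agree up to sampling noise, yet `n_eff` falls to roughly 78% of `n` (for lognormal weights with this spread, the ratio is about e^-0.25). Dropping the variable from the weighting model recovers the smaller standard error, which is exactly the practice the paper criticises.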
Many researchers use Voting Advice Application (VAA) data to estimate voters’ positions on the statements used in VAAs. VAA sites can attract thousands or even millions of users and generate large and cheaply collected datasets. However, as these data are collected on a non-probability basis, they are not representative of the total population. The paper explores the sample bias of HelpMeVote in two countries: Iceland, with high internet penetration, and Greece, with low penetration. An attempt is made to overcome the non-representativeness of HelpMeVote by applying different post-adjustment techniques.
USDA’s National Agricultural Statistics Service (NASS) conducts the June Agricultural Survey (JAS) annually. Using the Census Mailing List (CML) in years when the census of agriculture is conducted and the NASS list frame in non-census years, capture-recapture methods have been developed to improve the JAS estimates of the number of U.S. farms and land in farms. Although an operation's farm status is determined in census years, the status is not always fully determined in non-census years, posing additional technical challenges. A comparison of the capture-recapture estimates using the CML and the NASS list frame will be presented.
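By way of background (with invented counts, not NASS figures), the core identity behind capture-recapture estimation is the Lincoln-Petersen dual-system estimate, which combines the two list counts and their overlap:

```python
# Hypothetical counts: n1 farms identified on the CML-based capture,
# n2 on the NASS-list recapture, and m appearing on both lists.
n1, n2, m = 1800, 1500, 1200

# Lincoln-Petersen dual-system estimate of the total number of farms,
# assuming the two lists are independent with equal capture probabilities.
n_hat = n1 * n2 / m
print(n_hat)  # → 2250.0
```

The actual NASS estimators are more elaborate, in particular because farm status is not fully determined in non-census years; the sketch only shows the basic dual-system logic being extended.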