PhD Seminar Series: LLM Hacking - Are LLMs reliable annotators?

LLM Hacking - Are LLMs reliable annotators?
Speaker: Joachim Baumann, Postdoctoral Researcher, Bocconi University
Abstract: Large language models (LLMs) are rapidly transforming computational social science by automating text annotation at an unprecedented scale. However, this efficiency comes with a hidden cost: LLM outputs vary dramatically with seemingly minor implementation choices such as model selection, prompting strategy, and temperature settings. We study LLM hacking as an emerging threat to scientific integrity in this new research paradigm. LLM hacking occurs when implementation choices introduce systematic biases in data annotations that propagate to downstream analyses, potentially invalidating scientific conclusions. To quantify this risk, we conduct extensive experiments across 56 annotation tasks from 21 datasets, testing multiple state-of-the-art LLMs. Evaluating thousands of plausible hypothesis tests, we find that LLM hacking yields incorrect conclusions in approximately 20% of cases, manifesting as Type I (false positive), Type II (false negative), or Type S (wrong sign of a significant effect) errors. Critically, even models achieving close to 100% accuracy exhibit LLM hacking risk, though higher task-specific performance correlates with reduced risk. These results demand a fundamental shift in LLM-assisted research practices: from treating LLMs as convenient black-box annotators to recognizing them as complex instruments requiring rigorous validation. In this seminar, we will present key predictors of LLM hacking risk across annotation tasks and discuss practical validation techniques for achieving reliable LLM-driven scientific conclusions.
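
To make the core idea concrete, the following is a minimal, hypothetical sketch (not the speaker's actual pipeline or data): it simulates annotators whose error profiles depend on the LLM configuration (model, prompt, temperature) and shows how the conclusion of the same downstream hypothesis test can flip across configurations. The configuration names, error rates, sample sizes, and the simple chi-squared test are all illustrative assumptions.

```python
# Hypothetical illustration of "LLM hacking": the same downstream hypothesis test
# can reach different conclusions depending on which LLM configuration produced
# the annotations. All numbers and configuration names are made up for illustration.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Ground truth: binary labels (e.g., "toxic" vs. "not toxic") for two groups of
# texts, with a small true difference in prevalence between the groups.
n = 500
group_a = rng.binomial(1, 0.30, n)  # true prevalence 30%
group_b = rng.binomial(1, 0.36, n)  # true prevalence 36%

def annotate(labels, fp_rate, fn_rate):
    """Simulate an LLM annotator that flips labels with configuration-specific
    false-positive and false-negative rates."""
    noisy = labels.copy()
    flips_fp = (labels == 0) & (rng.random(labels.size) < fp_rate)
    flips_fn = (labels == 1) & (rng.random(labels.size) < fn_rate)
    noisy[flips_fp] = 1
    noisy[flips_fn] = 0
    return noisy

# Hypothetical configurations: (model / prompt / temperature) -> error profile.
configs = {
    "model_X, prompt_v1, temp=0.0": (0.02, 0.05),
    "model_X, prompt_v2, temp=0.7": (0.10, 0.12),
    "model_Y, prompt_v1, temp=0.0": (0.05, 0.20),
}

def downstream_test(a_labels, b_labels):
    """Chi-squared test of whether label prevalence differs between the groups."""
    table = [[a_labels.sum(), a_labels.size - a_labels.sum()],
             [b_labels.sum(), b_labels.size - b_labels.sum()]]
    _, p_value, _, _ = chi2_contingency(table)
    return p_value

print(f"Ground-truth labels:             p = {downstream_test(group_a, group_b):.3f}")
for name, (fp, fn) in configs.items():
    p = downstream_test(annotate(group_a, fp, fn), annotate(group_b, fp, fn))
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"{name:32s} p = {p:.3f} ({verdict})")
```

Depending on the simulated error profiles, some configurations reproduce the ground-truth conclusion while others do not, which is the kind of Type I/II/S divergence the abstract describes, here only as a toy simulation.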