Reasons to Doubt the Impact of AI Risk Evaluations
Evaluations may fail to improve AI risk understanding or mitigation and could even cause harm, but with a few considerations in mind, they can still be valuable.
Do you have a clear picture of how AI risk evaluations lead to a reduction in AI risks?
My new position paper questions this picture by outlining 15 ways in which AI risk evaluations may fail to deliver value, or may even be harmful, and by discussing what we might do despite this!

Abstract
AI safety practitioners invest considerable resources in AI system evaluations, but these investments may be wasted if evaluations fail to realize their impact. This paper questions the core value proposition of evaluations: that they significantly improve our understanding of AI risks and, consequently, our ability to mitigate those risks. Evaluations may fail to improve understanding in six ways, such as risks manifesting beyond the AI system or insignificant returns from evaluations compared to real-world observations. Improved understanding may also not lead to better risk mitigation in four ways, including challenges in upholding and enforcing commitments. Evaluations could even be harmful, for example, by triggering the weaponization of dual-use capabilities or incurring high opportunity costs for AI safety. This paper concludes with considerations for improving evaluation practices and 12 recommendations for AI labs, external evaluators, regulators, and academic researchers to encourage a more strategic and impactful approach to AI risk assessment and mitigation.
Links
Read the preprint here: https://arxiv.org/abs/2408.02565
Or comment directly on it here: https://alphaxiv.org/abs/2408.02565