Who Should Develop Which AI Evaluations?

January 14, 2025

Lara Thurnherr, Robert Trager, Amin Oueslati, Christoph Winter, Cliodhna Ní Ghuidhir, Joe O'Brien, Jun Shern Chan, Lorenzo Pacchiardi, Anka Reuel, Merlin Stein, Oliver Guest, Oliver Sourbut, Renan Araujo, Seth Donoughe and Yi Zeng

We explore frameworks and criteria for determining which actors (e.g., government agencies, AI companies, third-party organisations) are best suited to develop AI model evaluations. Key challenges include conflicts of interest when AI companies assess their own models, the information and skill requirements for developing evaluations, and the sometimes blurred boundary between developing and conducting evaluations. We propose a taxonomy of four development approaches: government-led development, government-contractor collaborations, third-party development via grants, and direct development by AI companies.

We present nine criteria for selecting evaluation developers and apply them in a two-step sorting process to identify developers that are both capable and suitable. We also recommend measures for a market-based ecosystem that supports diverse, high-quality evaluation development, including public tooling, accreditation, clear guidelines, and brokering relationships between third-party evaluators and AI companies. Our approach emphasises the need for a sustainable ecosystem that balances public accountability with efficient private-sector participation.
