Who Should Develop Which AI Evaluations?

January 14, 2025

Lara Thurnherr, Robert Trager, Amin Oueslati, Christoph Winter, Cliodhna NĂ­ Ghuidhir, Joe O'Brien, Jun Shern Chan, Lorenzo Pacchiardi, Anka Reuel, Merlin Stein, Oliver Guest, Oliver Sourbut, Renan Araujo, Seth Donoughe and Yi Zeng

We explore frameworks and criteria for determining which actors (e.g., government agencies, AI companies, third-party organisations) are best suited to develop AI model evaluations. Key challenges include conflicts of interest when AI companies assess their own models, the information and skill requirements for developing AI evaluations, and the sometimes blurred boundary between developing and conducting evaluations. We propose a taxonomy of four development approaches: government-led development, government-contractor collaborations, third-party development via grants, and direct AI company development.

We present nine criteria for selecting evaluation developers, which we apply in a two-step sorting process to identify developers that are both capable and suitable. Additionally, we recommend measures for a market-based ecosystem to support diverse, high-quality evaluation development, including public tools, accreditation, clear guidelines, and brokering relationships between third-party evaluators and AI companies. Our approach emphasises the need for a sustainable ecosystem that balances public accountability with efficient private-sector participation.
