We explore frameworks and criteria for determining which actors (e.g., government agencies, AI companies, third-party organisations) are best suited to develop AI model evaluations. Key challenges include conflicts of interest when AI companies assess their own models, the information and skill requirements for AI evaluations, and the (sometimes) blurred boundary between developing and conducting evaluations. We propose a taxonomy of four development approaches: government-led development, government-contractor collaborations, third-party development via grants, and direct AI company development.
We present nine criteria for selecting evaluation developers, which we apply in a two-step sorting process to identify capable and suitable developers. Additionally, we recommend measures for a market-based ecosystem to support diverse, high-quality evaluation development, including public tools, accreditation, clear guidelines, and brokering relationships between third-party evaluators and AI companies. Our approach emphasises the need for a sustainable ecosystem that balances public accountability with efficient private-sector participation.