In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

May 23, 2025

Ben Bucknall, Saad Siddiqui, Lara Thurnherr, Conor McGurk, Ben Harack, Anka Reuel, Patricia Paskov, Casey Mahoney, Sören Mindermann, Scott Singer, Vinay Hiremath, Charbel-Raphaël Segerie, Oscar Delaney, Alessandro Abate, Fazl Barez, Michael K. Cohen, Philip Torr, Ferenc Huszár, Anisoara Calinescu, Gabriel Davis Jones, Yoshua Bengio, and Robert F. Trager

View Journal Article / Working Paper >

“Many risks arising from AI are inherently international in nature, and so are best addressed through international cooperation.” – Bletchley Declaration, 2023

International cooperation has long been a part of managing risks from advanced technologies. During the Cold War, despite intense rivalry, the United States and Soviet Union collaborated on nuclear verification methods through initiatives like the Joint Verification Experiment, which facilitated progress on arms control agreements. Today we observe ongoing international cooperation in artificial intelligence (AI) development, including between geopolitical rivals such as the US and China. Additionally, recent years have seen numerous high-level calls for international cooperation specifically on AI safety and governance, from the landmark agreement at the Bletchley AI Safety Summit in 2023 to the consensus statements issued by the International Dialogues on AI Safety. As with these historical analogues, cooperation on the safety of advanced, geopolitically sensitive technologies such as AI could play an important role in managing emerging risks.

However, cooperation between geopolitical rivals carries its own risks, which should be carefully weighed to ensure that the benefits of cooperation can be fully realised by all parties. This is no less true for safety research on AI (‘technical AI safety’). Some AI safety techniques have ‘capability externalities’, in that improvements in safety concurrently provide gains in model performance; reinforcement learning from human feedback (RLHF) is one example. Furthermore, some areas of AI safety, such as model evaluations for chemical, biological, radiological and nuclear (CBRN) capabilities, involve sensitive national security-related information that could be leaked to cooperating partners. The process of cooperation may also provide avenues for motivated actors to cause harm, such as by placing backdoors in jointly developed infrastructure.

Many government frameworks and risk-management processes exist to guide international technology cooperation at a general level, for example by recommending that researchers conduct due diligence on the identity of their counterparties. However, as leading governments’ and researchers’ continued focus on deepening the science of AI safety demonstrates, additional analysis is needed to fully address the specific technical characteristics and geopolitical risks associated with cooperation on AI safety research.

This paper addresses this gap by identifying technical risks specific to cooperation on AI safety research, focusing on the impact of cooperation on advancing capabilities, sharing sensitive information and providing opportunities for harm. We analyse the relative risk of cooperation for four different areas of technical AI safety and assess the feasibility of proposed cooperation in each area. We do not aim to definitively identify the most suitable areas for cooperation, nor to investigate specific benefits of cooperation in different AI safety research areas, though we believe these would be valuable directions for future work.

This paper is organised into three parts. The first provides an overview of cooperation on strategic technologies by i) outlining why geopolitical rivals typically cooperate on such technologies; ii) surveying the existing state of cooperation between rivals on AI research; and iii) examining how risks from cooperation are currently managed. The second part highlights four potential risks to which actors cooperating with rivals on topics in technical AI safety may be exposed. The final part builds on this by assessing the extent to which proposed areas of cooperation on technical AI safety are exposed to these identified risks. We give a brief overview of each area, along with a discussion of its suitability for cooperation in light of the risks previously introduced, finding that research into i) AI verification mechanisms and ii) protocols may be particularly well-suited to international cooperation.
