Technical Governance

AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation

C. Li, P. Lu, X. Pan, F. Barez, M. Yang

Agentic Product Maturity Ladder V0.1

S. McGregor, D. Nathani, L. Saouma, F. Barez, A. Foundjem, et al.

The Capability Frontier: Benchmarks Miss 82% of Model Performance

B. Fowler, R. Smith, D. T. Graviet, W. Myers, J. Greaves, N. F. Oozeer, A. García, et al.

Establishing Best Practices for Building Rigorous Agentic Benchmarks

Y. Zhu, T. Jin, Y. Pruksachatkun, A. Zhang, S. Liu, S. Cui, S. Kapoor, F. Barez, et al.

Beyond Monoliths: Expert Orchestration for More Capable, Democratic, and Safe Language Models

P. Quirke, N. Oozeer, C. Bandi, A. Abdullah, J. Hoelscher-Obermaier, F. Barez, et al.

In Which Areas of Technical AI Safety Could Geopolitical Rivals Cooperate?

B. Bucknall, S. Siddiqui, L. Thurnherr, C. McGurk, B. Harack, A. Reuel, F. Barez, et al.

The Singapore Consensus on Global AI Safety Research Priorities

Y. Bengio, T. Maharaj, L. Ong, S. Russell, D. Song, M. Tegmark, L. Xue, F. Barez, et al.

Safety Frameworks and Standards: A Comparative Analysis to Advance Risk Management of Frontier AI

M. Ziosi, J. Gealy, M. Plueckebaum, D. Kossack, S. Campos, L. Saouma, F. Barez, et al.

Verification for International AI Governance

B. Harack, R. Trager, A. Reuel, D. Manheim, M. Brundage, O. Aarne, et al.

Position: Near to Mid-Term Risks and Opportunities of Open-Source Generative AI

F. Eiras, A. Petrov, B. Vidgen, C. S. De Witt, F. Pizzati, K. Elkins, F. Barez, et al.