Article

Increasing Trust in Language Models Through the Reuse of Verified Circuits

P. Quirke, C. Neo, F. Barez

Safeguarding AI in Finance: Lessons for Regulated Industries

F. Barez, L. Marks

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training

E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, F. Barez, et al.

What Does GPT Store in Its MLP Weights? A Case Study of Long-Range Dependencies

T. Clark, S. B. Cohen, F. Barez

AI Systems of Concern

K. Matteucci, S. Avin, F. Barez, S. Ó hÉigeartaigh

Fairness in AI and Its Long-Term Implications on Society

O. Bohdal*, T. Hospedales, P. H. S. Torr, F. Barez*

Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small

C. Mathwin, G. Corlouer, E. Kran, F. Barez, N. Nanda

Technical Safety & Governance Lab

Department of Engineering Science
University of Oxford

Contact

Department of Engineering Science
Parks Road, Oxford OX1 3PJ

© 2026 Technical Safety & Governance Lab, University of Oxford
