SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors

M. Chaudhary, F. Barez

May 2025

Type: Publication

arXiv:2505.14300

Safety & Alignment

Technical Safety & Governance Lab

Department of Engineering Science
University of Oxford

