Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
E. Hubinger, C. Denison, J. Mu, M. Lambert, M. Tong, M. MacDiarmid, F. Barez, et al.
January 2024
Type: Preprint
Publication: arXiv:2401.05566
Safety & Alignment