C. Denison | TSG Lab – Technical Safety & Governance Lab

Author page: https://tsglab.github.io/author/c.-denison/

Publications

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
Published: Sat, 01 Jun 2024
https://tsglab.github.io/publication/sycophancy-to-subterfuge/

Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training
Published: Mon, 01 Jan 2024
https://tsglab.github.io/publication/sleeper-agents/