<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>P. Torr | TSG Lab – Technical Safety &amp; Governance Lab</title><link>https://tsglab.github.io/author/p.-torr/</link><atom:link href="https://tsglab.github.io/author/p.-torr/index.xml" rel="self" type="application/rss+xml"/><description>P. Torr</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Wed, 01 Oct 2025 00:00:00 +0000</lastBuildDate><image><url>https://tsglab.github.io/media/logo.svg</url><title>P. Torr</title><link>https://tsglab.github.io/author/p.-torr/</link></image><item><title>Rethinking Safety in LLM Fine-Tuning: An Optimization Perspective</title><link>https://tsglab.github.io/publication/rethinking-safety-llm-finetuning/</link><pubDate>Wed, 01 Oct 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/rethinking-safety-llm-finetuning/</guid><description/></item><item><title>Beyond Linear Probes: Dynamic Safety Monitoring for Language Models</title><link>https://tsglab.github.io/publication/dynamic-safety-monitoring-linear-probes/</link><pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/dynamic-safety-monitoring-linear-probes/</guid><description/></item><item><title>Do Sparse Autoencoders Generalize? A Case Study of Answerability</title><link>https://tsglab.github.io/publication/sparse-autoencoders-generalize-answerability/</link><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/sparse-autoencoders-generalize-answerability/</guid><description/></item><item><title>PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning</title><link>https://tsglab.github.io/publication/poisonbench/</link><pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/poisonbench/</guid><description/></item><item><title>Towards Interpreting Visual Information Processing in Vision-Language Models</title><link>https://tsglab.github.io/publication/visual-information-processing-vlms/</link><pubDate>Tue, 01 Apr 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/visual-information-processing-vlms/</guid><description/></item><item><title>Open Problems in Machine Unlearning for AI Safety</title><link>https://tsglab.github.io/publication/open-problems-machine-unlearning/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/open-problems-machine-unlearning/</guid><description/></item><item><title>Toward Resisting AI-Enabled Authoritarianism</title><link>https://tsglab.github.io/publication/resisting-ai-authoritarianism/</link><pubDate>Wed, 01 Jan 2025 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/resisting-ai-authoritarianism/</guid><description/></item><item><title>Interpreting Learned Feedback Patterns in Large Language Models</title><link>https://tsglab.github.io/publication/interpreting-feedback-patterns-llms/</link><pubDate>Sun, 01 Dec 2024 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/interpreting-feedback-patterns-llms/</guid><description/></item><item><title>Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models</title><link>https://tsglab.github.io/publication/interpretable-sequence-continuation/</link><pubDate>Fri, 01 Nov 2024 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/interpretable-sequence-continuation/</guid><description/></item><item><title>Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders</title><link>https://tsglab.github.io/publication/feature-space-universality-sparse-autoencoders/</link><pubDate>Tue, 01 Oct 2024 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/feature-space-universality-sparse-autoencoders/</guid><description/></item><item><title>Measuring Value Alignment</title><link>https://tsglab.github.io/publication/measuring-value-alignment/</link><pubDate>Fri, 01 Dec 2023 00:00:00 +0000</pubDate><guid>https://tsglab.github.io/publication/measuring-value-alignment/</guid><description/></item></channel></rss>