ACL/EMNLP | TSG Lab – Technical Safety & Governance Lab

Link: https://tsglab.github.io/tag/acl/emnlp/
Generator: Hugo Blox Builder (https://hugoblox.com)
Language: en-us
Last build date: Sat, 01 Nov 2025 00:00:00 +0000
Logo: https://tsglab.github.io/media/logo.svg

Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
https://tsglab.github.io/publication/beyond-linear-steering-multi-attribute-control/
Sat, 01 Nov 2025 00:00:00 +0000

Precise In-Parameter Concept Erasure in Large Language Models
https://tsglab.github.io/publication/precise-concept-erasure-llms/
Sat, 01 Nov 2025 00:00:00 +0000

Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
https://tsglab.github.io/publication/latent-adversarial-prompt-robustness/
Sat, 01 Nov 2025 00:00:00 +0000

Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs
https://tsglab.github.io/publication/trust-me-im-wrong-hallucinations/
Sat, 01 Nov 2025 00:00:00 +0000

Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
https://tsglab.github.io/publication/attention-mlp-interactions/
Fri, 01 Nov 2024 00:00:00 +0000

Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
https://tsglab.github.io/publication/interpretable-sequence-continuation/
Fri, 01 Nov 2024 00:00:00 +0000

Large Language Models Relearn Removed Concepts
https://tsglab.github.io/publication/llms-relearn-removed-concepts/
Thu, 01 Aug 2024 00:00:00 +0000

Detecting Edit Failures in Large Language Models: An Improved Specificity Benchmark
https://tsglab.github.io/publication/detecting-edit-failures-llms/
Sat, 01 Jul 2023 00:00:00 +0000

The Larger They Are, the Harder They Fail: Language Models Do Not Recognize Identifier Swaps in Python
https://tsglab.github.io/publication/identifier-swaps-python/
Sat, 01 Jul 2023 00:00:00 +0000