ACL/EMNLP
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
N. Oozeer, L. Marks, F. Barez, A. Abdullah
Precise In-Parameter Concept Erasure in Large Language Models
Y. Gur-Arieh, C. Suslik, Y. Hong, F. Barez, M. Geva
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
T. Fu, F. Barez
Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs
A. Simhi, I. Itzhak, F. Barez, G. Stanovsky, Y. Belinkov
Interpreting Context Look-ups in Transformers: Investigating Attention-MLP Interactions
C. Neo*, S. B. Cohen, F. Barez*
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
M. Lan, P. Torr, F. Barez
Large Language Models Relearn Removed Concepts
M. Lo*, S. B. Cohen, F. Barez*
Detecting Edit Failures in Large Language Models: An Improved Specificity Benchmark
J. Hoelscher-Obermaier*, J. Persson*, E. Kran, I. Konstas, F. Barez*
The Larger They Are, the Harder They Fail: Language Models Do Not Recognize Identifier Swaps in Python
A. v. M. Barone*, F. Barez*, I. Konstas, S. B. Cohen