Paper-Conference
Context Matters: Analyzing the Generalizability of Linear Probing and Steering Across Diverse Scenarios
I. Agarwal, S. Navani, F. Barez
Emerging Risks from Embodied AI Require Urgent Policy Action
J. Perlo, A. Robey, F. Barez, J. Mökander
Establishing Best Practices for Building Rigorous Agentic Benchmarks
Y. Zhu, T. Jin, Y. Pruksachatkun, A. Zhang, S. Liu, S. Cui, S. Kapoor, F. Barez, et al.
Full-Stack Alignment: Co-Aligning AI and Institutions with Thicker Models of Value
R. Lowe, J. Edelman, T. Zhi-Xuan, O. Klingefjord, E. Hain, V. Wang, A. Sarkar, F. Barez, et al.
Beyond Linear Steering: Unified Multi-Attribute Control for Language Models
N. Oozeer, L. Marks, F. Barez, A. Abdullah
Precise In-Parameter Concept Erasure in Large Language Models
Y. Gur-Arieh, C. Suslik, Y. Hong, F. Barez, M. Geva
Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness
T. Fu, F. Barez
Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs
A. Simhi, I. Itzhak, F. Barez, G. Stanovsky, Y. Belinkov
Rethinking Safety in LLM Fine-Tuning: An Optimization Perspective
M. Kim, J. M. Kwak, L. Alssum, B. Ghanem, P. Torr, D. Krueger, F. Barez†, A. Bibi†
Do Sparse Autoencoders Generalize? A Case Study of Answerability
L. Heindrich, P. Torr, F. Barez, V. Thost