Article
AutoControl Arena: Synthesizing Executable Test Environments for Frontier AI Risk Evaluation
C. Li, P. Lu, X. Pan, F. Barez, M. Yang
Old Habits Die Hard: How Conversational History Geometrically Traps LLMs
A. Simhi, F. Barez, M. Tutek, Y. Belinkov, S. B. Cohen
Token Taxes: Mitigating AGI's Economic Risks
L. Irwin, T.-Y. Wu, F. Barez
Same Answer, Different Representations: Hidden Instability in VLMs
F. A. Wani, A. Suglia, R. Saxena, A. P. Gema, W. C. Kwan, F. Barez, et al.
The Hitchhiker's Guide to Actionable Interpretability
H. Orgad, F. Barez, T. Haklay, I. Lee, M. Mosbach, A. Reusch, N. Saphra, et al.
Automated Interpretability-Driven Model Auditing and Control: A Research Agenda
F. Barez
Interpretability Can Be Actionable
H. Orgad, F. Barez, T. Haklay, I. Lee, M. Mosbach, A. Reusch, N. Saphra, et al.
Quantifying the Effect of Test Set Contamination on Generative Evaluations
R. Schaeffer, J. Kazdan, B. Abbasi, K. Z. Liu, B. Miranda, A. Ahmed, F. Barez, et al.
The Capability Frontier: Benchmarks Miss 82% of Model Performance
B. Fowler, R. Smith, D. T. Graviet, W. Myers, J. Greaves, N. F. Oozeer, A. García, et al.
When AI Systems Learn During Deployment, Our Safety Evaluations Break
F. Barez