Interpretability
Mechanistic Interpretability Workshop at ICML 2024
F. Barez, M. Geva, L. Chan, A. Geiger, K. Yin, N. Nanda, et al.
The Scaling Behavior of Large Language Models
A. v. Miceli-Barone, F. Barez, S. B. Cohen, E. Voita, U. Germann, M. Lukasik
Visualizing Neural Network Imagination
N. Wichers, V. Tao, R. Volpato, F. Barez
Understanding Addition in Transformers
P. Quirke, F. Barez
Increasing Trust in Language Models Through the Reuse of Verified Circuits
P. Quirke, C. Neo, F. Barez
What Does GPT Store in Its MLP Weights? A Case Study of Long-Range Dependencies
T. Clark, S. B. Cohen, F. Barez
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
A. Garde, E. Kran, F. Barez
Detecting Edit Failures in Large Language Models: An Improved Specificity Benchmark
J. Hoelscher-Obermaier*, J. Persson*, E. Kran, I. Konstas, F. Barez*
The Larger They Are, the Harder They Fail: Language Models Do Not Recognize Identifier Swaps in Python
A. v. Miceli-Barone*, F. Barez*, I. Konstas, S. B. Cohen
Neuron to Graph: Interpreting Language Model Neurons at Scale
A. Foote*, N. Nanda, E. Kran, I. Konstas, S. Cohen, F. Barez*