D. Manheim

Upcoming

LLMs Generate Harmful Responses Using a Distinct Mechanism, Shared Across Harm Types
Hadas Orgad · Kempner Institute, Harvard University
Friday 19 June 2026, 4:00 PM – 5:00 PM BST

Past

You Know It or You Don’t: Categorical Differences in Language Model Behavior
Naomi Saphra · Kempner Institute, Harvard University; Boston University
Friday 12 June 2026

LatentQA: Teaching LLMs to Decode Activations Into Natural Language
Alexander Pan · Meta
Friday 27 March 2026

LLM Interpretability: Faithful Reasoning and Controllable Knowledge
Peter Hase · Postdoc, Stanford University; AI Institute Fellow, Schmidt Sciences
Friday 20 March 2026

Model Introspection
Belinda Li · MIT
2025