I. Itzhak
Events
Upcoming
- LLMs Generate Harmful Responses Using a Distinct Mechanism, Shared Across Harm Types
Hadas Orgad · Kempner Institute, Harvard University
Friday 19 June 2026, 4:00 PM – 5:00 PM BST
Past
- You Know It or You Don’t: Categorical Differences in Language Model Behavior
Naomi Saphra · Kempner Institute, Harvard University; Boston University
Friday 12 June 2026
- LatentQA: Teaching LLMs to Decode Activations Into Natural Language
Alexander Pan · Meta
Friday 27 March 2026
- LLM Interpretability: Faithful Reasoning and Controllable Knowledge
Peter Hase · Postdoc, Stanford University; AI Institute Fellow, Schmidt Sciences
Friday 20 March 2026
- Model Introspection
Belinda Li · MIT
2025