LLMs Generate Harmful Responses Using a Distinct Mechanism, Shared Across Harm Types · Hadas Orgad · Kempner Institute, Harvard University · Friday 19 June 2026, 4:00 PM – 5:00 PM BST
Past
You Know It or You Don’t: Categorical Differences in Language Model Behavior · Naomi Saphra · Kempner Institute, Harvard University; Boston University · Friday 12 June 2026
LatentQA: Teaching LLMs to Decode Activations Into Natural Language · Alexander Pan · Meta · Friday 27 March 2026
LLM Interpretability: Faithful Reasoning and Controllable Knowledge · Peter Hase · Postdoc, Stanford University; AI Institute Fellow, Schmidt Sciences · Friday 20 March 2026