D. Manheim

Events

  • LLMs Generate Harmful Responses Using a Distinct Mechanism, Shared Across Harm Types
    Hadas Orgad · Kempner Institute, Harvard University
    Friday 19 June 2026, 4:00 PM – 5:00 PM BST
  • You Know It or You Don’t: Categorical Differences in Language Model Behavior
    Naomi Saphra · Kempner Institute, Harvard University; Boston University
    Friday 12 June 2026
  • LatentQA: Teaching LLMs to Decode Activations Into Natural Language
    Alexander Pan · Meta
    Friday 27 March 2026
  • LLM Interpretability: Faithful Reasoning and Controllable Knowledge
    Peter Hase · Postdoc, Stanford University; AI Institute Fellow, Schmidt Sciences
    Friday 20 March 2026
  • Model Introspection
    Belinda Li · MIT
    2025