Research Areas

TSG Lab works across four interconnected areas, united by a single question: how do we make powerful AI systems legible, reliable, and accountable? Our work moves from the internals of models to the institutions that govern them. We publish at NeurIPS, ICML, ICLR, ACL, EMNLP, FAccT, and other leading venues.

Interpretability

Most AI systems that affect people’s lives cannot explain their own reasoning. We study the internal structure of neural networks — identifying which components drive particular behaviours, diagnosing failure modes, and developing practical tools for understanding what a model is actually doing.

Safety & Alignment

A model can pass every benchmark and still behave in ways we did not intend. Unwanted behaviours can persist through fine-tuning, outputs can sound confident without being grounded, and evaluations can miss the very failures they were designed to catch. We study how these gaps arise and develop methods to detect and close them.

Technical Governance

Interpretability findings are only useful if they can travel beyond the lab. We develop structured, auditable methods that translate technical insights into evidence that regulators, developers, and policymakers can act on — covering safety cases, evaluation standards, and accountability mechanisms.

Societal Impact

Capable AI systems reshape how decisions are made and by whom. We study how AI deployment affects individual agency, institutional power, and the broader social fabric — and what combinations of technical design and policy can preserve meaningful human oversight.