Automated Interpretability-Driven Model Auditing and Control: A Research Agenda

Publication: Preprint