Do Sparse Autoencoders Generalize? A Case Study of Answerability

Publication
ICML 2025 Workshop