Enhancing Neural Network Interpretability with Feature-Aligned Sparse Autoencoders

Publication
arXiv:2411.01220