Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders

Publication
arXiv:2410.06981