What Does GPT Store in Its MLP Weights? A Case Study of Long-Range Dependencies

Publication
Preprint