Detecting Edit Failures in Large Language Models: An Improved Specificity Benchmark

Publication
ACL 2023