Quantifying the Effect of Test Set Contamination on Generative Evaluations

Publication
arXiv:2601.04301