Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

Publication: arXiv:2509.26238