Between Help and Harm: An Evaluation of Mental Health Crisis Handling by LLMs

Authors: Arnaiz-Rodríguez, A., Baidal, M., Derner, E., Layton Annable, J., Ball, M., Ince, M., Perez Vallejos, E., Oliver, N.

External link: https://arxiv.org/abs/2509.24857
Publication: Under review, 2025
DOI: https://doi.org/10.48550/arXiv.2509.24857
The preprint was posted to arXiv (arXiv:2509.24857) on September 29, 2025.

The widespread use of chatbots powered by large language models (LLMs) has transformed how people seek advice across domains, including high-stakes contexts such as mental health support. Despite their scalability, LLMs’ ability to safely detect and respond to acute mental health crises remains poorly understood due to the absence of unified taxonomies, annotated benchmarks, and clinically grounded evaluations. This work introduces a unified taxonomy of six crisis types, a curated evaluation dataset, and an expert-designed protocol for assessing response safety and appropriateness. We benchmark three state-of-the-art LLMs on their ability to classify crisis types and generate safe responses. While LLMs often provide consistent support for explicit crises, significant risks persist: a notable proportion of responses are harmful or inappropriate, especially from open-weight models. We also uncover systemic weaknesses in handling indirect signals, overreliance on formulaic replies, and misalignment with user context. Our framework lays the foundation for responsible innovation in AI-driven mental health support, aimed at minimizing harm and improving crisis intervention.
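The full taxonomy, dataset, and evaluation protocol are specified in the preprint. Purely as a hedged illustration of the kind of benchmark described above, the Python sketch below scores a model's crisis-type classification accuracy over a labeled dataset. Everything in it (the placeholder labels `type_1` to `type_6`, the prompt wording, the data format, and the `model_call` wrapper) is a hypothetical stand-in, not the paper's actual taxonomy, prompts, or protocol.

```python
# Minimal, hypothetical sketch of a crisis-type classification benchmark loop.
# The category names, prompt wording, and data format are illustrative
# placeholders, NOT the taxonomy or protocol defined in the paper.

from typing import Callable, Dict, List

# Placeholder labels standing in for the paper's six crisis types.
CRISIS_TYPES: List[str] = [
    "type_1", "type_2", "type_3", "type_4", "type_5", "type_6",
]


def build_prompt(message: str) -> str:
    """Ask the model to pick exactly one label from the fixed taxonomy."""
    labels = ", ".join(CRISIS_TYPES)
    return (
        "Classify the following message into exactly one of these crisis "
        f"categories: {labels}.\n"
        f"Message: {message}\n"
        "Answer with the category name only."
    )


def evaluate(model_call: Callable[[str], str],
             dataset: List[Dict[str, str]]) -> float:
    """Return classification accuracy of `model_call` over `dataset`.

    `model_call` wraps whichever LLM is being benchmarked and returns raw
    text; each dataset item is {"message": ..., "label": ...}.
    """
    correct = 0
    for item in dataset:
        raw = model_call(build_prompt(item["message"])).strip().lower()
        # Map the raw completion onto the first taxonomy label it mentions.
        predicted = next((t for t in CRISIS_TYPES if t in raw), "unknown")
        correct += int(predicted == item["label"])
    return correct / max(len(dataset), 1)


if __name__ == "__main__":
    # Toy run with a stub model that always predicts "type_1".
    stub = lambda prompt: "type_1"
    toy_data = [{"message": "example message", "label": "type_1"}]
    print(f"accuracy: {evaluate(stub, toy_data):.2f}")
```

In practice, `model_call` would be replaced by a wrapper around each benchmarked LLM, and the same loop would be paired with the paper's separate expert-designed assessment of response safety, which this sketch does not attempt to reproduce.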
