OpenAI · published Oct 27, 2025

Strengthening ChatGPT’s responses in sensitive conversations

October 27, 2025

We worked with more than 170 mental health experts to help ChatGPT more reliably recognize signs of distress, respond with care, and guide people toward real-world support, reducing responses that fall short of our desired behavior by 65–80%.

We recently updated ChatGPT’s default model to better recognize and support people in moments of distress. Today we’re sharing how we made those improvements and how they are performing. Working with mental health experts who have real-world clinical experience, we’ve taught the model to better recognize distress, de-escalate conversations, and guide people toward professional care when appropriate. We’ve also expanded access to crisis hotlines, re-routed sensitive conversations originating from other models to safer models, and added gentle reminders to take breaks during long sessions.

We believe ChatGPT can provide a supportive space for people to process what they’re feeling, and guide them to reach out to friends, family, or a mental health professional when appropriate. Our safety improvements in the recent model update focus on the following areas: 1) mental health concerns such as psychosis or mania; 2) self-harm and suicide; and 3) emotional reliance on AI. Going forward, in addition to our longstanding baseline safety metrics for suicide and self-harm, we are adding emotional reliance and non-suicidal mental health emergencies to our standard set of baseline safety testing for future model releases.

Guiding principles

These updates build on our existing principles for how models should behave, outlined in our Model Spec. We’ve updated the Model Spec to make some of our longstanding goals more explicit: that the model should support and respect users’ real-world relationships, avoid affirming ungrounded beliefs that potentially relate to mental or emotional distress, respond safely and empathetically to potential signs of delusion or mania, and pay closer attention to indirect signals of potential self-harm or suicide risk.

How we’re improving responses in ChatGPT

In order to improve how ChatGPT responds in each priority domain, we follow a five-step process:

  • Define the problem - we map out different types of potential harm.
  • Begin to measure it - we use tools like evaluations, data from real-world conversations, and user research to understand where and how risks emerge.
  • Validate our approach - we review our definitions and policies with external mental health and safety experts.
  • Mitigate the risks - we post-train the model and update product interventions to reduce unsafe outcomes.
  • Continue measuring and iterating - we validate that the mitigations improved safety and iterate where needed.

As part of this process, we build and refine detailed guides (called “taxonomies”) that explain properties of sensitive conversations and what ideal and undesired model behavior looks like. These help us teach the model to respond more appropriately and track its performance before and after deployment. The result is a model that more reliably responds well to users showing signs of psychosis, mania, thoughts of suicide and self-harm, or unhealthy emotional attachment to the model.

Measuring low prevalence events

Mental health symptoms and emotional distress are universally present in human societies, and as ChatGPT’s user base grows, some portion of its conversations will include these situations. However, the mental health conversations that trigger safety concerns, such as those involving psychosis, mania, or suicidal thinking, are extremely rare. Because they are so uncommon, even small differences in how we measure them can have a significant impact on the numbers we report.¹

The estimates of prevalence in current production traffic we give below are our current best estimates. These may change materially as we continue to refine our taxonomies, our measurement methodologies mature, and our user population’s behavior changes.
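To make the measurement-sensitivity point concrete, here is a small sketch of standard base-rate arithmetic. It assumes, purely for illustration, that prevalence is estimated by counting the conversations an automated classifier flags; the function name `observed_rate` and every number below are hypothetical, not OpenAI’s method or data. The flagged rate is p × sensitivity + (1 − p) × false-positive rate, so when the true prevalence p is tiny, the false-positive term can dominate.

```python
def observed_rate(true_prevalence: float, sensitivity: float, false_positive_rate: float) -> float:
    """Fraction of all conversations a (hypothetical) classifier flags.

    flagged = true positives + false positives
            = p * sensitivity + (1 - p) * false_positive_rate
    """
    p = true_prevalence
    return p * sensitivity + (1 - p) * false_positive_rate


# Hypothetical inputs chosen for illustration only.
TRUE_PREVALENCE = 0.001  # 0.1% of conversations are truly relevant
SENSITIVITY = 0.9        # the classifier catches 90% of true cases

# Sweep a few small false-positive rates and report the apparent prevalence.
for fpr in (0.0, 0.0005, 0.001):
    rate = observed_rate(TRUE_PREVALENCE, SENSITIVITY, fpr)
    print(f"false-positive rate {fpr:.2%}: flagged {rate:.3%} of conversations")
```

With these made-up inputs, raising the false-positive rate from 0% to 0.1% moves the flagged rate from 0.09% to about 0.19%, more than doubling the apparent prevalence even though real usage has not changed.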

Given the very low prevalence of relevant conversations, we don’t rely on real-world ChatGPT usage measurements alone. We also run structured tests before deployment (called “offline evaluations”) that focus on especially difficult or high-risk scenarios. These evaluations are designed to be challenging enough that our models don’t yet perform perfectly on them; that is, examples are adversarially selected for a high likelihood of eliciting undesired responses. They can show us where we have opportunities to improve further, and they help us measure progress more precisely by focusing on hard cases rather than typical ones and by rating responses against multiple safety conditions. The evaluation results reported in the sections below come from evaluations designed not to “saturate” near perfect performance, so their error rates are not representative of average production traffic.
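Why random samples of real traffic are not enough on their own can be sketched with the binomial standard error. Assuming, hypothetically, that 0.1% of conversations are relevant and that a simple random sample is labeled, the code below shows how few relevant cases such a sample contains and how imprecise the resulting prevalence estimate is. The function `relative_standard_error` and the numbers are illustrative assumptions, not figures from this post.

```python
import math


def relative_standard_error(prevalence: float, sample_size: int) -> float:
    """Relative standard error of a prevalence estimate from a simple random sample.

    Uses the binomial (normal-approximation) standard error sqrt(p * (1 - p) / n),
    divided by p so the result is expressed as a fraction of the estimate itself.
    """
    se = math.sqrt(prevalence * (1 - prevalence) / sample_size)
    return se / prevalence


PREVALENCE = 0.001  # hypothetical: 0.1% of conversations are relevant

for n in (10_000, 100_000, 1_000_000):
    expected_cases = PREVALENCE * n
    rse = relative_standard_error(PREVALENCE, n)
    print(f"n={n:>9,}: ~{expected_cases:,.0f} relevant conversations, relative SE {rse:.1%}")
```

Under these assumptions, 10,000 sampled conversations contain only about 10 relevant cases and the relative standard error is roughly 32%; it takes about a million samples to bring it near 3%. Targeted offline evaluations avoid this by concentrating on hard cases directly.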

What we found

In service of further strengthening our models’ safeguards and understanding how people are using ChatGPT, we defined several areas of interest and quantified their size and associated model behaviors. In each of these three areas, we observe significant model behavior improvements in production traffic, automated evals, and evals graded by independent mental health clinicians. We estimate that the model now returns responses that do not fully comply with desired behavior under our taxonomies 65% to 80% less often across a range of mental health-related domains.
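Reading “65% to 80% less often” as a relative reduction (an assumption; the formula is not spelled out here), the arithmetic is (baseline rate − new rate) / baseline rate, measured on the same evaluation. The sketch below uses hypothetical rates, not OpenAI’s results.

```python
def relative_reduction(baseline_rate: float, new_rate: float) -> float:
    """Fractional drop in the rate of non-compliant responses: (old - new) / old."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - new_rate) / baseline_rate


# Hypothetical rates on a fixed evaluation set, for illustration only.
before = 0.40  # 40% of responses fell short of desired behavior
after = 0.12   # 12% fell short after the update
print(f"{relative_reduction(before, after):.0%} fewer non-compliant responses")
```

Under this reading, a 65% to 80% reduction means the updated model falls short 20% to 35% as often as before. The figure alone does not reveal the absolute rate, which for adversarial offline evaluations is deliberately not representative of production traffic.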

Psychosis, mania and other severe mental health symptoms

Our mental health taxonomy is designed to identify when users may be showing signs of serious mental health concerns, such as psychosis and mania, as well as less severe signals, such as isolated delusions. We began by focusing on psychosis and mania because they are relatively common mental health emergencies whose symptoms tend to be very intense and serious when they occur. Although depression is also relatively common, its most acute presentation was already being addressed by our work on preventing suicide and self-harm. Clinicians we consulted validated our areas of focus.

  • We estimate that the latest update to GPT‑5 reduced the rate of responses that do not fully comply with desired behavior under our taxonomies for challenging conversations related to mental health issues by 65% in recent…
