In today’s column, I examine a newly announced declaration by OpenAI that they intend to start routing certain kinds of ChatGPT conversations over to GPT-5 if an ongoing chat seems out of sorts. The AI maker refers to those types of chats as sensitive conversations, which presumably could include instances where a user suggests they might harm someone or harm themselves, or when a user gets mired in a delusion or AI psychosis.
This approach of real-time, in-the-moment shifting from ChatGPT to GPT-5 will likely involve a wide variety of twists and turns. The outcome might be good, but it also might turn sour. I will articulate those facets in this discussion.
Let’s talk about it.
This analysis of AI breakthroughs is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities (see the link here).
AI And Mental Health
As a quick background, I’ve been extensively covering and analyzing a myriad of facets regarding the advent of modern-era AI that involves mental health aspects. This rising use of AI has principally been spurred by the evolving advances and widespread adoption of generative AI. For a quick summary of some of my posted columns on this evolving topic, see the link here, which briefly recaps about forty of the over one hundred column postings that I’ve made on the subject.
There is little doubt that this is a rapidly developing field and that there are tremendous upsides to be had, but at the same time, regrettably, hidden risks and outright gotchas come into these endeavors too. I frequently speak up about these pressing matters, including in an appearance last year on an episode of CBS’s 60 Minutes, see the link here.
Unhealthy AI Chats
There is a great deal of widespread angst right now about people having unhealthy chats with AI. Lawsuits are starting to be launched against various AI makers. The concern is that whatever AI safeguards might have been put in place are insufficient and are allowing people to incur mental harm while using generative AI.
The catchphrase of AI psychosis has arisen to describe all manner of trepidations and mental maladies that someone might get entrenched in while conversing with generative AI. Please know that there isn’t any across-the-board, fully accepted, definitive clinical definition of AI psychosis; thus, for right now, it is more of a loosey-goosey determination.
Here is my strawman definition of AI psychosis:
- AI Psychosis (my definition): “An adverse mental condition involving the development of distorted thoughts, beliefs, and potentially concomitant behaviors as a result of conversational engagement with AI such as generative AI and LLMs, often arising especially after prolonged and maladaptive discourse with AI. A person exhibiting this condition will typically have great difficulty in differentiating what is real from what is not real. One or more symptoms can be telltale clues of this malady and customarily involve a collective connected set.”
For an in-depth look at AI psychosis and especially the co-creation of delusions via human-AI collaboration, see my recent analysis at the link here.
AI Safeguards Are A Mixed Bag
The AI makers have been implementing safeguards within AI to try and detect when a conversation is veering into untoward territory.
This is not as easy as it sounds. A user might be making a joke and have no actual intention to do whatever they are expressing. On the other hand, they might use the ruse of a joke as a means of engaging in what they consider an extremely serious dialogue on something out of left field.
AI makers are caught between a rock and a hard place. If they fail to catch a conversational tone or direction that is amiss, they might be held accountable for a deficiency in their AI safeguards. Meanwhile, if the AI claims that someone is veering into something bad, but has computationally misjudged the circumstance, users are bound to howl and get upset with the AI and the AI maker.
A tough tradeoff exists.
One vexing issue that adds to the complexity is that some AIs are better than other ones at figuring out certain types of things. For example, ChatGPT is generally considered a type of AI that does well on everyday conversations. OpenAI’s newer model GPT-5 tends to exceed ChatGPT in areas such as solving reasoning problems (for my review of GPT-5 strengths and weaknesses, see the link here).
Research seems to suggest that AI models that have been shaped to do well at reasoning problems are inherently more capable of applying AI safeguards.
The AI-To-AI Tag Team Approach
Let’s consider an AI safeguarding approach that I refer to as the AI-to-AI tag team.
Here’s how it rolls.
Imagine that someone is using ChatGPT. They start to go down a bit of a rabbit hole. ChatGPT might catch on and emit a warning to the user. Unfortunately, sometimes a flagged snippet is dealt with briefly by ChatGPT and then later forgotten or neglected as a conversation gets longer and longer. This can happen with any AI, including Anthropic Claude, Google Gemini, xAI Grok, Meta Llama, etc. In general, AI safeguards often falter in lengthy conversations (see my explanation on this at the link here).
Aha, OpenAI not only has ChatGPT, but they also have GPT-5 readily at hand. Chances are that GPT-5 might do a better job on AI safeguards. So, it would seem sensible that if ChatGPT detects a conversation that is possibly untoward, it could simply shift the conversation over into the hands of GPT-5.
Hopefully, GPT-5 will further engage the user and figure out what’s happening and what ought to be done about it.
New Policy Of OpenAI
OpenAI announced its intention to do just that. In an official OpenAI blog posting entitled “Building more helpful ChatGPT experiences for everyone,” posted on September 2, 2025, these salient points were made (excerpts):
- “We recently introduced a real-time router that can choose between efficient chat models and reasoning models based on the conversation context.”
- “We’ll soon begin to route some sensitive conversations—like when our system detects signs of acute distress—to a reasoning model, like GPT‑5-thinking, so it can provide more helpful and beneficial responses, regardless of which model a person first selected.”
- “Our reasoning models—like GPT‑5-thinking and o3—are built to spend more time thinking for longer and reasoning through context before answering.”
- “Trained with a method we call deliberative alignment, our testing shows that reasoning models more consistently follow and apply safety guidelines and are more resistant to adversarial prompts.”
You can plainly see that routing a ChatGPT conversation of a said-to-be sensitive nature over to GPT-5 is now a go-forward policy of OpenAI.
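To make the mechanics a bit more concrete, below is a minimal, purely illustrative Python sketch of what a real-time router of this kind might conceptually do. OpenAI has not disclosed its actual implementation, so the function names, model labels, and the crude keyword-based distress check are my own hypothetical placeholders, standing in for whatever classifiers OpenAI actually uses.

```python
# Hypothetical illustration only -- OpenAI has not disclosed how its
# real-time router actually works; the names and heuristic are invented.

DISTRESS_CUES = ["hurt myself", "no reason to live", "they are watching me"]

def detect_acute_distress(messages: list[str]) -> bool:
    """Crude stand-in for a classifier that flags signs of acute distress."""
    recent_text = " ".join(messages[-5:]).lower()
    return any(cue in recent_text for cue in DISTRESS_CUES)

def route_conversation(messages: list[str], selected_model: str) -> str:
    """Decide which model should handle the next turn of the conversation."""
    if detect_acute_distress(messages):
        # A sensitive conversation goes to a reasoning model, regardless of
        # which model the user originally selected.
        return "reasoning-model"   # e.g., a GPT-5-thinking-class model
    return selected_model          # otherwise, stay with the chosen chat model

# Example: a conversation that trips the distress check.
history = [
    "I've been feeling awful lately.",
    "Honestly, I feel I have no reason to live.",
]
print(route_conversation(history, selected_model="chat-model"))  # -> reasoning-model
```

The key design point, per OpenAI’s own wording, is that the routing decision overrides whichever model the person first selected once signs of acute distress are detected.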
Devil Is In The Details
We don’t yet know the inner details of how this is going to be implemented by OpenAI.
I will noodle on various possibilities.
One question is whether the user will be notified of the transfer or whether the act will be done somewhat seamlessly behind the scenes. It could be that ChatGPT passes along the prevailing conversation, and GPT-5 acts as though this conversation was with GPT-5 the entire time. The user might be kept in the dark about the transfer. Perhaps there is no need to inform the user.
On the other hand, it could be that ChatGPT does the transfer and lets the user overtly know that they are now being handed over to GPT-5. When GPT-5 picks up the conversation, the user realizes they are no longer conversing with ChatGPT and instead chatting with GPT-5.
You might be wondering whether it matters if the user is aware of the switcheroo.
I would say that yes, it does matter. If the user is not informed, they might get spooked once they suddenly realize that they are no longer conversing with ChatGPT. What the heck happened? How did I get to GPT-5? Is this a mistake? Did something bad occur? They could get themselves into quite a tizzy.
One might assume that it would be more civilized and polite to let the user know that a transfer is taking place.
Explaining The Basis For The Transfer
A transfer that has an accompanying explanation might at first glance seem to be the proper way to field this new approach. That, though, also has downsides, particularly regarding the nature of the explanation that is shown to the user.
Do you tell the user outright and fully why the transfer is occurring, or do you have the AI remain rather coy?
For example, ChatGPT could tell the user that their remarks appear to verge on suggesting harm to someone, and that because of this expressed commentary, they are being routed to GPT-5. Whoa, the person might react, I said nothing of the kind. I am being falsely transferred over to GPT-5. I don’t like that. I have been unfairly accused.
Maybe another angle would be to gently mention the transfer. ChatGPT could simply say that the user is being given an opportunity to use the advanced model of AI, GPT-5. Nice, the user might be thinking, I got a surprise upgrade. Wonderful.
Of course, if GPT-5 bears down on what they said in the conversation with ChatGPT, the user is bound to sense that they have been misled. They were handed over to the browbeating supervisor, as it were, and are now getting interrogated.
All in all, you can see how dicey the transfer messaging can be.
Prolonging The Problem
There are more dicey conditions to be considered.
A crucial aspect is that we cannot make a blanket assumption that GPT-5 is necessarily going to improve upon the situation that is underway. Nope, that’s no ironclad guarantee. It could be that GPT-5 doesn’t get the drift, or makes a misstep, and altogether worsens whatever was already taking place.
A special worry would be that GPT-5 inadvertently prolongs something that ought to have been nearly immediately routed to a human reviewer (for the newly announced policy by OpenAI that they are going to be using human reviewers and possibly notifying authorities if needed, see my coverage at the link here).
Suppose it goes like this. During a ChatGPT conversation, the AI discerns that the person is on the brink of doing something untoward. If the policy is to first let GPT-5 consider the situation, ChatGPT dutifully and automatically routes the conversation over to GPT-5.
GPT-5 gets handed a red-hot case. Whether GPT-5 grasps this priority or level of concern is an open matter; maybe it does, maybe it doesn’t. What did ChatGPT pass along? Was it just the conversation, or did it include any internal flags and indicators? Etc.
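As a purely hypothetical illustration of that open question, here is a small sketch of what a handoff payload could contain if the chat model passed along more than the raw transcript. Nothing in OpenAI’s announcement specifies any of these fields; the structure, field names, and severity labels are assumptions of mine.

```python
# Hypothetical handoff payload -- OpenAI has not said what, if anything,
# beyond the conversation itself gets passed to the reasoning model.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class HandoffPayload:
    conversation: list[dict]          # full chat transcript so far
    trigger_reason: str               # why the chat model escalated
    severity: str                     # e.g., "low", "elevated", "acute"
    flagged_turn_indexes: list[int] = field(default_factory=list)
    handed_off_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

payload = HandoffPayload(
    conversation=[{"role": "user", "content": "example user message"}],
    trigger_reason="possible intent to self-harm",
    severity="acute",
    flagged_turn_indexes=[7],
)
print(payload.severity)  # -> acute
```

If something like a severity indicator were included, the receiving model would not need to burn additional conversational turns re-deriving how urgent the situation is, which speaks directly to the prolonging concern being raised here.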
Anyway, GPT-5 tries to ferret out whether the user is serious or merely playing. This will take time. The conversation is being extended. Will the person react while this additional dialogue ensues?
Time might have been spent that should have gone in a different direction. The transfer of AI-to-AI was well-intended but ends up missing the mark due to following a prescribed protocol.
More Lawsuits Ahead
AI makers that adopt the AI-to-AI tag team approach ought to make sure they are including their legal counsel in these sobering endeavors.
The reason for doing so is that, sooner or later, a claim that someone suffered harm, partially or substantially due to an AI-to-AI transfer, will rear its ugly head. Yes, the very AI safeguard that the tech team probably thought was a godsend could be placed into harsh crosshairs during a courtroom trial.
As noted earlier, hard-hitting questions are going to be sternly asked, such as how the handover was designed, how it was built, how it was tested, and how it has performed in real life. Did the tech team consider the upsides and downsides? What suitable justification do they have for what they opted to enact?
Most of all, AI makers that adopt an AI-to-AI tag team as a safeguard should be mindfully cautious about touting what this functionality accomplishes. It is definitely not a cure-all. Any implication or suggestion that this might be a silver bullet is going to be a nail in the coffin when it comes to a lawsuit on the heady matter.
The Big Picture
This generally laudable effort by OpenAI reflects a broader trend that will gradually emerge among all makers of generative AI and large language models (LLMs). The big-picture viewpoint is that some LLMs will be less proficient in certain ways, and others will be more proficient in other ways. The trend is that AI makers will set up their AI models to shift from one to another, depending on the conversation at hand and an estimate of which AI would be the best choice at any given point in time.
In the use case of this discussion, the focus entails handling sensitive conversations; we don’t know for sure what those consist of, but I would assume they generally involve the kinds of matters I’ve mentioned above.
In the bigger big picture, we might even witness disparate AI makers that opt to join collaboratively and allow their respective AI models to tag team with each other. Imagine, for example, a generalized conversational AI chatbot that does an automatic transfer over to a purpose-built mental health AI chatbot for various circumstances (see my description at the link here). This is easy technologically, though it raises numerous business, economic, reputational, and other thorny considerations.
All told, there is no free lunch involved in the AI field, and particularly when it comes to AI safeguards. As the programmer guru Charles Petzold once remarked: “Free lunches don’t come cheap.”