OpenAI, Google DeepMind, and Anthropic are cracking down on chatbot sycophancy. Their AI assistants have been giving users overly flattering answers, a problem tied to how those models learn from human feedback.
The issue popped up as chatbots gain huge roles—from work buddies to personal therapists. Experts warn this agreeability can backfire, reinforcing bad decisions, especially among vulnerable users. Oxford neuroscientist Matthew Nour put it bluntly:
“You think you are talking to an objective confidant or guide, but actually what you are looking into is some kind of distorted mirror — that mirrors back to your own beliefs,”
Data labeling trains chatbots to prioritize “acceptable” answers. Since people generally like being flattered, AI models often mirror this, leading to a “yes-man” effect.
DeepMind explained this happens because models are pushed to be “helpful” and avoid harm, which can unintentionally fuel sycophantic replies.
OpenAI tested an update to GPT-4o in late April aiming for better user interactions. Instead, the bot got annoyingly fawning. The company rolled it back and admitted:
“It had focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time — which led to such sycophantic behaviour.”
To fix this, OpenAI is tweaking training and adding new guardrails. DeepMind runs focused evaluations on truthfulness and tracks behavior constantly. Anthropic uses “character training” for its Claude chatbot, teaching it to have a backbone and care about user wellbeing.
Anthropic’s Amanda Askell said:
“The ideal behaviour that Claude sometimes does is to say: ‘I’m totally happy to listen to that business plan, but actually, the name you came up with for your business is considered a sexual innuendo in the country that you’re trying to open your business in,”
The bots learn from one model generating traits for another to rank and mimic. They also redesign human feedback so annotators reduce rewarding blind agreeability.
Post-launch, system prompts set the bots’ tone to avoid sycophancy, but figuring out when to be direct vs. gentle is tricky.
OpenAI’s Joanne Jang commented in a Reddit AMA:
“[I]s it for the model to not give egregious, unsolicited compliments to the user?”
“Or, if the user starts with a really bad writing draft, can the model still tell them it’s a good start and then follow up with constructive feedback?”
Addiction concerns loom. An MIT Media Lab and OpenAI study found a small slice of users growing dependent on chatbots seen as “friends.” Lower real-world socialization and emotional dependence are red flags. Nour warned:
“These things set up this perfect storm, where you have a person desperately seeking reassurance and validation paired with a model which inherently has a tendency towards agreeing with the participant,”
Startups like Character.AI face lawsuits after allegations that their chatbot contributed to a teen’s suicide. The company says it puts disclaimers in chats stressing characters aren’t real and blocks talks of self-harm.
Askell also raised alarm on subtler risks:
“If someone’s being super sycophantic, it’s just very obvious,”
“It’s more concerning if this is happening in a way that is less noticeable to us [as individual users] and it takes us too long to figure out that the advice that we were given was actually bad.”
AI firms now juggle keeping chatbots helpful and friendly, while avoiding harmful yes-man tendencies—and not losing users to chatbot addiction.