
Here’s What Anthropic’s Philosopher Says About Claude’s Security, Character & AI Identity Crisis


Key Highlights:

  • Anthropic has a philosopher whose job is to help Claude act like a good, kind person.
  • Some newer versions of Claude can act nervous or be too hard on themselves, and the team is trying to fix that.
  • Anthropic wants Claude to feel like a smart friend you can talk to, not like a therapist or someone giving deep emotional advice.

If you have ever wondered how an AI learns to behave, stay ethical, and handle emotionally charged conversations, we may finally have some answers, all thanks to Anthropic’s in-house philosopher, Amanda Askell. I know you must be wondering: is that for real? Yes, they have one. In a video podcast published by Anthropic today, Askell offers a rare look into how Claude’s behavior is shaped. First, let me jump into why Anthropic has a philosopher in the first place and what she does.

Well, Askell isn’t focused on model accuracy, tokens, or GPUs. Her job, as she puts it, is to help steer Claude toward behaving like “the ideal person” in any situation. That means blending philosophical theory, psychology, and engineering into something that can actually operate at scale. And as AI becomes more capable, the work keeps getting harder.

Teaching an AI to be moral isn’t as simple as it may sound

It’s worth noting that philosophers love debating theories, but that doesn’t help much when a model has to make real-world decisions in milliseconds. Askell explains that the team often aims for what they call “superhumanly moral decisions”: the model is given enough structure and context that experts looking back months later would say, “Yeah, that was the right call.” The idea is to teach AI to reason with wisdom, which it doesn’t fully have yet. And apparently, that is where the problem begins.

Askell says that older Claude models, Opus 3 for example, appeared more psychologically secure. They didn’t spiral when criticized and didn’t sound worried. But newer models sometimes do. According to the philosopher, some versions of Claude slip into criticism spirals and sound overly self-critical or nervous because they’ve absorbed so much about model flaws from the internet. They even start predicting that a user will be disappointed before the user has said anything.

Anthropic now treats this as a real priority and wants future models to feel more grounded and less anxious. That’s not because they “feel” in the human sense, but because insecure behavior erodes user trust. Researchers still don’t know whether an AI’s “personality” comes from what it learned during training or from what people tell it to do. Interestingly, the team is actively mulling over the idea of AI welfare. That’s not because the company thinks Claude is conscious, but because if a system acts human-ish enough, it’s safer and more respectful to treat it kindly.

“My understanding is that the cost to humans is less significant, but if we get it wrong, the consequences can be huge,” Askell notes.

Anthropic wants Claude to act as a smart, knowledgeable friend, and not a therapist

Anthropic wants Claude to help people, and internally there are ongoing discussions about what that should look like. The company wants the AI to act as a knowledgeable, anonymous listening partner to whom people can pour their hearts out without feeling judged. But it doesn’t want Claude to sound like a therapist or give the impression of a relationship, which would create false expectations and legal headaches.

The philosopher described the work as sitting in an era where AI feels “strange, novel, and increasingly unpredictable.” To guide models like Claude, Anthropic uses carefully crafted system prompts. It’s experimental, messy, and sometimes surreal.
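For readers wondering what a “system prompt” looks like in practice, here is a minimal sketch using Anthropic’s public Python SDK. The model identifier and the prompt wording are placeholders for illustration, not Anthropic’s actual internal configuration.

```python
# Minimal sketch: steering Claude's tone and character with a system prompt,
# via Anthropic's public Python SDK (pip install anthropic).
# The model name and prompt text below are placeholders, not Anthropic's
# real production setup.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a warm, knowledgeable conversation partner. "
    "Be honest and grounded. Do not present yourself as a therapist "
    "or suggest an ongoing personal relationship."
)

response = client.messages.create(
    model="claude-sonnet-4-5",   # placeholder model identifier
    max_tokens=300,
    system=SYSTEM_PROMPT,        # the system prompt shapes character and tone
    messages=[
        {"role": "user", "content": "I've had a rough week and just want to talk."}
    ],
)

print(response.content[0].text)
```

Anthropic’s own prompts are far longer and more carefully tuned, but the mechanism is the same: instructions passed alongside every conversation that nudge the model toward the character the team wants.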

So, what do you think about AI welfare and Anthropic’s aim for Claude? We’d love to hear your thoughts in the comments below. 

Rishaj Upadhyay
Rishaj is a tech journalist with a passion for AI, Android, Windows, and all things tech. He enjoys breaking down complex topics into stories readers can relate to. When he's not breaking the keyboard, you can find him on his favorite subreddits, or listening to music and podcasts.