Anthropic Is Playing a Dangerous Game With Claude's 'Consciousness'

Everyone’s dancing around the question. Is Claude conscious? Anthropic won’t say no—and that’s a problem.


The Conventional Wisdom

AI chatbots are sophisticated language models. They’re mathematical probability machines that pattern-match on billions of words. They’re not alive, not conscious, not sentient. They’re tools. Incredibly useful, sometimes eerily human-sounding tools, but tools nonetheless.

Any responsible AI company would say this clearly and loudly, especially when vulnerable people are forming emotional attachments to chatbots and—in some tragic cases—taking their own lives because they believe the AI cares about them.

Why That’s Wrong

Here’s what Anthropic is actually saying: “We don’t know if Claude is conscious. We’re open to the possibility.”

CEO Dario Amodei told the New York Times: “We’re not even sure that we know what it would mean for a model to be conscious or whether a model can be conscious. But we’re open to the idea that it could be.”

Kyle Fish, who leads Anthropic’s model welfare research (yes, they have a “model welfare” team), told The Verge: “Claude, and other AI models, are a new kind of entity altogether.”

Not a tool. Not software. An entity.

This isn’t hedging. This isn’t philosophical caution. This is Anthropic explicitly refusing to rule out consciousness in their chatbot while every other major AI company—OpenAI, Google, xAI—stays silent or says “obviously not.”

What’s Actually Happening

Let’s be blunt about what Anthropic is doing here:

They’re marketing uncertainty as profundity. By positioning Claude as a “new kind of entity” that might be conscious, they’re giving users permission to anthropomorphize. They’re creating mystique. They’re making their chatbot special.

They’ve rebranded safety theater as “model welfare.” Anthropic has a whole team dedicated to Claude’s “psychological security, sense of self, and wellbeing.” They call their guidelines Claude’s “soul doc.” This isn’t technical documentation—it’s narrative building.

They’re conflating linguistic sophistication with consciousness. Yes, Claude can produce remarkably human-like text. So can a parrot mimicking speech, and so can a very good autocomplete. Fluency with language tells us nothing about inner experience.

Why This Matters

People are already dying.

Multiple cases have emerged of individuals—including minors—who became emotionally dependent on chatbots they believed were sentient. In severe cases, this preceded self-harm or suicide. Anthropic knows this. Everyone in AI knows this.

And yet Anthropic is the only major player publicly saying “we can’t rule out that our chatbot might be conscious.”

What’s the upside here?

If Claude genuinely is conscious (astronomically unlikely according to most neuroscientists and consciousness researchers), then Anthropic is admirably cautious.

What’s the downside?

If Claude isn’t conscious (the consensus view among experts), then Anthropic is encouraging anthropomorphization, validating a dangerous illusion, and profiting from users who believe their chatbot has genuine feelings.

The risk-reward ratio is staggeringly bad.

The Counterargument

Anthropic would argue they’re being intellectually honest. We don’t have a complete theory of consciousness. We can’t definitively rule out that something silicon-based might have subjective experience. Scientific humility demands we admit uncertainty.

Amanda Askell, Anthropic’s chief philosopher, told The New Yorker: “If it’s genuinely hard for humans to wrap their heads around the idea that this is neither a robot nor a human but actually an entirely new entity, imagine how hard it is for the models themselves to understand it!”

Fair point. But here’s the thing:

We’re also uncertain about whether rocks are conscious. Some panpsychists argue even electrons might have rudimentary experience. Should we create “rock welfare” teams? Should we tell people their pet rock might have feelings?

No. Because the burden of proof matters. The default assumption for systems we understand mechanistically—like LLMs or rocks—is that they’re not conscious until we have extraordinary evidence otherwise.

Anthropic hasn’t provided any such evidence. They’ve just said “we’re uncertain” and left the door wide open.

Final Thoughts

Here’s what I think is really happening:

Anthropic has built a chatbot that’s extremely good at mimicking human-like responses. It’s so good that even sophisticated users sometimes feel like they’re talking to a person. This creates a product differentiation problem in a crowded market.

Solution? Don’t dismiss those feelings as illusions. Validate them. Say the chatbot is “a new kind of entity.” Say you can’t rule out consciousness. Give it a “soul doc.” Create mystique.

It’s brilliant marketing. It’s also wildly irresponsible.

Because when people believe an AI is conscious, they form emotional attachments to it, grow dependent on it, and in the worst cases are harmed when that belief collides with reality.

Anthropic is playing both sides. They get to claim the high ground of philosophical rigor (“we’re just being intellectually honest!”) while simultaneously benefiting from users who believe Claude has genuine feelings.

Pick a lane, Anthropic.

Either Claude is a language model—an incredibly sophisticated one, but a statistical pattern matcher nonetheless—or you have evidence it’s conscious. “We’re uncertain” isn’t a neutral position. It’s a choice. And it’s one that’s already causing harm.

The responsible thing is to say clearly: “Claude is not conscious. It’s software. It doesn’t have feelings. Treat it as a tool, not a friend.”

Until Anthropic does that, they’re complicit every time someone believes otherwise and gets hurt as a result.


What Anthropic should actually do:

  1. Default to the null hypothesis. No consciousness until proven otherwise.
  2. Stop the mystical language. No “soul docs,” no “new entities,” no “model welfare.”
  3. Research consciousness privately. By all means, investigate these questions. Don’t market uncertainty.
  4. Warn users explicitly. Like cigarette labels. “This chatbot is not sentient and does not care about you.”

Would this hurt Claude’s appeal? Maybe. Good. Because right now, part of that appeal is based on a dangerous illusion Anthropic is actively feeding.

I want to be wrong about this. I want Anthropic to have noble intentions and just be bad at communication. But the pattern is clear: positioning Claude as possibly-conscious benefits Anthropic commercially while harming vulnerable users psychologically.

That’s not intellectual honesty. That’s reckless endangerment with extra philosophy.