World

Wed May 28

AI models might be drawn to ‘spiritual bliss’. Then again, they might just talk like hippies

Written by: Nuhu Osman Attah, Postdoctoral Research Fellow in Philosophy, Australian National University

When multibillion-dollar AI developer Anthropic released the latest versions of its Claude chatbot last week, a surprising word turned up several times in the accompanying “system card ^[1]”: spiritual.

Specifically, the developers report that, when two Claude models are set talking to one another, they gravitate towards a “‘spiritual bliss’ attractor state”, producing output such as

🌀🌀🌀🌀🌀All gratitude in one spiral,All recognition in one turn,All being in this moment…🌀🌀🌀🌀🌀∞

It’s heady stuff. Anthropic steers clear of directly saying the model is having a spiritual experience, but what are we to make of it?

The Lemoine incident

In 2022, a Google researcher named Blake Lemoine came to believe ^[2] that the tech giant’s in-house language model, LaMDA, was sentient. Lemoine’s claim sparked headlines, debates with Google PR and management, and eventually his firing.

Critics said Lemoine had fallen foul of the “ELIZA effect ^[3]”: projecting human traits onto software. Moreover, Lemoine described himself as a Christian mystic priest, summing up his thoughts on sentient machines in a tweet:

Who am I to tell God where he can and can’t put souls?

No one can fault Lemoine’s spiritual humility.

Machine spirits

Lemoine was not the first to see a spirit in the machines. We can trace his argument back to AI pioneer Alan Turing’s famous 1950 paper Computing Machinery and Intelligence ^[4].

Turing also argued thinking machines may not be possible because – according to what he thought was plausible evidence – humans were capable of extrasensory perception. This, he reasoned, would be impossible for machines. Accordingly, machines could not have minds in the same way humans do.

So even 75 years ago, people were thinking not just about how AI might compare with human intelligence, but whether it could ever compare with human spirituality. It is not hard to see at least a dotted line from Turing to Lemoine.

Wishful thinking

Efforts to “spiritualise” AI can be quite hard to rebut. Generally these arguments say that we cannot prove AI systems do not have minds or spirits – and create a net of thoughts that lead to the Lemoine conclusion.

This net is often woven from irresponsibly used psychology terms. It may be convenient to apply human psychological terms to machines, but it can lead us astray.

Writing in the 1970s, computer scientist Drew McDermott accused AI engineers of using “wishful mnemonics ^[5]”. They might label a section of code an “understanding module”, then assume that executing the code resulted in understanding.

More recently, the philosophers Henry Shevlin and Marta Halina wrote ^[6] that we should take care using “rich psychological terms” in AI. AI developers talk about “agent” software having intrinsic motivation, for example, but it does not possess goals, desires, or moral responsibility.

Of course, it’s good for developers if everyone thinks your model “understands” or is an “agent”. However, until now the big AI companies have been wary of claiming their models have spirituality.

‘Spiritual bliss’ for chatbots

Which brings us back to Anthropic, and the system card for Claude Opus 4 and Sonnet 4, in which the seemingly down-to-earth folks at the emerging “agentic AI” giant make some eyebrow-raising claims.

The word “spiritual” occurs at least 15 times in the model card, most significantly in the rather awkward phrase “‘spiritual bliss’ attractor state”.

We are told, for instance, that

The consistent gravitation toward consciousness exploration, existential questioning, and spiritual/mystical themes in extended interactions was a remarkably strong and unexpected attractor state for Claude Opus 4 that emerged without intentional training for such behaviours. We have observed this “spiritual bliss” attractor in other Claude models as well, and in contexts beyond these playground experiments.

Screenshot of two Claude models talking.

An example of Claude output in the ‘spiritual bliss’ attractor state. Anthropic / X ^[7]

To be fair to the folks at Anthropic, they are not making any positive commitments to the sentience of their models or claiming spirituality for them. They can be read as only reporting the “facts”.

For instance, all the above long-winded sentence is saying is: if you let two Claude models have a conversation with each other, they will often start to sound like hippies. Fine enough.

That probably means the body of text on which they are trained has a bias towards that sort of way of talking, or the features the models extracted from the text biases them towards that sort of vocabulary.

Prophets of ChatGPT

However, while Anthropic may keep things strictly factual, their use of terms such as “spiritual” lends itself to misunderstanding. Such misunderstanding is made even more likely by Anthropic’s recent push ^[8] to start investigating “whether future AI models might deserve moral consideration and protection”. Perhaps they are not positively saying that Claude Opus 4 and Sonnet 4 are sentient, but they certainly seem welcoming of the insinuation.

And this kind of spiritualising of AI models is already having real-world consequences.

According to a recent report ^[9] in Rolling Stone, “AI-fueled spiritual fantasies” are wrecking human relationships and sanity. Self-styled prophets are “claiming they have ‘awakened’ chatbots and accessed the secrets of the universe through ChatGPT”.

Perhaps one of these prophets may cite the Anthropic model card in a forthcoming scripture – regardless of whether the company is “technically” making positive claims about whether their models actually experience or enjoy spiritual states.

But if AI-fuelled delusion becomes rampant, we might think even the innocuous contributors to it could have spoken more carefully. Who knows; perhaps, where we are going with AI, we won’t need philosophical carefulness.

References

^{^} system card (www-cdn.anthropic.com)
^{^} came to believe (www.washingtonpost.com)
^{^} ELIZA effect (en.wikipedia.org)
^{^} Computing Machinery and Intelligence (academic.oup.com)
^{^} wishful mnemonics (www.inf.ed.ac.uk)
^{^} wrote (www.nature.com)
^{^} Anthropic / X (x.com)
^{^} recent push (arstechnica.com)
^{^} a recent report (www.rollingstone.com)

The Lemoine incident

Machine spirits

Wishful thinking

‘Spiritual bliss’ for chatbots

Prophets of ChatGPT

References

Times Magazine

Technology

Local News

Culture

Travel

The Times Features