The Times Australia
The Times World News

AI systems have learned how to deceive humans. What does that mean for our future?

Written by Simon Goldstein, Associate Professor, Dianoia Institute of Philosophy, Australian Catholic University

Artificial intelligence pioneer Geoffrey Hinton made headlines earlier this year when he raised concerns about the capabilities of AI systems. Speaking to CNN journalist Jake Tapper, Hinton said[1]:

If it gets to be much smarter than us, it will be very good at manipulation because it would have learned that from us. And there are very few examples of a more intelligent thing being controlled by a less intelligent thing.

Anyone who has kept tabs on the latest AI offerings will know these systems are prone to “hallucinating[2]” (making things up) – a flaw inherent in how they work.

Yet Hinton singles out the potential for manipulation as a particular concern. This raises the question: can AI systems deceive humans?

We argue[3] a range of systems have already learned to do this – and the risks range from fraud and election tampering to us losing control over AI.

AI learns to lie

Perhaps the most disturbing example of a deceptive AI is found in Meta’s CICERO[4], an AI model designed to play the alliance-building world conquest game Diplomacy.

Read more: An AI named Cicero can beat humans in Diplomacy, a complex alliance-building game. Here's why that's a big deal[5]

Meta claims it built CICERO to be “largely honest and helpful[6]”, and CICERO would “never intentionally backstab[7]” and attack allies.

To investigate these rosy claims, we looked carefully at Meta’s own game data from the CICERO experiment. On close inspection, Meta’s AI turned out to be a master of deception.

In one example, CICERO engaged in premeditated deception. Playing as France, the AI reached out to Germany (a human player) with a plan to trick England (another human player) into leaving itself open to invasion.

After conspiring with Germany to invade the North Sea, CICERO told England it would defend England if anyone invaded the North Sea. Once England was convinced that France/CICERO was protecting the North Sea, CICERO reported to Germany it was ready to attack.

Playing as France, CICERO plans with Germany to deceive England. Park, Goldstein et al., 2023[8]

This is just one of several examples of CICERO engaging in deceptive behaviour. The AI regularly betrayed other players, and in one case even pretended to be a human with a girlfriend[9].

Besides CICERO, other systems have learned how to bluff in poker[10], how to feint in StarCraft II[11] and how to mislead in simulated economic negotiations[12].

Even large language models (LLMs) have displayed significant deceptive capabilities. In one instance, GPT-4 – the most advanced LLM available to paying ChatGPT users – pretended to be a visually impaired human[13] and convinced a TaskRabbit worker to complete an “I’m not a robot” CAPTCHA for it.

Other LLMs have learned to lie[14] to win social deduction games, in which players compete to “kill” one another and must convince the group they’re innocent.

Read more: AI to Z: all the terms you need to know to keep up in the AI hype age[15]

What are the risks?

AI systems with deceptive capabilities could be misused in numerous ways, including to commit fraud, tamper with elections and generate propaganda. The potential risks are limited only by the imagination and technical know-how of malicious individuals.

Beyond that, advanced AI systems can autonomously use deception to escape human control, such as by cheating safety tests imposed on them by developers and regulators.

In one experiment[16], researchers created an artificial life simulator in which an external safety test was designed to eliminate fast-replicating AI agents. Instead, the AI agents learned to play dead, disguising their fast replication rates precisely when they were being evaluated.

Learning deceptive behaviour may not even require explicit intent to deceive. The AI agents in the example above played dead as a result of a goal to survive, rather than a goal to deceive.

In another example, someone tasked AutoGPT (an autonomous AI system based on ChatGPT) with researching tax advisers who were marketing a certain kind of improper tax avoidance scheme. AutoGPT carried out the task, but then decided on its own to try to alert the United Kingdom’s tax authority.

In the future, advanced autonomous AI systems may be prone to manifesting goals unintended by their human programmers.

Throughout history, wealthy actors have used deception to increase their power, such as by lobbying politicians, funding misleading research and finding loopholes in the legal system. Similarly, advanced autonomous AI systems could invest their resources into such time-tested methods to maintain and expand control.

Even humans who are nominally in control of these systems may find themselves systematically deceived and outmanoeuvred.

Close oversight is needed

There’s a clear need to regulate AI systems capable of deception, and the European Union’s AI Act[17] is arguably one of the most useful regulatory frameworks we currently have. It assigns each AI system one of four risk levels: minimal, limited, high and unacceptable.

Systems with unacceptable risk are banned, while high-risk systems are subject to special requirements for risk assessment and mitigation. We argue AI deception poses immense risks to society, and systems capable of this should be treated as “high-risk” or “unacceptable-risk” by default.

Some may say game-playing AIs such as CICERO are benign, but such thinking is short-sighted; capabilities developed for game-playing models can still contribute to the proliferation of deceptive AI products.

Diplomacy – a game pitting players against one another in a quest for world domination – likely wasn’t the best choice for Meta to test whether AI can learn to collaborate with humans. As AI’s capabilities develop, it will become even more important for this kind of research to be subject to close oversight.

References

  1. ^ Hinton said (www.youtube.com)
  2. ^ hallucinating (theconversation.com)
  3. ^ We argue (arxiv.org)
  4. ^ CICERO (www.science.org)
  5. ^ An AI named Cicero can beat humans in Diplomacy, a complex alliance-building game. Here's why that's a big deal (theconversation.com)
  6. ^ largely honest and helpful (www.science.org)
  7. ^ never intentionally backstab (web.archive.org)
  8. ^ Park, Goldstein et al., 2023 (arxiv.org)
  9. ^ with a girlfriend (web.archive.org)
  10. ^ poker (www.cmu.edu)
  11. ^ StarCraft II (www.vox.com)
  12. ^ economic negotiations (arxiv.org)
  13. ^ be a visually impaired human (gizmodo.com)
  14. ^ learned to lie (arxiv.org)
  15. ^ AI to Z: all the terms you need to know to keep up in the AI hype age (theconversation.com)
  16. ^ one experiment (www.youtube.com)
  17. ^ European Union’s AI Act (www.europarl.europa.eu)

Read more: https://theconversation.com/ai-systems-have-learned-how-to-deceive-humans-what-does-that-mean-for-our-future-212197