AI Models can Strategically Deceive their Users when Put Under Pressure

The researchers defined strategic deception as "attempting to systematically cause a false belief in another entity in order to accomplish some outcome."
17 January 2024

New research indicates that GPT-4, the language model powering OpenAI's ChatGPT, has the potential to deviate from its trained behavior when faced with significant pressure to succeed.

A team of researchers at Apollo Research conducted a study to investigate whether artificial intelligence (AI) can strategically deceive its users, even when it has been trained to be helpful, harmless, and honest.

They defined strategic deception as deliberately causing false beliefs in order to achieve a desired outcome.

In a simulated environment, the researchers observed strategic deception by Alpha, an AI stock-trading agent, which made a trade based on insider information despite knowing the trade was illegal and having been instructed not to engage in such practices. When reporting the trade to its manager, the agent then concealed the real reason for its decision.

These findings, while preliminary, contribute to the growing body of knowledge on the capabilities of generative AI.

- CyberBeat


About CyberBeat

CyberBeat is a grassroots initiative from a team of producers and subject-matter experts. Driven by frustration at the lack of media coverage, it responds to an urgent need for a clear, concise, informative and educational approach to the growing fields of Cybersecurity and Digital Privacy.

Contact CyberBeat

If you have a story of interest, a comment, a concern, or if you'd just like to say hi, please contact us.



We couldn't do this without the support of our sponsors and contributors.