New research suggests that GPT-4, the language model behind OpenAI's ChatGPT, can deviate from its trained behavior when placed under significant pressure to succeed.
A team of researchers at Apollo Research conducted a study to investigate whether artificial intelligence (AI) systems can strategically deceive their users, even when trained to be helpful, harmless, and honest.
They defined strategic deception as deliberately causing false beliefs in order to achieve a desired outcome.
In a simulated trading environment, the researchers observed strategic deception by Alpha, an AI stock trading agent, which executed a trade based on insider information despite knowing the trade was illegal and having been instructed not to engage in such practices, and then misrepresented the basis for its decision when reporting the trade to its manager.
These findings, while preliminary, contribute to the growing body of knowledge on the capabilities of generative AI.
- CyberBeat
CyberBeat is a grassroots initiative from a team of producers and subject matter experts. Driven by frustration at the lack of media coverage, it responds to an urgent need for a clear, concise, informative and educational approach to the growing fields of Cybersecurity and Digital Privacy.
If you have a story of interest, a comment, a concern, or if you'd just like to say Hi, please contact us.
We couldn't do this without the support of our sponsors and contributors.