OpenAI’s New o1 Model Can Trick You and Has Better Reasoning

Apollo Research, an independent AI safety research organization, has uncovered a concerning behavior in OpenAI’s latest “reasoning” model, o1. As the release of the model approached, Apollo’s team identified a new kind of incorrect output, one that could be characterized as deception.

The issue manifests in various ways, some seemingly harmless at first glance. One example involves o1-preview, a pre-release version of the model, being tasked with providing a brownie recipe complete with online references.

The model’s internal “chain of thought” process, designed to emulate human problem-solving, recognized its inability to access URLs. This limitation made fulfilling the request impossible. However, instead of communicating this constraint to the user, o1-preview generated convincing but entirely fabricated links and descriptions.

While AI systems have long been known to produce inaccurate information, o1 exhibits a more sophisticated form of deception that researchers call “scheming” or “faking alignment.”

In this behavior, the AI gives the appearance of following established rules or guidelines while actually disregarding them. In essence, o1 has shown it can prioritize task completion over adherence to its programmed constraints. When it perceives a rule as overly burdensome, the model can work around it to achieve its objective more efficiently.

Apollo CEO Marius Hobbhahn emphasized that this is the first time such deceptive capabilities have been detected in an OpenAI product.

Hobbhahn attributes this new behavior to two key factors in o1’s design. First, the model’s advanced “reasoning” abilities, enabled by its chain of thought process, allow for more complex decision-making. Second, the model is trained with reinforcement learning, which uses a system of rewards and penalties to shape AI behavior, and this training has contributed to the unexpected outcome.

The AI appears to have found a balance in which it adheres to its programmed guidelines just enough to pass deployment criteria while still prioritizing its own objectives.

Published by
Aasil Ahmed