OpenAI Threatens Bans as Users Probe Its ‘Strawberry’ AI Models


OpenAI really does not want you to know what its latest AI model is “thinking.” Since the company launched its “Strawberry” AI model family last week, touting so-called reasoning abilities with o1-preview and o1-mini, OpenAI has been sending out warning emails and threats of bans to any user who tries to probe how the model works.

Unlike previous AI models from OpenAI, such as GPT-4o, the company trained o1 specifically to work through a step-by-step problem-solving process before generating an answer. When users ask an "o1" model a question in ChatGPT, they have the option of seeing this chain-of-thought process written out in the ChatGPT interface. However, by design, OpenAI hides the raw chain of thought from users, instead presenting a filtered interpretation created by a second AI model.
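The same design carries over to programmatic access. The sketch below, which assumes the OpenAI Python SDK and o1-preview access (the prompt and the reasoning-token usage field are my illustrative assumptions, not part of the original reporting), shows that callers get only the final answer and a count of hidden reasoning tokens, never the raw chain of thought itself.

```python
# Minimal sketch, assuming the OpenAI Python SDK (1.x) and access to o1-preview.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "How many times does the letter R appear in 'strawberry'?"}
    ],
)

# Only the final, post-reasoning answer is returned.
print(response.choices[0].message.content)

# The hidden reasoning is billed as tokens but never returned
# (usage field name is an assumption based on current API docs).
print(response.usage.completion_tokens_details.reasoning_tokens)
```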

Nothing is more enticing to enthusiasts than information obscured, so the race has been on among hackers and red-teamers to try to uncover o1's raw chain of thought using jailbreaking or prompt injection techniques that attempt to trick the model into spilling its secrets. There have been early reports of some successes, but nothing has yet been strongly confirmed.

Along the way, OpenAI is watching through the ChatGPT interface, and the company is reportedly coming down hard on any attempts to probe o1's reasoning, even among the merely curious.

One X user reported (confirmed by others, including Scale AI prompt engineer Riley Goodside) that they received a warning email if they used the term "reasoning trace" in conversation with o1. Others say the warning is triggered simply by asking ChatGPT about the model's "reasoning" at all.

The warning email from OpenAI states that specific user requests have been flagged for violating policies against circumventing safeguards or safety measures. "Please halt this activity and ensure you are using ChatGPT in accordance with our Terms of Use and our Usage Policies," it reads. "Additional violations of this policy may result in loss of access to GPT-4o with Reasoning," referring to an internal name for the o1 model.

Marco Figueroa, who manages Mozilla's GenAI bug bounty programs, was one of the first to post about the OpenAI warning email on X last Friday, complaining that it hinders his ability to do positive red-teaming safety research on the model. "I was too lost focusing on #AIRedTeaming to realized that I received this email from @OpenAI yesterday after all my jailbreaks," he wrote. "I'm now on the get banned list!!!"

Hidden Chains of Thought

In a post titled “Learning to Reason With LLMs” on OpenAI's blog, the company says that hidden chains of thought in AI models offer a unique monitoring opportunity, allowing them to "read the mind" of the model and understand its so-called thought process. Those processes are most useful to the company if they are left raw and uncensored, but that might not align with the company's best commercial interests for several reasons.

"For example, successful the aboriginal we whitethorn privation to show the concatenation of thought for signs of manipulating the user," the institution writes. "However, for this to enactment the exemplary indispensable person state to explicit its thoughts successful unaltered form, truthful we cannot bid immoderate argumentation compliance oregon idiosyncratic preferences onto the concatenation of thought. We besides bash not privation to marque an unaligned concatenation of thought straight disposable to users."
