OpenAI today released a preview of its next-generation large language models, which the company says perform better than its previous models but come with a few caveats.
In its announcement for the new model, o1-preview, OpenAI touted its performance on a variety of tasks designed for humans. The model scored in the 89th percentile in programming competitions held by Codeforces and answered 83 percent of questions correctly on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o's 14 percent.
Sam Altman, OpenAI's CEO, said the o1-preview and o1-mini models were the "beginning of a new paradigm: AI that can do general-purpose complex reasoning." But he added that "o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it."
When asked a question, the new models use chain-of-thought techniques that mimic how humans think and how many generative AI users have learned to use the technology: by continuously prompting and correcting the model with new directions until it achieves the desired answer. But in o1 models, versions of those processes happen behind the scenes without additional prompting. "It learns to recognize and correct its mistakes. It learns to break down tricky steps into simpler ones. It learns to try a different approach when the current one isn't working," the company said.
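For readers unfamiliar with that manual loop, here is a minimal sketch of what it looks like in code, using the OpenAI Python SDK. The model name, the prompt, and the is_satisfactory() check are illustrative placeholders, not anything OpenAI ships; with o1 models, this kind of refinement happens behind the scenes instead.

```python
# A sketch of the manual prompt-and-correct loop described above,
# which o1 models are said to automate internally.
# Assumes the OpenAI Python SDK (v1.x) and an API key in the environment.
from openai import OpenAI

client = OpenAI()

def is_satisfactory(answer: str) -> bool:
    # Placeholder check: in practice, a human reads the answer and decides.
    return "error" not in answer.lower()

messages = [{"role": "user", "content": "Prove that the sum of two even integers is even."}]

for _ in range(3):  # allow a few rounds of correction
    reply = client.chat.completions.create(
        model="gpt-4o",  # an earlier model; hypothetical choice for illustration
        messages=messages,
    ).choices[0].message.content

    if is_satisfactory(reply):
        break

    # Feed the answer back with a corrective instruction, as a user would.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": "That isn't quite right. Break the problem into smaller steps and try a different approach."})

print(reply)
```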
While these techniques improve the models' performance on various benchmarks, OpenAI found that in a small subset of cases they also result in o1 models intentionally deceiving users. In a test of 100,000 ChatGPT conversations powered by o1-preview, the company found that about 800 of the answers the model supplied were incorrect, a 0.8 percent error rate. And for roughly a third of those incorrect responses, the model's chain of thought showed that it knew the answer was incorrect but provided it anyway.
"Intentional hallucinations primarily happen when o1-preview is asked to provide references to articles, websites, books, or similar sources that it cannot easily verify without access to internet search, causing o1-preview to make up plausible examples instead," the company wrote in its model system card.
Overall, the new models performed better than GPT-4o, OpenAI's previous state-of-the-art model, on various company safety benchmarks measuring how easily the models can be jailbroken, how often they provide incorrect responses, and how often they display bias regarding age, gender, and race. However, the company found that o1-preview was significantly more likely than GPT-4o to provide an answer when it was asked an ambiguous question where the model should have responded that it didn't know the answer.
OpenAI did not release much information about the data used to train its new models, saying only that they were trained on a combination of publicly available data and proprietary data obtained through partnerships.