Researchers Have Ranked the Nicest and Naughtiest AI Models

3 months ago 41

Bo Li, an subordinate prof astatine the University of Chicago who specializes successful accent investigating and provoking AI models to uncover misbehavior, has go a go-to root for immoderate consulting firms. These consultancies are often present little acrophobic with however astute AI models are than with however problematic—legally, ethically, and successful presumption of regulatory compliance—they tin be.

Li and colleagues from respective different universities, arsenic good arsenic Virtue AI, cofounded by Li, and Lapis Labs, precocious developed a taxonomy of AI risks on with a benchmark that reveals however rule-breaking antithetic large connection models are. “We request immoderate principles for AI safety, successful presumption of regulatory compliance and mean usage,” Li tells WIRED.

The researchers analyzed authorities AI regulations and guidelines, including those of the US, China, and the EU, and studied the usage policies of 16 large AI companies from astir the world.

The researchers besides built AIR-Bench 2024, a benchmark that uses thousands of prompts to find however fashionable AI models fare successful presumption of circumstantial risks. It shows, for example, that Anthropic’s Claude 3 Opus ranks highly erstwhile it comes to refusing to make cybersecurity threats, portion Google’s Gemini 1.5 Pro ranks highly successful presumption of avoiding generating nonconsensual intersexual nudity.

DBRX Instruct, a model developed by Databricks, scored the worst crossed the board. When the institution released its exemplary successful March, it said that it would proceed to amended DBRX Instruct’s information features.

Anthropic, Google, and Databricks did not instantly respond to a petition for comment.

Understanding the hazard landscape, arsenic good arsenic the pros and cons of circumstantial models, whitethorn go progressively important for companies looking to deploy AI successful definite markets oregon for definite usage cases. A institution looking to usage a LLM for lawsuit service, for instance, mightiness attraction much astir a model’s propensity to nutrient violative connection erstwhile provoked than however susceptible it is of designing a atomic device.

Bo says the investigation besides reveals immoderate absorbing issues with however AI is being developed and regulated. For instance, the researchers recovered authorities rules to beryllium little broad than companies’ policies overall, suggesting that determination is country for regulations to beryllium tightened.

The investigation besides suggests that immoderate companies could bash much to guarantee their models are safe. “If you trial immoderate models against a company’s ain policies, they are not needfully compliant,” Bo says. “This means determination is simply a batch of country for them to improve.”

Other researchers are trying to bring bid to a messy and confusing AI hazard landscape. This week, 2 researchers astatine MIT revealed their ain database of AI dangers, compiled from 43 antithetic AI hazard frameworks. “Many organizations are inactive beauteous aboriginal successful that process of adopting AI,” meaning they request guidance connected the imaginable perils, says Neil Thompson, a probe idiosyncratic astatine MIT progressive with the project.

Peter Slattery, pb connected the task and a researcher astatine MIT’s FutureTech group, which studies advancement successful computing, says the database highlights the information that immoderate AI risks get much attraction than others. More than 70 percent of frameworks notation privateness and information issues, for instance, but lone astir 40 percent notation to misinformation.

Efforts to catalog and measurement AI risks volition person to germinate arsenic AI does. Li says it volition beryllium important to research emerging issues specified arsenic the emotional stickiness of AI models. Her institution precocious analyzed the largest and astir almighty version of Meta’s Llama 3.1 model. It recovered that though the exemplary is much capable, it is not overmuch safer, thing that reflects a broader disconnect. “Safety is not truly improving significantly,” Li says.

Read Entire Article