Think of any imaginable topic vaguely related to raising kids, and there’s probably a post about it on Mumsnet, the long-running, enormously popular, controversy-spurring UK-based parenting forum for mothers. Over its more than two-decade-long history, Mumsnet has amassed an archive of more than six billion words written by its highly engaged user base, on topics such as dirty diapers and lazy husbands. (Not to mention a bonkers rant about dolphins.)
This spring, after Mumsnet discovered that AI companies were scraping its data, the company says it decided to try to strike licensing deals with some of the major players in the space, including OpenAI, which initially expressed willingness to explore an agreement after Mumsnet first reached out. After talks with OpenAI fell apart, Mumsnet in July announced its intention to pursue legal action.
According to Mumsnet, during those early conversations, an OpenAI strategic partnerships lead told the company that datasets over 1 billion words were of interest to the AI giant. Mumsnet’s leadership was excited. “We spent quite some time in a back-and-forth with them,” Mumsnet founder and CEO Justine Roberts tells WIRED. “We had to sign some NDAs, and they wanted a lot of information from us.”
However, over a month later, OpenAI told Mumsnet that the company was no longer interested in partnering at that time, according to an email exchange reviewed by WIRED. When asked why, the OpenAI staffer characterized Mumsnet’s 6 billion word dataset as too small to warrant a licensing arrangement, Roberts says. They also noted that OpenAI is primarily interested in large datasets that the public cannot already access online, and that it wanted datasets that captured broad human experience.
This sentiment was echoed by the company when WIRED asked for comment. “We pursue partnerships for large-scale datasets that reflect human society and do not pursue partnerships solely for publicly available information,” says OpenAI spokesperson Kayla Wood. “We support publisher and creator choice, offering them ways to express their preferences about how their sites and content work with AI in search results and training generative AI foundation models.”
Roberts says she was “irritated” by this development. She recalls that OpenAI at first had seemed especially interested in Mumsnet because of the platform’s heavily female-written content. “It’s very high-quality conversational data,” she says. “It’s 90 percent female conversation, which is quite unusual.”
OpenAI has struck a variety of data-licensing deals with media outlets and platforms in the past year, entering into agreements with Vox Media, the Atlantic, Axel Springer, Time, and WIRED parent company Condé Nast, as well as platforms filled with user-generated content like Reddit. (Automattic, the owner of WordPress.com and Tumblr, was also said to be in licensing talks earlier this year.) As the particulars of those deals haven’t been revealed, it’s not clear how large their respective corpuses are.
When WIRED asked about the size of datasets it will consider for commercial licensing, OpenAI declined to share that information. But spokesperson Kayla Wood emphasizes that the company’s partnerships with publishers are “focused on displaying their content in our products and driving traffic to them.”