Large language models (LLMs) like OpenAI’s ChatGPT and Google’s Gemini are trained on copyrighted content from the Internet under “fair use,” a controversial doctrine that allows their developers to pay nothing for much of that content.
Many best-selling authors, artists and publishers have lined up to sue AI companies for infringing their copyrights by training bots on their work without express consent.
This is harmful not only to their livelihoods; it also compromises originality. “Good enough” can too easily pass for great.
Qualified LLMs
Fairly Trained, a new non-profit organization, is certifying LLMs whose training content is legally obtained and licensed.
Its CEO, Ed Newton-Rex, is a UK-bred, Palo Alto-based composer, musician and AI pioneer who famously quit his executive position at Stability AI in November 2023 over what he believed was unethical behavior, unfair to content industry providers. Stability AI is the company behind the widely used Stable Diffusion open-source image generator.
“We believe there are many consumers and companies who would prefer to work with generative AI companies who train on data provided with the consent of its creators,” states the Fairly Trained website.
OpenAI has continued to defend its right to collect and train on public data it scrapes even without licensing deals in place.
“Newton-Rex advises they change course and train new models on data that was obtained with creator permission,” reports VentureBeat, “ideally by licensing it from them, potentially for a fee.”
This is an approach OpenAI has adopted with some news outlets lately, including The Associated Press and Axel Springer, publisher of Politico and Business Insider.
OpenAI is reportedly paying millions annually for the privilege of using their data.
Companies whose models have already received the L certification from Fairly Trained include Beatoven.AI, Boomy, BRIA AI, Endel, LifeScore, Rightsify, Somms.ai, Soundful, and Tuney.
Among Fairly Trained’s advisers are Tom Gruber, the former chief technologist of Siri (acquired by Apple), and Maria Pallante, President & CEO of the Association of American Publishers. (Full disclosure: Terry Hart, GC for AAP, is on the board of the Center for IP Understanding, which I chair.)
Impressive List of Supporters
The nonprofit also lists among its supporters the Association of American Publishers, Association of Independent Music Publishers, Concord (a leading music and audio group), and Universal Music Group.
The latter two groups are suing AI company Anthropic over its Claude chatbot’s reproduction of copyrighted song lyrics.
“What I don’t want to do is claim that if someone is certified here, they are perfectly ethical,” Newton-Rex told Wired. “It’s not going to solve everything, but I think that it can help.”
He wants to roll out additional certificates in the future, possibly addressing issues like compensation.
Something about certifying AI bots as ethical on behalf of content industry creators feels right. I hope it is only a matter of time until inventors and other patent holders feel and act similarly.
Who knows? Calling out unethical, unlicensed copyright behavior on the part of AI bots that collect content may soon encourage more mindful tracking of patented inventions that are used without authorization.
Serial infringement that is difficult to challenge bears startling similarities to AI scraping. As far as I know, there is no “fair use” when it comes to patented inventions.
Image source: fairlytrained.org; thetimes.com
