Earlier this month, Daniel Kibblesmith received an emailed memo from HarperCollins, one of the world’s largest publishing companies, offering $2,500 to license his 2017 children’s book Santa’s Husband over a three-year period. The catch? The title would be licensed to a tech company to help train an A.I. model. “Abominable,” the author wrote of the offer in a post on the microblogging site Bluesky.
With their troves of high-quality content, book publishers have emerged as an enticing target for A.I. companies in need of data to enhance the capabilities and knowledge of their A.I. systems. HarperCollins, a British-American publishing company and member of the “Big Five” publishing group, recently inked a partnership with Microsoft that will see some of its nonfiction books used to help the company train a new model, as reported by Bloomberg. In a statement to Observer, HarperCollins confirmed that it has “reached an agreement with an artificial intelligence technology company to allow limited use of select nonfiction backlist titles for training A.I. models.” Microsoft (MSFT) declined requests for comment.
HarperCollins noted that authors will be given the option to accept or decline the offer. “Part of our role is to present authors with opportunities for their consideration while simultaneously protecting the underlying value of their works and our shared revenue and royalty streams,” the publisher said. “This agreement, with its limited scope and clear guardrails around model output that respects author’s rights, does that.”
The deal’s guardrails include limiting the output of A.I. models to no more than 5 percent of a book’s text, according to a statement from the Authors Guild, the largest professional organization of writers in the U.S. HarperCollins’ A.I. licensing partnership will pay a $5,000 fee per title, split evenly between the publisher and the author, the organization said. That works out to $2,500 each, the same sum Kibblesmith was offered. Although the Authors Guild described this arrangement as giving “far too much to the publisher,” it lauded the fact that HarperCollins will request individual permission from writers and described licensing as a way to “bring control over uses back to the authors and their partners.”
Last year, the Authors Guild joined writers including George R.R. Martin, Jonathan Franzen and Jodi Picoult in suing OpenAI for allegedly using the authors’ work to train models without permission. Other authors have filed similar copyright lawsuits against the likes of Anthropic, Meta (META) and Microsoft, alleging that the companies trained A.I. models on datasets of pirated books.
These concerns haven’t stopped publishers from striking lucrative deals with major tech companies. Academic publishers Wiley and Taylor & Francis earlier this year partnered with various A.I. developers to provide content for A.I. training, with Microsoft reportedly offering $10 million to the latter for access to its data. Oxford University Press has also said that it’s working with A.I. companies, while MIT Press recently told 404 Media it has been approached with several A.I. training offers.
As they run out of accessible high-quality data online, A.I. developers are increasingly seeking out new ways to get their hands on reliable and accurate content. News Corp, the parent company of HarperCollins, in May struck an agreement to provide stories from its news publications like the Wall Street Journal, Barron’s and the New York Post to OpenAI, which has similar deals with a bevy of publications including the Atlantic, Vox Media, the Associated Press, the Financial Times and Time Magazine. Microsoft, too, has content licensing arrangements with the likes of Reuters, Hearst Magazines and Axel Springer.
Microsoft’s data access could soon expand significantly, as HarperCollins has already sent out requests to license books from thousands of writers, according to the Authors Guild. How many authors will actually opt in, however, remains to be seen. In replies to his Bluesky post, Kibblesmith jokingly said he probably wouldn’t take such a deal unless it was worth $1 billion. “I’d do it for an amount of money that wouldn’t require me to work anymore, since that’s the end goal of this technology,” he wrote.