Microsoft signed a three-year deal with HarperCollins to train an as-yet-unnamed AI model on the major publisher’s catalog. According to Bloomberg, the deal offers $5,000 per nonfiction book, split evenly between the author and HarperCollins. The deal is separate from other publishing agreements and is not counted against existing advances. It applies only to select, previously published nonfiction books, not to fiction.
404 Media broke the news but did not reveal the name of the tech company involved. Bloomberg published a follow-up article with more details, including the fact that Microsoft is developing the AI model.
HarperCollins authors must opt into the AI training program and allow their nonfiction books to be used. Authors who decline the offer will not have their books included in the training dataset and will not receive the payout. Not all HarperCollins authors will be offered the deal. Microsoft is selecting the books it wants to include in the training set.
The deal reportedly includes terms meant to mitigate authors’ concerns about generative AI and how it might plagiarize content or reduce the demand for human writers. For instance, the deal states that “no more than 200 consecutive words and/or five percent of a book’s text” will be used in training the AI model. It also includes a pledge that Microsoft will not scrape text from illegal piracy websites.
Large language models (LLMs) and other types of AI models require vast datasets to train. Only a finite amount of content is available in the public domain. By purchasing access to HarperCollins’ nonfiction backlist, Microsoft is significantly increasing the pool of data it can use to train its AI model.
While various tech companies have previously struck deals with publishers to train artificial intelligence models on past content, this is the first time that the specific terms of such a deal have been made public. The HarperCollins deal gives a monetary benchmark for what Microsoft, and by extension other AI companies, is willing to spend to train its models.
A source also told Bloomberg that the AI model will not be used to generate books. The purpose of the new Microsoft AI model has not yet been announced.
The post HarperCollins Book Catalog To Train Microsoft AI Models For The Next 3 Years appeared first on eWEEK.