After Reddit results started disappearing from search engines not named Google last week, the company has finally come forward to explain why, essentially downplaying the search issue and saying that it’s sick and tired of AI companies training on its data for free.
“We’ve had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use,” Reddit CEO Steve Huffman told The Verge in an interview. “...which has put us in a position now of blocking folks who haven’t been willing to come to terms with how we’d like our data to be used or not used.”
Huffman accused Microsoft of training its AI on Reddit data scraped through Bing, as well as turning around and selling that data through the Bing API. Reddit searches being pulled, it seems, was largely just a byproduct of blocking that process, although the company also wasn’t happy about Bing’s search engine using AI to summarize its posts without requiring users to click through to them.
But how could fighting AI break search? While they might seem like totally separate technologies, both AI and search depend on “web crawlers,” which scroll through the internet collecting data that can be stored, displayed, or used elsewhere. This kind of tech is necessary for search engines to work the way they do, but it can also be used for AI training. When websites update their files to block web crawlers, it breaks both.
Given that Huffman spent most of his time talking about AI, it seems the crux of the issue is that Reddit doesn’t want companies training on its users’ data without having any kind of say, with Huffman telling The Verge that companies like Microsoft, Anthropic, and Perplexity have refused to negotiate.
“Without these agreements, we don’t have any say or knowledge of how our data is displayed or what it’s used for.” The CEO said it has been “a real pain in the ass to block these companies.”
That doesn’t mean Reddit is being fully altruistic, mind you. Earlier this year, the company signed a $60 million-a-year licensing deal that allows Google to train its AI on user posts, which would also explain why Reddit posts still show up unhindered in Google search. Similarly, OpenAI can also train on Reddit posts, and its upcoming SearchGPT engine will be able to link to them, although the specific dollar amount behind Reddit’s agreement with the ChatGPT maker hasn’t been disclosed.
Rather than being against AI, Reddit wants to be involved in the decision making process for what happens with its data. Oh, and get paid, too.
The Verge said Huffman referred to a recent comment from Microsoft AI CEO Mustafa Suleyman as an example of the type of behavior it’s looking to combat. In a discussion with CNBC’s Andrew Ross Sorkin at the Aspen Ideas Festival, the executive said that content “that’s already on the open web…has been ‘freeware,’ if you like.”
That’s certainly a creative interpretation of copyright law, but it’s also not unique to Microsoft. Despite Google’s deal with Reddit, in July of last year, Gizmodo spotted a change to the Google privacy policy that said it uses “publicly available information” to train its AI models, without acknowledging that Google does not in fact own everything posted to the internet.
While it’s unclear just how Google defines “publicly available,” this new Reddit deal might shine a light on the subject. For now, AI training could be moving from a free-for-all to a point where those who can afford to make companies pay will get their deserved bag (assuming that profiting off selling content users made before AI was even a thing counts as deserved in your eyes). Alongside Reddit, The Verge's parent company Vox Media has also entered into a deal with OpenAI, as has The Atlantic. As for the rest of us, we’ll have to rely on regulation, which has been slow to respond to AI outside of the EU.