Reddit, Google, and the Real Cost of the AI Data Rush

28.07.2024 12:00

Thecut.com

The open web is getting walled off.

Photo-Illustration: Intelligencer; Photo: Getty Images

As of this past week, the only major search engine that still includes Reddit is Google. You won’t find recent Reddit links in Microsoft’s Bing, or get useful results from the platform on the privacy-centric search engine DuckDuckGo. For most people, this change won’t matter much in the short term: most people, after all, use Google, which hovers at somewhere around 90 percent of global search market share, and they can still visit Reddit directly. But it’s a weird one. Reddit is a platform with deep connections to the web, beginning its life as a link aggregator and growing into an online community with more than 80 million daily active users. Why, as 404 Media reports, is it suddenly battening down its hatches?

As is often the case when tech companies are behaving strangely or unpredictably these days, the answer has something to do with AI. Earlier this year, Google entered into a deal with Reddit, reportedly valued at $60 million per year, to license the site’s data for training AI models. In recent months, Google watchers have also noticed an increase in Reddit posts showing up in search results, with user comments ranking highly in a wide range of queries. While these stories are related, they’re also somewhat independent: Google, like others in AI, wants to license data so it can build better models and avoid getting sued; Google users were already adding “Reddit” to search queries as a sort of search quality hack, so the company was in some sense following their lead.

It’s where the two stories overlap that things are starting to break down. Search engines gather relevant and up-to-date information by crawling the web with bots, indexing what they find, and ranking it for relevance to users’ searches. Websites have some control over whether and how this crawling takes place, and there are various reasons they might refuse crawling of part or all of their sites (a private individual might want to keep an old blog online but unsearchable, while a company like Facebook might want Google users to be able to find profiles but not search their contents). For decades, though, crawling was part of a straightforward and mutually beneficial arrangement. Search engines gathered lots of users by offering a useful service; websites allowed and even catered to search engine crawling in order to connect with those audiences.
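The control that websites have over crawling is most commonly expressed in a robots.txt file, which well-behaved crawlers consult before fetching pages. The sketch below is an illustration only: the robots.txt contents, the example.com URL, and the helper name allowed_crawlers are all hypothetical, and it is not a copy of any real site's configuration. It uses Python's standard urllib.robotparser to show how a file that admits one named crawler and refuses everyone else would be interpreted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (not any real site's file):
# admit one named crawler, refuse all others.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed_crawlers(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Report, for each user agent, whether the rules permit fetching url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # The agent names are examples of crawler tokens; the URL is a placeholder.
    verdicts = allowed_crawlers(
        ROBOTS_TXT,
        ["Googlebot", "bingbot", "GPTBot"],
        "https://example.com/r/some-thread",
    )
    for agent, ok in verdicts.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Note that robots.txt is a request rather than an enforcement mechanism. A crawler that ignores it can still fetch pages unless the site blocks it by other means.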

In the last few years, though, crawling has assumed an additional purpose. Those robots that are indexing your site and reading all your data aren’t just building a search index. They might be building an AI model, too. This, for many websites, is very much not part of the deal. As David Pierce writes at The Verge, the sudden pivot from search index to AI training means that “the basic social contract of the web is falling apart.” A mutually beneficial arrangement is being replaced by an extractive one, driven by frantic and unilateral actions of startups and tech giants alike.

At first, the consequences of this breakdown were relatively contained, with large websites and platforms — owned by big companies like Facebook and Amazon — explicitly blocking crawlers from firms like OpenAI. This clarity didn’t last long. Google is all-in on AI. Bing is owned by Microsoft, which is OpenAI’s largest investor and partner. Suddenly, all the search companies were also AI companies, and there were new crawlers in the mix. All crawling was AI crawling — to assume otherwise would be naive. The sudden harvest was apparent to anyone paying attention to their traffic stats. The bots were scraping everything they could.

In response to 404 Media’s reporting, critics have made the case that Google — a company that otherwise seemed spooked by past and potential antitrust enforcement — is nonetheless buying an unfair advantage for a product that’s already nearly a monopoly, and they have a point: Without Reddit, one of the largest repositories of authentic human text on the internet, smaller search engines can’t compete.

But the story isn’t complete without Reddit, which is the company actually doing the blocking. (Microsoft, for its part, has confirmed its crawlers have been prohibited.) As a Reddit spokesperson told The Verge:

This is not at all related to our recent partnership with Google. We have been in discussions with multiple search engines. We have been unable to reach agreements with all of them, since some are unable or unwilling to make enforceable promises regarding their use of Reddit content, including their use for AI.

This is practical behavior by the leadership of Reddit, a public company with a duty to its shareholders, but it is also plainly bad for the general public. In addition to reducing access for users who don’t want to use Google, it further subordinates Reddit to Google’s specific search incentives, which are changing fast anyway, in part because of AI. Already, spammers hoping for more visibility in search results are polluting popular threads on Reddit, which is suddenly getting enormous amounts of traffic from Google.

Google, like Reddit, owes its existence and success to the principles and practices of the open web, but exclusive arrangements like these mark the end of that long and incredibly fruitful era. They’re also a sign of things to come. The web was already in rough shape, reduced over the last 15 years by the rise of walled-off platforms, battered by advertising consolidation, and polluted by a glut of content from the AI products that used it for training. The rise of AI scraping threatens to finish the job, collapsing a flawed but enormously successful, decades-long experiment in open networking and human communication to a set of antagonistic contracts between warring tech firms.
