
'Let chaos reign': AI inference costs are about to plummet

SambaNova Systems CEO Rodrigo Liang, Cerebras CEO Andrew Feldman, Foundry CEO Jared Quincy Davis, Groq CEO Jonathan Ross
  • All kinds of startups are rushing into the AI inference market.
  • Inference market competition may lower the price of AI, benefiting builders but challenging clouds.
  • Not all startups will survive the period of "chaos" to come.

Jared Quincy Davis and his AI computing startup Foundry sell inference. They don't make chips or build large language models. Foundry has a unique method of making cloud computing more efficient. Instead of selling its technology to cloud providers, the Foundry team decided to become one and use its tech to operate a more efficient cloud.

Once companies looking to sell an AI product have trained their models and know that they perform, they want ease, speed, and value when generating outputs. Inference-as-a-service providers like Foundry aim to simplify the process of generating those outputs.

Foundry offers training and fine-tuning, too, as many cloud providers do, but these days, it seems like anyone with an AI compute-boosting technology is attempting to monetize by selling inference — or more specifically, tokens, the base unit of data in AI.

Cerebras sells inference too. The company's core expertise is designing chips for training and inference, but it recently started selling the latter as a service. So does Groq, a chip company formed by two former Googlers, who recognized early that inference was going to get the bigger share of computing. SambaNova Systems, another hardware platform, also sells inference as a service.

Companies like Lambda, CoreWeave, Together AI, and Crusoe, all close partners of Nvidia, run data centers suited specifically to AI workloads and offer inference services. And then there are the hyperscalers like AWS and Microsoft Azure.

With so many companies specializing in inference, suspicion is rising that the cost of inference is about to drop off a cliff.

"Part of the reason inference is a little commoditizable is customers are kind of paying for tokens at the end of the day," Davis told Business Insider.

The current market for inference is kind of like the electricity market, Davis said. There are a ton of niche sources you can access if you actually shop around, but not everyone does. Most people just want to flip the light switch and have it work.

But there is a lot of nuance to sift through for those willing. For some customers, speed is of the utmost importance. Speed has distinctions too, like time to the first token and tokens per second. There's total job completion time and there are different kinds of inference workloads that lend themselves to different computing setups.
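To see why time to first token and tokens per second are distinct measures, here is a rough Python sketch. It assumes a simple model in which the first token arrives after a fixed delay and the rest stream at a steady rate. The two providers and their figures are hypothetical and invented for illustration; they do not describe any company named in this article.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    ttft_s: float        # time to first token, in seconds (hypothetical)
    tokens_per_s: float  # steady streaming rate after the first token (hypothetical)


def completion_time(p: Provider, output_tokens: int) -> float:
    """Seconds until the last token of a single response arrives."""
    if output_tokens <= 0:
        return 0.0
    # The first token lands after ttft_s; the remaining tokens stream at tokens_per_s.
    return p.ttft_s + (output_tokens - 1) / p.tokens_per_s


# Invented numbers: one provider starts quickly, the other streams quickly.
quick_start = Provider("quick_start", ttft_s=0.2, tokens_per_s=50.0)
high_throughput = Provider("high_throughput", ttft_s=1.0, tokens_per_s=200.0)


def faster(output_tokens: int) -> str:
    """Name of the provider that finishes a response of this length first."""
    candidates = (quick_start, high_throughput)
    return min(candidates, key=lambda p: completion_time(p, output_tokens)).name


for n in (20, 2000):
    print(n, "tokens ->", faster(n))
```

Under these assumed figures, the quick-starting provider wins on a short 20-token reply, while the high-throughput provider wins on a 2,000-token job. That is one reason different workloads can favor different computing setups.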

Energy efficiency of the underlying hardware and networking is a big determinant of cost. And cost in inference computing is even more important than in training, Groq cofounder Jonathan Ross recently told BI. Training is an overhead cost, while inference is an operating cost.

Zoom out from all of the intricacies, and inference is becoming the commodity of the AI age.

"Some companies just want output and they don't care about infrastructure," Mitesh Agrawal, head of cloud for Lambda, told BI.

Commoditizing AI

Lambda is in the early stages of an inference-as-a-service offering, but Agrawal said the company is going about it carefully, focusing on providing holistic computing services, and not just tokens.

Inference profit margins can vary widely, Agrawal said. With general compute — where the customer rents fixed capacity — the margins are easier to manage. When you're charging for usage or input and output of a model, the return is less predictable.

Organizing multiple users across a finite number of servers takes finesse. Whether or not the cost of operating the hardware is actually covered with room for profit comes down to how well that organization is done, Agrawal explained.
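A back-of-the-envelope Python sketch shows why usage-based pricing makes margins hinge on how fully the hardware is kept busy. Every figure here (the hourly server cost, the peak throughput, and the price per million tokens) is a hypothetical placeholder, not a number from Lambda or any other provider.

```python
def hourly_margin(server_cost_per_hour: float,
                  peak_tokens_per_s: float,
                  price_per_million_tokens: float,
                  utilization: float) -> float:
    """Profit (or loss) per server-hour when billing per token.

    utilization is the fraction of peak throughput actually sold (0.0 to 1.0).
    """
    tokens_sold = peak_tokens_per_s * 3600 * utilization
    revenue = tokens_sold / 1_000_000 * price_per_million_tokens
    return revenue - server_cost_per_hour


def break_even_utilization(server_cost_per_hour: float,
                           peak_tokens_per_s: float,
                           price_per_million_tokens: float) -> float:
    """Fraction of peak throughput that must be sold to cover the server cost."""
    max_revenue = peak_tokens_per_s * 3600 / 1_000_000 * price_per_million_tokens
    return server_cost_per_hour / max_revenue


# Hypothetical inputs: $10 per server-hour, 5,000 tokens/s peak, $1 per million tokens.
COST, PEAK, PRICE = 10.0, 5000.0, 1.0

print(f"break-even utilization: {break_even_utilization(COST, PEAK, PRICE):.1%}")
for u in (0.3, 0.8):
    print(f"utilization {u:.0%}: margin ${hourly_margin(COST, PEAK, PRICE, u):+.2f}/hour")
```

With these assumed inputs the server loses money at 30% utilization and turns a profit at 80%. Renting out fixed capacity, by contrast, earns the same amount however busy the machine is.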

So why would neoclouds offer the riskier service?

Agrawal said it's about getting potential customers in the door. Inference-as-a-service customers can turn into more traditional compute customers, and as the slate of competitors grows, relationships and history grow in importance.

Lambda's financial models assume that price cuts are coming soon as more players enter the inference space and chips become more efficient.

A race to the bottom?

How fast the demand for inference is growing is up for debate, but in recent public statements, Nvidia CEO Jensen Huang has said on multiple occasions that new models, like OpenAI's o1, require more compute to generate the same number of responses because they run multiple models to check their own work or "reason." Accuracy, it turns out, requires more compute.

Inference loads are poised to grow, but service providers still anticipate a drop in price from the influx of new players. Davis isn't worried, though.

He recalled the Jevons paradox, an economic principle in which a drop in price or an increase in efficiency leads to more total consumption, like when you widen a highway and traffic gets worse.

"If I make something 10 times cheaper, people won't spend 10 times less, nor will they even hold their budgets the same. They'll spend more," Davis said. "That makes sense because what are you doing when you make something 10 times cheaper, you're making the ROI better."

In other words, "it turns out, when you make inference cheaper, people decide to do a lot more inference," Davis said.
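Davis's arithmetic can be illustrated with a simple demand model. The Python sketch below assumes a constant-elasticity demand curve, a standard textbook simplification that is not something Davis or the article specifies, and the elasticity values are hypothetical. Total spending rises after a price cut only when demand is elastic enough, meaning elasticity above 1.

```python
def spend_ratio(price_ratio: float, elasticity: float) -> float:
    """New total spending divided by old total spending after a price change.

    Assumes constant-elasticity demand: quantity scales as price ** (-elasticity).
    price_ratio is new price / old price, so 0.1 means "10 times cheaper".
    """
    quantity_ratio = price_ratio ** (-elasticity)
    return price_ratio * quantity_ratio


# Hypothetical elasticities, chosen only to show the three regimes.
for e in (0.5, 1.0, 1.5):
    print(f"elasticity {e}: spending changes by a factor of {spend_ratio(0.1, e):.2f}")
```

Under this assumption, a tenfold price cut shrinks total spending when elasticity is 0.5, leaves it unchanged at 1.0, and roughly triples it at 1.5. Davis's claim that customers "will spend more" corresponds to the last case.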

The ride ahead could be "bumpy" though, and not all players are likely to survive the moments of mismatch between supply and demand.

"As my old boss at Intel Andy Grove used to say, 'Let chaos reign, and then rein in the chaos'," said Sriram Viswanathan, founding managing partner at Celesta Capital and investor in SambaNova Systems.

He agrees the next few years will be wildly competitive for inference providers, but he believes the winners will be decided on merit.

"The core innovation can't be in the go-to-market, but in the performance and power of the underlying architecture," Viswanathan said.

Many of the companies selling tokens to break into the AI market aspire to more. The chip designers eventually want to sell chips to hyperscalers rather than inference to AI startups. The ultimate version of Foundry's tech is bigger too.

"If we do our job, right, you know, we will be a core part of how every GPU runs," Davis said. All roads, it seems, run through inference.

Hugh Langley contributed reporting.

Got a tip or an insight to share? Contact Senior Reporter Emma Cosgrove at ecosgrove@businessinsider.com or use the secure messaging app Signal: 443-333-9088

Read the original article on Business Insider
