OpenAI Promises the Next Model of ChatGPT Will Be Better at Reasoning

30.12.2024 17:00

Lifehacker.com

OpenAI has unveiled a new model for its products, arriving for users near the end of January 2025. It's called o3 (we seem to have jumped over o2), and it promises another significant step forward in AI reasoning. According to its developers, it will make tools like ChatGPT better than ever at programming and working out math problems.

OpenAI CEO Sam Altman described o3 as "incredibly smart" in the video announcing the model, released as part of his company's "12 Days of OpenAI" promotion over the holiday season. The model is undergoing a variety of safety tests before it launches in full—first likely only for paying ChatGPT Plus users.

The o3 model is more than 20 percent better than the previous o1 model at coding, per the SWE-bench Verified benchmark, OpenAI says. It also scores strongly on math and science problems, at least according to benchmark tests. Like o1, the o3 model is trained to think and reason before it answers, rigorously testing its responses for accuracy. OpenAI will also release a smaller, faster o3-mini model alongside the main update.

The pattern of completing squares with a darker blue square is simple for humans, but hard for AI—and it's a challenge that's part of ARC. Credit: ARC

We won't know just how good o3 is until users can actually test it, but we already have an idea of what o3 can do because it's been tested against the well-known Abstraction and Reasoning Corpus (ARC) challenge, designed to track AI's progress towards Artificial General Intelligence (AGI)—the somewhat contentious point at which AI cognitive capabilities pass those of humans.

This challenge gets AI to come up with new approaches to problems, rather than just relying on its memory, and involves a series of visual tasks for models to complete. They must match patterns in colored grids, exercises intended to be easy for people to complete without any training, but hard for AI to figure out.

Within the compute limits set by the ARC test, o3 scored 75.7%. That's way above the 5% achieved by the GPT-4o model, currently the best ChatGPT model available to free users. While we're still some way short of AGI (the model is still below human scores, and couldn't complete all the tasks), that's an impressive step up.

o1 and o1-mini are currently available to ChatGPT Plus users. Credit: Lifehacker

"OpenAI's new o3 model represents a significant leap forward in AI's ability to adapt to novel tasks," writes François Chollet, the software engineer who designed the ARC test. "This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs."

Predictably, OpenAI didn't talk about the energy demands of AI, the ethics of training AI on publicly available data that may be copyrighted, or the tendency of these models to hallucinate wrong answers. While mistakes should be fewer because of o3's extra thinking time, they won't be eradicated. What the company did mention is an expansion of its safety testing program, designed to prevent these models from being used for malicious purposes.

The ability of AI models to truly "think" or "reason" (or at least attempt some approximation of those human capabilities) will no doubt continue to be discussed as AI development progresses. Google has also just unveiled its Gemini 2.0 model, which brings with it improved reasoning.
