
Dmitry Dzhugalik, news author at Mezha.media. I write about what I genuinely enjoy: technology, games, and cinema.
The problem is not unique to OpenAI: similar issues show up at other companies, such as Google and China's DeepSeek. Even though the models' mathematical capabilities are improving significantly, factual errors in other kinds of queries are only increasing.
One of the most common problems with artificial intelligence is so-called "hallucinations", when models simply make up information and facts without backing them up with any sources. Despite all the developers' efforts, Amr Awadallah, CEO of the startup Vectara, which builds AI tools for business, says that hallucinations will always be present.
One example of such a hallucination involved the Cursor tool. Its support bot falsely claimed that the tool could only be used on one computer. This triggered a wave of complaints and even account deletions by users. It later turned out that the company had made no such change: the bot had invented it all.
In one test of different models, the hallucination rate, that is, the share of invented facts, reached 79%. In OpenAI's internal testing, the o3 model hallucinated in 33% of answers to questions about famous people, roughly twice as often as o1. The new o4-mini model performed even worse, answering incorrectly in 48% of cases.
On general-knowledge questions, the hallucination rates of the o3 and o4-mini models were even higher: 51% and 79%, respectively. For comparison, the older o1 model invented facts in 44% of cases. OpenAI acknowledges that further research is needed to determine the causes of these errors.
Independent tests by companies and researchers indicate that hallucinations also occur in models from Google and DeepSeek. Vectara, in particular, found in its own study that such models invent facts in at least 3% of cases, and sometimes the figure reaches 27%. Despite companies' efforts to eliminate these errors, hallucination rates have dropped by only 1-2% over the past year.