May 8, 2025

Artificial intelligence is becoming more powerful but making more mistakes, research shows

Dmitry Dzhugalik, news author at Mezha.media. I write about the things I genuinely enjoy: technology, games and cinema.

The latest large language models with reasoning capabilities, in particular o3, OpenAI's most powerful model, make more errors than their predecessors, The New York Times reports, citing several studies.

Models from other companies, such as Google and the Chinese company DeepSeek, show similar problems: although their mathematical abilities have improved significantly, the number of factual errors on other kinds of queries has only grown.

One of the most common problems with artificial intelligence is so-called "hallucinations", when a model simply invents information and facts without backing them up with any sources. Despite all the developers' efforts, Amr Awadallah, CEO of Vectara, a startup that builds AI tools for business, says hallucinations will always be present.

One example of such a hallucination involved the Cursor tool: its support bot falsely claimed that the tool could be used on only one computer. This triggered a wave of complaints and even account deletions by users. It later turned out that the company had made no such change; the bot invented it all.

In one test of different models, the hallucination rate, that is, the share of invented facts, reached 79%. In OpenAI's internal testing, the o3 model hallucinated in 33% of its answers to questions about public figures, twice as often as o1. The new o4-mini model performed even worse, making mistakes in 48% of cases.

On general-knowledge questions, the hallucination rates of the o3 and o4-mini models were even higher: 51% and 79%, respectively. By comparison, the older o1 model invented facts in 44% of cases. OpenAI acknowledges that more research is needed to determine the causes of these errors.

Independent tests by companies and researchers indicate that hallucinations also occur in models from Google and DeepSeek. Vectara's own study, in particular, found that such models invent facts in at least 3% of cases, and sometimes the figure reaches 27%. Despite companies' efforts to eliminate these errors, hallucination rates have decreased by only 1-2% over the past year.
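For readers curious how such figures are arrived at: below is a minimal, hypothetical sketch of computing a hallucination rate as the share of answers judged to contain invented facts. It is not Vectara's or OpenAI's actual methodology, and the sample data and judgments are made up purely for illustration.

from dataclasses import dataclass

@dataclass
class JudgedAnswer:
    question: str
    answer: str
    hallucinated: bool  # True if the answer was judged to contain invented facts

def hallucination_rate(judged: list[JudgedAnswer]) -> float:
    """Return the share of answers judged hallucinated, as a percentage."""
    if not judged:
        return 0.0
    return 100.0 * sum(a.hallucinated for a in judged) / len(judged)

# Hypothetical example: 2 hallucinated answers out of 4 gives a 50% rate.
sample = [
    JudgedAnswer("Who heads Vectara?", "Amr Awadallah", hallucinated=False),
    JudgedAnswer("When was o3 released?", "In 2019", hallucinated=True),
    JudgedAnswer("What is Cursor?", "A code editor", hallucinated=False),
    JudgedAnswer("Cursor licence terms?", "One computer only", hallucinated=True),
]
print(f"Hallucination rate: {hallucination_rate(sample):.0f}%")

Real benchmarks work on much larger question sets and often use automated judges, but the rate itself is this same simple ratio.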

Source: mezha.media
