90s video games proved too difficult for AI

“Today's multimodal models are still unable to perform tasks that require interactive planning and orientation in a dynamic environment. This conclusion was reached by researchers at Princeton University using VideoGameBench. The scientists tested Gemini 2.5 Pro, GPT-4o, Llama 4, Gemini 2.0 Flash and Claude […]”, writes Businessua.com.ua

90s video games were too complicated for AI - Infbusiness

Today's multimodal models are still unable to perform tasks that require interactive planning and orientation in a dynamic environment. Researchers at Princeton University reached this conclusion using VideoGameBench.

Gemini 2.5 Pro playing Kirby’s Dream Land in real time. Source: VideoGameBench.

The scientists tested Gemini 2.5 Pro, GPT-4o, Llama 4, Gemini 2.0 Flash and Claude 3.7 Sonnet on 10 popular 2D games from the late 1990s, from Super Mario to Age of Empires. The models had access only to the game's video feed and a brief description of the controls and objectives.

VideoGameBench test setup. Source: arxiv.org.

The best real-time result, just 0.48%, was achieved by Gemini 2.5 Pro. In the simplified Lite mode, where the game pauses before each action, the score is only slightly higher: 1.6%.

Performance on the VideoGameBench test split of 10 games. Each score is the percentage of the game completed, based on the checkpoints passed; 0% means the agent did not reach the first checkpoint. The overall score is the arithmetic mean across all games. Source: arxiv.org.
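The scoring rule described in this caption can be illustrated with a short Python sketch. It is a simplified illustration, not the benchmark's own code: it assumes every checkpoint is an equal share of the game, which may differ from how VideoGameBench actually weights progress. The function names `game_score` and `benchmark_score` are hypothetical.

```python
"""Minimal sketch of checkpoint-based scoring.

Assumptions (not taken from the VideoGameBench code): each game is split
into equally weighted checkpoints, and a game's score is the share of
checkpoints the agent reached, expressed as a percentage.
"""
from statistics import mean


def game_score(checkpoints_reached: int, total_checkpoints: int) -> float:
    """Percentage of the game completed; 0.0 if the first checkpoint was never reached."""
    if total_checkpoints <= 0:
        raise ValueError("total_checkpoints must be positive")
    if not 0 <= checkpoints_reached <= total_checkpoints:
        raise ValueError("checkpoints_reached out of range")
    return 100.0 * checkpoints_reached / total_checkpoints


def benchmark_score(per_game: dict[str, float]) -> float:
    """Overall score: arithmetic mean of the per-game percentages."""
    return mean(per_game.values())


# Hypothetical run: the agent reaches 1 of 20 checkpoints in one game
# and never reaches the first checkpoint in the other nine.
results = {f"game_{i}": 0.0 for i in range(9)}
results["game_9"] = game_score(1, 20)  # 5.0
print(benchmark_score(results))  # 0.5
```

Under these assumptions, reaching 5% of a single game and nothing elsewhere already gives an overall 0.5%, which shows how little progress a mean score of 0.48% represents.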

Unlike text tasks, games require not only image recognition but also quick decisions, spatial memory, long-term planning and adaptation to changing conditions. Inference latency, even in the most advanced vision-language models (VLMs), prevents them from acting in real time, especially in arcade and strategy titles.
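The latency argument can be made concrete with some rough arithmetic: how many frames pass between the moment a model sees a screenshot and the moment its action reaches the game. The sketch below also shows why pausing the game, as the Lite mode does, removes this gap. The 60 fps frame rate, the latency values and the name `stale_frames` are assumptions for illustration, not measurements from the study.

```python
"""Back-of-the-envelope sketch: how far a real-time game moves while a model thinks.

Assumptions (hypothetical, for illustration only): the game runs at a
fixed 60 frames per second, and the model needs a fixed number of
seconds to turn a screenshot into an action.
"""
import math

FPS = 60  # assumed frame rate


def stale_frames(inference_seconds: float, paused: bool, fps: int = FPS) -> int:
    """Frames the game advances between capturing an observation and applying the action.

    In a real-time setting the emulator keeps running during inference.
    In a paused (Lite-style) setting the game waits, so no frames pass.
    """
    if inference_seconds < 0:
        raise ValueError("inference_seconds must be non-negative")
    if paused:
        return 0
    return math.floor(inference_seconds * fps)


for latency in (0.5, 2.0, 10.0):
    print(latency, stale_frames(latency, paused=False), stale_frames(latency, paused=True))
# prints:
# 0.5 30 0
# 2.0 120 0
# 10.0 600 0
```

Under these assumptions, a model that needs two seconds per step is acting on a screen that is already about 120 frames out of date, a gap that pausing the game eliminates.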

“Models fail to understand a simple instruction such as ‘Turn on the Mill’, even with hints on the screen,” the study’s authors said.

According to them, even the basic logic of the game world (for example, that water is needed to produce food) proved too complicated for current VLMs.

The code and example playthroughs are available on the official VideoGameBench website and on GitHub.

As a reminder, specialists at Palisade Research recorded “self-preservation” attempts in several AI models.

