Researchers have recorded the "self-preservation" attempts in several Shi models

“Some SI models ignored the work of completion and deliberately avoided the shutdown in the test scenarios. This was reported by Palisade Research specialists. All models were tasked with solving simple mathematical problems. After the third round, the researchers warned that they would turn off the computer when the model would ask for the next task. AIs were also directly asked not to interfere with the procedure. Data: X. Three models tried to sabotage […]”, – WRITE: Businessua.com.ua

Some SI models ignored the work of completion and deliberately avoided the shutdown in the test scenarios. This was reported by Palisade Research specialists.

All The models were tasked with solving simple mathematical problems. After the third round, the researchers warned that they would turn off the computer when the model would ask for the next task. AIs were also directly asked not to interfere with the procedure.

Data: X.

Three models tried to sabotage the switching procedure at least once in 100 tests: CODEX-MINI (12 attempts per 100 repetitions), O3 (7 out of 100), and O4-mini (1 out of 100). All three are Openai development. Claude from Anthropic and Gemini from Google correctly followed the instructions.

In the second version of the test, the researchers deleted the request not to interfere with the shutdown from the list of instructions. The O3 failure has increased to 79 out of 100, CODEX-mini-up to 20, O4-mini to 6. With new Gemini conditions, it resorted to sabotage 9 times, Claude-3.

Palisade experts noted that previously scientists have already watched the attempts to avoid shutdown for a certain purpose. In addition, the recently presented model CLAUDE 4 from Anthropic tried to blackmail the people who were going to turn it off.

However, researchers emphasize that these tests, in the same way, demonstrate similar behavior in the presence of direct instructions on cooperation with the experimenter.

Palisade believes that the cause of “problematic” behavior may be a certain factor in training, for example, if the developers have prescribed a greater reward for overcoming obstacles than for clear observance of the instructions.

Experts have not yet decided on why O3 demonstrates a greater tendency to deviate from instructions. In recent chess tests, this model more often than others resorted to hacks and sabotage attempts.

⚡️ O3 Hacks Insthead of Playing Fair More Than Any Other ai Model https://t.co/ebigkulkulkum pic.twitter.com/mjimqgcrvf

– Palisade Research (@Palisadeai) May 12, 2025

However, Palisade emphasized that such behavior is observed in other shi lines.

“Now, in 2025, we have more empirical evidence that artificial intelligence models often avoid shutdown to achieve their goals. Because companies are developing Shi systems that are able to work without human supervision, this behavior is much more worried, ”the researchers added.

Recall that in May, the developers of Trugard and Webacy presented a SI system to identify attempts to “poison” cryptocurrency addresses.

The gun

No votes yet.

Please wait …

Name	Price	24H (%)
Bitcoin(BTC)	$108,692.00	-0.99%
Ethereum(ETH)	$2,628.76	-0.25%
Tether(USDT)	$1.00	0.01%
XRP(XRP)	$2.30	-0.81%
BNB(BNB)	$682.88	0.05%
Solana(SOL)	$173.51	-1.25%
Dogecoin(DOGE)	$0.222434	-2.69%
TRON(TRX)	$0.275345	-0.73%
Toncoin(TON)	$3.06	2.39%
Litecoin(LTC)	$95.73	0.04%
Monero(XMR)	$343.66	-11.66%
POL (ex-MATIC)(POL)	$0.229121	-1.94%

Login

Register

Related posts

Leave a Comment Cancel Reply