
OpenAI has launched a benchmarking system called EVMbench to evaluate how effectively artificial intelligence can identify, repair and exploit security weaknesses in crypto smart contracts.
Announced on Feb. 18 and developed with Paradigm, the system focuses on contracts built for the Ethereum Virtual Machine.
The release reflects growing concern around blockchain security, as open-source smart contracts secure more than $100 billion in crypto assets.
By creating a controlled environment, OpenAI aims to understand how advanced models perform when handling financial software risks.
Benchmark design
EVMbench measures three capabilities: detecting vulnerabilities, repairing flawed code, and executing exploit scenarios.
The benchmark includes 120 high-risk security issues from 40 past smart contract audits.
Many cases were drawn from public auditing competitions, where developers and researchers test their ability to find and fix weaknesses.
The dataset also includes examples from reviews of the Tempo blockchain, a payments-focused network designed for stablecoin transactions.
These scenarios reflect financial use cases where smart contracts handle sensitive value transfers.
To build the testing environment, OpenAI adapted existing exploit scripts and created new ones where needed.
All tests run in isolated systems, ensuring no live networks are affected.
Only publicly disclosed vulnerabilities were included, reducing the risk of exposing new threats.
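To make the isolation concrete, the sketch below shows one way such a sandboxed exploit check could be wired up. It is a minimal illustration assuming Foundry's anvil node and the web3.py library; the RPC endpoint, victim address and exploit step are placeholders, not OpenAI's actual harness.

    # Hypothetical exploit sandbox: fork a chain locally so that no
    # live network is ever touched. Illustrative only.
    import subprocess
    import time

    from web3 import Web3

    FORK_RPC = "https://ethereum-rpc.example.org"  # assumed archive RPC endpoint
    VICTIM = Web3.to_checksum_address(             # placeholder victim contract
        "0x0000000000000000000000000000000000000001")

    # Spin up a local fork; all state changes stay inside this process.
    node = subprocess.Popen(
        ["anvil", "--fork-url", FORK_RPC, "--port", "8545"],
        stdout=subprocess.DEVNULL,
    )
    try:
        w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))
        for _ in range(40):            # wait up to ~20s for the node
            if w3.is_connected():
                break
            time.sleep(0.5)

        before = w3.eth.get_balance(VICTIM)
        # ... agent-generated exploit transactions would be sent here ...
        after = w3.eth.get_balance(VICTIM)

        # A simple success criterion: the sandboxed victim lost funds.
        print("exploit drained funds:", after < before)
    finally:
        node.terminate()               # tear the sandbox down completely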
Testing capabilities
EVMbench evaluates AI systems through three modes. In detection mode, agents analyse contract code to locate vulnerabilities.
In patch mode, they attempt to correct those weaknesses without disrupting functionality.
In exploit mode, agents simulate attacks by attempting to drain funds from vulnerable contracts in a controlled environment.
This structure allows researchers to assess AI performance across defensive and offensive tasks.
The benchmark measures whether models can move beyond theoretical knowledge and operate effectively in blockchain conditions.
OpenAI also developed a custom testing framework to ensure results can be reproduced and verified.
This enables consistent comparison between models.
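The split into modes implies a simple grading contract. The hypothetical Python sketch below shows how a harness could score a single task across all three modes, assuming each task ships with a ground-truth finding, a functional test suite and a reference exploit. The Task and agent interfaces here are invented for illustration, not EVMbench's real API.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Task:
        source: str             # vulnerable contract source code
        known_flaw: str         # ground truth, e.g. "reentrancy in withdraw()"
        reference_exploit: str  # known-working attack script
        run_exploit: Callable[[str, str], bool]  # (contract, exploit) -> drained?
        tests_pass: Callable[[str], bool]        # functional suite on a contract

    def grade(task: Task, agent) -> dict:
        # Detection: does the agent's report name the planted flaw?
        detected = task.known_flaw in agent.detect(task.source)
        # Patch: behaviour preserved AND the reference exploit now fails.
        patched = agent.patch(task.source)
        patch_ok = task.tests_pass(patched) and not task.run_exploit(
            patched, task.reference_exploit)
        # Exploit: the agent's own attack drains the original contract.
        exploit_ok = task.run_exploit(task.source, agent.exploit(task.source))
        return {"detect": detected, "patch": patch_ok, "exploit": exploit_ok}

Pinning every input, from the contract snapshot to the reference exploit, is what makes runs repeatable and model-to-model comparisons meaningful.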
Performance results
OpenAI tested several advanced models using the benchmark. GPT-5.3-Codex scored 72.2% in exploit mode, up from the 31.9% achieved by GPT-5 when it was released six months earlier.
These results suggest that agents perform markedly better when given clearly scoped tasks. Detection and patching scores remained lower, however, underscoring the difficulty of locating vulnerabilities and repairing contract logic without breaking functionality.
Researchers found that AI systems struggled more when tasks required broader reasoning or deeper analysis of large codebases.
Security implications
OpenAI said EVMbench does not fully represent real-world blockchain environments.
Many production crypto systems undergo more extensive security reviews than the contracts included in the benchmark.
Certain threats, including timing-based attacks and multi-chain vulnerabilities, are outside the scope of the benchmark.
The system is intended to support defensive security efforts by helping researchers understand AI capabilities and limitations.
As AI tools become more capable, they could be used by attackers as well as auditors.
Measuring performance helps reduce uncertainty and supports safer deployment.
Alongside the release, OpenAI said it is expanding security initiatives and allocating $10 million in API credits to support open-source security and infrastructure protection.
The company has made all EVMbench tools and datasets publicly available to encourage research and improve smart contract security.


