Is OpenAI’s o3 AI Model Weaker Than They Claimed? Independent Tests Reveal Shocking Gap

OpenAI’s newest AI model, o3, is at the center of a growing controversy after third-party tests revealed performance significantly lower than the company’s earlier claims.

Originally hailed as a major step forward in reasoning tasks, o3 was said to solve over 25% of the problems in the challenging FrontierMath benchmark, a claim OpenAI made during its December livestream presentation. But new results tell a different story.

Third-Party Tests

Independent research institute Epoch AI tested the newly released o3 model and found that it scored around 10% on the same benchmark, far below OpenAI’s internal figure. Epoch’s results sparked immediate debate over the model’s real-world capabilities and the transparency of OpenAI’s testing practices.

Epoch noted that the discrepancy may stem from differences in testing setup, dataset versions, or the use of “aggressive test-time compute” by OpenAI. In other words, OpenAI’s high-performing o3 might not be the same version currently available to the public.

Smaller, Faster, But at What Cost?

Further complicating the situation, a post by the ARC Prize Foundation revealed that the public o3 model is smaller and optimized for cost and speed, not peak performance. Even OpenAI’s technical staff confirmed the production version of o3 is tuned for real-world responsiveness, not benchmarking supremacy.

Industry-Wide Pattern?

This isn’t the first time benchmark reporting has stirred backlash. Elon Musk’s xAI and Meta have both been accused of promoting scores from unreleased or altered versions of their models. Epoch itself drew criticism earlier for delaying disclosure of OpenAI’s funding behind FrontierMath.

Bigger Models Are Coming

Despite the confusion, OpenAI plans to release o3-pro, a more powerful version, in the coming weeks. Additionally, smaller models like o3-mini-high and o4-mini already outperform o3 on some benchmarks.

Published by

Afaq Wajdan Malik

April 21, 2025 6:46 pm
