OpenAI Researchers Contradict CEO's Claims About AI's Coding Skills

OpenAI researchers have acknowledged that even the most advanced AI models are still unable to outperform human coders. This directly contradicts CEO Sam Altman’s claim that AI could replace “low-level” software engineers by the end of this year.

AI Benchmarking for Coding Tasks

In a newly published study, OpenAI found that its cutting-edge AI models struggle with complex coding tasks and remain unreliable for real-world software development.

The researchers used a newly developed benchmark called SWE-Lancer, which they built from more than 1,400 software engineering tasks sourced from the freelancer platform Upwork. The study evaluated OpenAI's o1 reasoning model and GPT-4o, alongside rival Anthropic's Claude 3.5 Sonnet, to assess how well each handles real coding challenges.

AI Struggles With Software Engineering

The benchmark tasks were collectively valued at hundreds of thousands of dollars, yet the AI models consistently failed to identify the root causes of bugs in larger projects. Instead, they addressed only surface-level issues without grasping the broader structure of the software.

Although the AI models worked significantly faster than human coders, their solutions were often incorrect or incomplete. Of the tested models, Claude 3.5 Sonnet performed best, outperforming OpenAI's offerings, yet most of its answers were still wrong. The study concluded that these models lack the reliability needed for real-world coding applications.

AI is Not Ready to Replace Human Engineers

The findings reaffirm that while AI models have made remarkable progress, they are still far from being a viable replacement for human coders. Their inability to manage complex software tasks shows that they currently function more like assistants than full-fledged engineers.



Afaq Wajdan Malik
